1
Cao X, Lu P. DCSGMDA: A dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations. Comput Biol Chem 2024; 113:108201. [PMID: 39255626 DOI: 10.1016/j.compbiolchem.2024.108201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 08/17/2024] [Accepted: 08/31/2024] [Indexed: 09/12/2024]
Abstract
Numerous studies have shown that microRNAs (miRNAs) play a key role in human diseases as critical biomarkers. Their abnormal expression is often accompanied by the emergence of specific diseases. Therefore, studying the relationship between miRNAs and diseases can deepen insight into disease pathogenesis, clarify the process of disease onset and development, and promote drug research for specific diseases. However, many undiscovered relationships between miRNAs and diseases remain, significantly limiting research on miRNA-disease correlations. To explore more potential correlations, we propose a dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations (DCSGMDA). Firstly, we constructed similarity networks for miRNAs and diseases, as well as an association relationship network. Secondly, potential features were fully mined using stacked deep learning and gradient decomposition networks, along with dual-channel convolutional neural networks. Finally, correlations were scored by a multilayer perceptron. We performed 5-fold and 10-fold cross-validation experiments on DCSGMDA using two datasets based on the Human MicroRNA Disease Database (HMDD). Additionally, parametric, ablation, and comparative experiments, along with case studies, were conducted. The experimental results demonstrate that DCSGMDA performs well in predicting miRNA-disease associations.
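The 5-fold and 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how DCSGMDA partitions its samples, so the shuffled, interleaved split below, the function name `k_fold_splits`, and its `seed` parameter are illustrative assumptions rather than the authors' procedure.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper: sample indices are shuffled once, then dealt
    into k folds; each fold is held out exactly once as the test set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


if __name__ == "__main__":
    for fold_id, (train, test) in enumerate(k_fold_splits(10, 5)):
        print(fold_id, sorted(test))
```

In practice each index would stand for one miRNA-disease pair, and the reported score would be the average over the k held-out folds.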
Affiliation(s)
- Xu Cao
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China.
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China.
2
Zhang X, Liu M, Li Z, Zhuo L, Fu X, Zou Q. Fusion of multi-source relationships and topology to infer lncRNA-protein interactions. Molecular Therapy. Nucleic Acids 2024; 35:102187. [PMID: 38706631 PMCID: PMC11066462 DOI: 10.1016/j.omtn.2024.102187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 05/07/2024]
Abstract
Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA's functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with LPN's topological data. Second, the implemented masking strategy efficiently identifies LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.
Affiliation(s)
- Xinyu Zhang
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Mingzhe Liu
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Zhen Li
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou 510000, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
3
Shen C, Mao D, Tang J, Liao Z, Chen S. Prediction of LncRNA-Protein Interactions Based on Kernel Combinations and Graph Convolutional Networks. IEEE J Biomed Health Inform 2024; 28:1937-1948. [PMID: 37327093 DOI: 10.1109/jbhi.2023.3286917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
The complexes of long non-coding RNAs bound to proteins can be involved in regulating life activities at various stages of organisms. However, in the face of the growing number of lncRNAs and proteins, verifying LncRNA-Protein Interactions (LPI) based on traditional biological experiments is time-consuming and laborious. Therefore, with the improvement of computing power, LPI prediction has gained new development opportunities. Building on state-of-the-art work, a framework called LncRNA-Protein Interactions based on Kernel Combinations and Graph Convolutional Networks (LPI-KCGCN) is proposed in this article. We first construct kernel matrices from the sequence features, sequence similarity features, expression features, and gene ontology of both lncRNAs and proteins. We then reconstruct the existing kernel matrices as the input of the next step. Combined with known LPI interactions, the reconstructed similarity matrices, which can be used as features of the topology map of the LPI network, are exploited in extracting potential representations in the lncRNA and protein space using a two-layer Graph Convolutional Network. The predicted matrix can be finally obtained by training the network to produce scoring matrices w.r.t. lncRNAs and proteins. Different LPI-KCGCN variants are ensembled to derive the final prediction results and are tested on balanced and unbalanced datasets. The 5-fold cross-validation shows that the optimal feature information combination on a dataset with 15.5% positive samples has an AUC value of 0.9714 and an AUPR value of 0.9216. On another highly unbalanced dataset with only 5% positive samples, LPI-KCGCN also outperformed state-of-the-art methods, achieving an AUC value of 0.9907 and an AUPR value of 0.9267.
4
Tian Z, Han C, Xu L, Teng Z, Song W. MGCNSS: miRNA-disease association prediction with multi-layer graph convolution and distance-based negative sample selection strategy. Brief Bioinform 2024; 25:bbae168. [PMID: 38622356 PMCID: PMC11018511 DOI: 10.1093/bib/bbae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 03/14/2024] [Accepted: 03/31/2024] [Indexed: 04/17/2024] Open
Abstract
Identifying disease-associated microRNAs (miRNAs) could help understand the deep mechanism of diseases, which promotes the development of new medicine. Recently, network-based approaches have been widely proposed for inferring the potential associations between miRNAs and diseases. However, these approaches ignore the importance of different relations in meta-paths when learning the embeddings of miRNAs and diseases. Besides, they pay little attention to screening out reliable negative samples which is crucial for improving the prediction accuracy. In this study, we propose a novel approach named MGCNSS with the multi-layer graph convolution and high-quality negative sample selection strategy. Specifically, MGCNSS first constructs a comprehensive heterogeneous network by integrating miRNA and disease similarity networks coupled with their known association relationships. Then, we employ the multi-layer graph convolution to automatically capture the meta-path relations with different lengths in the heterogeneous network and learn the discriminative representations of miRNAs and diseases. After that, MGCNSS establishes a highly reliable negative sample set from the unlabeled sample set with the negative distance-based sample selection strategy. Finally, we train MGCNSS under an unsupervised learning manner and predict the potential associations between miRNAs and diseases. The experimental results fully demonstrate that MGCNSS outperforms all baseline methods on both balanced and imbalanced datasets. More importantly, we conduct case studies on colon neoplasms and esophageal neoplasms, further confirming the ability of MGCNSS to detect potential candidate miRNAs. The source code is publicly available on GitHub https://github.com/15136943622/MGCNSS/tree/master.
Affiliation(s)
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Chenguang Han
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Lewen Xu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Zhixia Teng
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Wei Song
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
5
Han GS, Gao Q, Peng LZ, Tang J. Hessian Regularized [Formula: see text]-Nonnegative Matrix Factorization and Deep Learning for miRNA-Disease Associations Prediction. Interdiscip Sci 2024; 16:176-191. [PMID: 38099958 DOI: 10.1007/s12539-023-00594-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 11/05/2023] [Accepted: 11/07/2023] [Indexed: 02/22/2024]
Abstract
Since the identification of microRNAs (miRNAs), empirical research has demonstrated their crucial involvement in the functioning of organisms. Investigating miRNAs significantly bolsters efforts related to averting, diagnosing, and treating intricate human maladies. Yet, exploring every conceivable miRNA-disease association consumes significant resources and time within conventional wet experiments. On the computational front, forecasting potential miRNA-disease connections serves as a valuable source of preliminary insights for medical investigators. As a result, we have developed a novel matrix factorization model known as Hessian-regularized [Formula: see text] nonnegative matrix factorization in combination with deep learning for predicting associations between miRNAs and diseases, denoted as [Formula: see text]-NMF-DF. In particular, we introduce a novel iterative fusion approach to integrate all similarities. This method effectively diminishes the sparsity of the initial miRNA-disease associations matrix. Additionally, we devise a mixed model framework that utilizes deep learning, matrix decomposition, and singular value decomposition to capture and depict the intricate nonlinear features of miRNA and disease. The prediction performance of the six matrix factorization methods is improved by comparison and analysis, similarity matrix fusion, data preprocessing, and parameter adjustment. The AUC and AUPR obtained by the new matrix factorization model under fivefold cross validation are comparative or better with other matrix factorization models. Finally, we select three diseases including lung tumor, bladder tumor and breast tumor for case analysis, and further extend the matrix factorization model based on deep learning. The results show that the hybrid algorithm combining matrix factorization with deep learning proposed in this paper can predict miRNAs related to different diseases with high accuracy.
Affiliation(s)
- Guo-Sheng Han
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China.
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China.
- Qi Gao
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Ling-Zhi Peng
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Jing Tang
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
6
Jin S, Zhang Y, Yu H, Lu M. SADR: Self-Supervised Graph Learning With Adaptive Denoising for Drug Repositioning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:265-277. [PMID: 38190661 DOI: 10.1109/tcbb.2024.3351079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024]
Abstract
Traditional drug development is often high-risk and time-consuming. A promising alternative is to reuse or relocate approved drugs. Recently, some methods based on graph representation learning have started to be used for drug repositioning. These models learn the low dimensional embeddings of drug and disease nodes from the drug-disease interaction network to predict the potential association between drugs and diseases. However, these methods have strict requirements for the dataset, and if the dataset is sparse, the performance of these methods will be severely affected. At the same time, these methods have poor robustness to noise in the dataset. In response to the above challenges, we propose a drug repositioning model based on self-supervised graph learning with adaptive denoising, called SADR. SADR uses data augmentation and contrastive learning strategies to learn feature representations of nodes, which can effectively solve the problems caused by sparse datasets. SADR includes an adaptive denoising training (ADT) component that can effectively identify noisy data during the training process and remove the impact of noise on the model. We have conducted comprehensive experiments on three datasets and have achieved better prediction accuracy compared to multiple baseline models. At the same time, we propose the top 10 newly predicted approved drugs for treating two diseases. This demonstrates the ability of our model to identify potential drug candidates for disease indications.
7
Zhang Y, Chu Y, Lin S, Xiong Y, Wei DQ. ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler. Brief Bioinform 2024; 25:bbae103. [PMID: 38517693 PMCID: PMC10959163 DOI: 10.1093/bib/bbae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/04/2024] [Accepted: 02/23/2024] [Indexed: 03/24/2024] Open
Abstract
Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, unearthing associations between miRNA and diseases can contribute to precise diagnosis and efficacious remediation of medical conditions. The detection of miRNA-disease linkages via computational techniques utilizing biological information has emerged as a cost-effective and highly efficient approach. Here, we introduced a computational framework named ReHoGCNES, designed for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs homogenous graph convolutional network with regular graph structure (ReHoGCN) encompassing disease similarity network, miRNA similarity network and known MDA network and then was tested on four experimental tasks. A random edge sampler strategy was utilized to expedite processes and diminish training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both homogenous graph convolutional network and heterogeneous graph convolutional network with non-regular graph structure in all four tasks, which implicitly reveals steadily degree distribution of a graph does play an important role in enhancement of model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods on the MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. Consequently, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 forecasted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.
Affiliation(s)
- Yufang Zhang
  - School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai 200240, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
- Yanyi Chu
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Shenggeng Lin
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
  - Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
8
Qumsiyeh E, Salah Z, Yousef M. miRGediNET: A comprehensive examination of common genes in miRNA-Target interactions and disease associations: Insights from a grouping-scoring-modeling approach. Heliyon 2023; 9:e22666. [PMID: 38090011 PMCID: PMC10711121 DOI: 10.1016/j.heliyon.2023.e22666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/15/2023] [Accepted: 11/16/2023] [Indexed: 06/15/2024] Open
Abstract
In the broad and complex field of biological data analysis, researchers frequently gather information from a single source or database. Despite being a widespread practice, this has disadvantages. Relying exclusively on a single source can limit our comprehension as it may omit various perspectives that could be obtained by combining multiple knowledge bases. Acknowledging this shortcoming, we report on miRGediNET, a novel approach combining information from three biological databases. Our investigation focuses on microRNAs (miRNAs), small non-coding RNA molecules that regulate gene expression post-transcriptionally. We delve deeply into the knowledge of these miRNA's interactions with genes and the possible effects these interactions may have on different diseases. The scientific community has long recognized a direct correlation between the progression of specific diseases and miRNAs, as well as the genes they target. By using miRGediNET, we go beyond simply acknowledging this relationship. Rather, we actively look for the critical genes that could act as links between the actions of miRNAs and the mechanisms underlying disease. Our methodology, which carefully identifies and investigates these important genes, is supported by a strategic framework that may open up new possibilities for comprehending diseases and creating treatments. We have developed a tool on the Knime platform as a concrete application of our research. This tool serves as both a validation of our study and an invitation to the larger community to interact with, investigate, and build upon our findings. miRGediNET is publicly accessible on GitHub at https://github.com/malikyousef/miRGediNET, providing a collaborative environment for additional research and innovation for enthusiasts and fellow researchers.
Affiliation(s)
- Emma Qumsiyeh
  - Department of Computer Science and Information Technology, Al-Quds University, Palestine
- Zaidoun Salah
  - Molecular Genetics and Genetic Toxicology, Arab American University, Ramallah, Palestine
- Malik Yousef
  - Information Technology Engineering, Al-Quds University, Abu Dis, Palestine
9
Gao S, Kuang Z, Duan T, Deng L. DEJKMDR: miRNA-disease association prediction method based on graph convolutional network. Front Med (Lausanne) 2023; 10:1234050. [PMID: 37780568 PMCID: PMC10536249 DOI: 10.3389/fmed.2023.1234050] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/03/2023] [Accepted: 08/16/2023] [Indexed: 10/03/2023] Open
Abstract
Numerous studies have shown that miRNAs play a crucial role in the investigation of complex human diseases. Identifying the connection between miRNAs and diseases is crucial for advancing the treatment of complex diseases. However, traditional methods are frequently constrained by the small sample size and high cost, so computational simulations are urgently required to rapidly and accurately forecast the potential correlation between miRNA and disease. In this paper, the DEJKMDR, a graph convolutional network (GCN)-based miRNA-disease association prediction model is proposed. The novelty of this model lies in the fact that DEJKMDR integrates biomolecular information on miRNA and illness, including functional miRNA similarity, disease semantic similarity, and miRNA and disease similarity, according to their Gaussian interaction attribute. In order to minimize overfitting, some edges are randomly destroyed during the training phase after DropEdge has been used to regularize the edges. JK-Net, meanwhile, is employed to combine various domain scopes through the adaptive learning of nodes in various placements. The experimental results demonstrate that this strategy has superior accuracy and dependability than previous algorithms in terms of predicting an unknown miRNA-disease relationship. In a 10-fold cross-validation, the average AUC of DEJKMDR is determined to be 0.9772.
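The edge-dropping regularization described in this abstract can be sketched in a few lines. This is only an illustration of the general DropEdge idea (removing a random subset of graph edges at each training step); the function name `drop_edge`, the edge-list representation, and the `drop_rate` value are assumptions, since the abstract does not give DEJKMDR's actual settings.

```python
import random


def drop_edge(edges, drop_rate, rng):
    """Return a copy of `edges` with each edge independently removed
    with probability `drop_rate` (hypothetical helper, not the
    authors' code)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    return [edge for edge in edges if rng.random() >= drop_rate]


if __name__ == "__main__":
    # Toy miRNA-disease edges as (miRNA index, disease index) pairs.
    edges = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 2), (3, 1)]
    rng = random.Random(42)
    for epoch in range(3):
        # A fresh random subgraph is sampled for every training epoch.
        print(epoch, drop_edge(edges, drop_rate=0.3, rng=rng))
```

Sampling a new subgraph per epoch means the model never sees exactly the same neighbourhood twice, which is the mechanism by which this kind of perturbation is usually said to reduce overfitting.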
Affiliation(s)
- Shiyuan Gao
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
- Zhufang Kuang
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
- Tao Duan
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha, China
10
Hyperbolic matrix factorization improves prediction of drug-target associations. Sci Rep 2023; 13:959. [PMID: 36653463 PMCID: PMC9849222 DOI: 10.1038/s41598-023-27995-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/24/2022] [Accepted: 01/11/2023] [Indexed: 01/19/2023] Open
Abstract
Past research in computational systems biology has focused more on the development and applications of advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume the flat geometry of the biological space. In contrast, recent theoretical studies suggest that biological systems exhibit tree-like topology with a high degree of clustering. As a consequence, embedding a biological system in a flat space leads to distortion of distances between biological objects. Here, we present a novel matrix factorization methodology for drug-target interaction prediction that uses hyperbolic space as the latent biological space. When benchmarked against classical, Euclidean methods, hyperbolic matrix factorization exhibits superior accuracy while lowering embedding dimension by an order of magnitude. We see this as additional evidence that the hyperbolic geometry underpins large biological networks.
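The contrast between flat and hyperbolic latent spaces can be made concrete with a small distance computation. The abstract does not say which model of hyperbolic space the method uses, so the Poincaré ball model below, and the function names, are assumptions chosen only because the formula is compact.

```python
import math


def squared_norm(x):
    """Squared Euclidean norm of a coordinate list."""
    return sum(c * c for c in x)


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit ball
    under the Poincare ball model of hyperbolic space."""
    nu, nv = squared_norm(u), squared_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = squared_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


def euclidean_distance(u, v):
    """Ordinary flat-space distance, for comparison."""
    return math.sqrt(squared_norm([a - b for a, b in zip(u, v)]))


if __name__ == "__main__":
    a, b = [0.9, 0.0], [-0.9, 0.0]
    print(euclidean_distance(a, b), poincare_distance(a, b))
```

Points near the boundary of the ball are far apart in hyperbolic distance even when their Euclidean separation is modest, which is why tree-like hierarchies can be embedded with less distortion in few dimensions.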
11
Yan C, Ding C, Duan G. PMMS: Predicting essential miRNAs based on multi-head self-attention mechanism and sequences. Front Med (Lausanne) 2022; 9:1015278. [DOI: 10.3389/fmed.2022.1015278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 10/25/2022] [Indexed: 11/18/2022] Open
Abstract
Increasing evidence has proved that miRNA plays a significant role in biological processes. In order to understand the etiology and mechanisms of various diseases, it is necessary to identify the essential miRNAs. However, it is time-consuming and expensive to identify essential miRNAs by using traditional biological experiments. It is critical to develop computational methods to predict potential essential miRNAs. In this study, we provided a new computational method (called PMMS) to identify essential miRNAs by using multi-head self-attention and sequences. First, PMMS computes the statistic and structure features and extracts the static feature by concatenating them. Second, PMMS extracts the deep learning original feature (BiLSTM-based feature) by using bi-directional long short-term memory (BiLSTM) and pre-miRNA sequences. In addition, we further obtained the multi-head self-attention feature (MS-based feature) based on BiLSTM-based feature and multi-head self-attention mechanism. By considering the importance of the subsequence of pre-miRNA to the static feature of miRNA, we obtained the deep learning final feature (WA-based feature) based on the weighted attention mechanism. Finally, we concatenated WA-based feature and static feature as an input to the multilayer perceptron model to predict essential miRNAs. We conducted five-fold cross-validation to evaluate the prediction performance of PMMS. The areas under the ROC curves (AUC), the F1-score, and accuracy (ACC) are used as performance metrics. From the experimental results, PMMS obtained the best prediction performance (AUC: 0.9556, F1-score: 0.9030, and ACC: 0.9097). It also outperformed other compared methods. The experimental results also illustrated that PMMS is an effective method to identify essential miRNA.
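The three metrics named in this abstract (AUC, F1-score, and accuracy) can be computed from labels and scores as in the sketch below. It is a generic illustration using standard definitions, not PMMS's evaluation code; the function names and the 0.5 decision threshold are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking: the fraction of
    (positive, negative) pairs ranked correctly, ties counting half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(labels, scores, threshold=0.5):
    """Accuracy and F1-score after thresholding scores (threshold is an
    illustrative default)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(auc(labels, scores), accuracy_and_f1(labels, scores))
```

AUC depends only on how the scores rank the samples, whereas accuracy and F1 depend on the chosen threshold, which is one reason abstracts in this list report them together.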
12
Wei Z, Yao D, Zhan X, Zhang S. A clustering-based sampling method for miRNA-disease association prediction. Front Genet 2022; 13:995535. [PMID: 36176298 PMCID: PMC9513605 DOI: 10.3389/fgene.2022.995535] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/16/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022] Open
Abstract
More and more studies have proved that microRNAs (miRNAs) play a critical role in gene expression regulation, and the irregular expression of miRNAs tends to be associated with a variety of complex human diseases. Because of the high cost and low efficiency of identifying disease-associated miRNAs through biological experiments, scholars have focused on predicting potential disease-associated miRNAs by computational methods. Considering that the existing methods are flawed in constructing negative sample set, we proposed a clustering-based sampling method for miRNA-disease association prediction (CSMDA). Firstly, we integrated multiple similarity information of miRNA and disease to represent miRNA-disease pairs. Secondly, we performed a clustering-based sampling method to avoid introducing potential positive samples when constructing negative sample set. Thirdly, we employed a random forest-based feature selection method to reduce noise and redundant information in the high-dimensional feature space. Finally, we implemented an ensemble learning framework for predicting miRNA-disease associations by soft voting. The Precision, Recall, F1-score, AUROC and AUPR of the CSMDA achieved 0.9676, 0.9545, 0.9610, 0.9928, and 0.9940, respectively, under five-fold cross-validation. Besides, case study on three cancers showed that the top 20 potentially associated miRNAs predicted by the CSMDA were confirmed by the dbDEMC database or literatures. The above results demonstrate that the CSMDA can predict potential disease-associated miRNAs more accurately.
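The soft-voting step mentioned in this abstract amounts to averaging the class probabilities produced by several base classifiers. The sketch below shows only that averaging; the abstract does not name CSMDA's base learners or any weighting, so the function `soft_vote`, its optional `weights`, and the toy probabilities are illustrative assumptions.

```python
def soft_vote(model_probs, weights=None):
    """Combine per-model positive-class probabilities by (weighted)
    averaging. `model_probs` holds one probability list per model,
    all of the same length."""
    if not model_probs:
        raise ValueError("need at least one model")
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("all models must score the same samples")
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Hypothetical probabilities from two base classifiers for two
    # candidate miRNA-disease pairs.
    averaged = soft_vote([[0.9, 0.2], [0.7, 0.4]])
    print([round(p, 3) for p in averaged])
```

Unlike hard voting, which counts predicted labels, soft voting lets a confident model outweigh an uncertain one on a given sample.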
Affiliation(s)
- Zheng Wei
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Dengju Yao
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
  - *Correspondence: Dengju Yao
- Xiaojuan Zhan
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
- Shuli Zhang
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
13
Shakyawar S, Southekal S, Guda C. mintRULS: Prediction of miRNA–mRNA Target Site Interactions Using Regularized Least Square Method. Genes (Basel) 2022; 13:genes13091528. [PMID: 36140696 PMCID: PMC9498445 DOI: 10.3390/genes13091528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Revised: 08/19/2022] [Accepted: 08/22/2022] [Indexed: 11/16/2022] Open
Abstract
Identification of miRNA–mRNA interactions is critical to understanding new paradigms in gene regulation. Existing methods show suboptimal performance owing to inappropriate feature selection and limited integration of intuitive biological features of both miRNAs and mRNAs. The present regularized least square-based method, mintRULS, employs features of miRNAs and their target sites using pairwise similarity metrics based on free energy, sequence and repeat identities, and target site accessibility to predict miRNA-target site interactions. We hypothesized that miRNAs sharing similar structural and functional features are more likely to target the same mRNA, and conversely, mRNAs with similar features can be targeted by the same miRNA. Our prediction model achieved AUCs of 0.93 and 0.92 in the LOOCV and LmiTOCV settings, respectively. In comparison, other popular tools such as miRDB, TargetScan, MBSTAR, RPmirDIP, and STarMir achieved AUCs of 0.73, 0.77, 0.55, 0.84, and 0.67, respectively, in the LOOCV setting. Similarly, mintRULS outperformed other methods on metrics such as accuracy, sensitivity, specificity, and MCC. Our method also demonstrated high accuracy when validated against experimentally derived data from condition- and cell-specific studies and expression studies of miRNAs and target genes, in both human and mouse.
Affiliation(s)
- Sushil Shakyawar
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
| | - Siddesh Southekal
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
| | - Chittibabu Guda
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Center for Biomedical Informatics Research and Innovation (CBIRI), University of Nebraska Medical Center, Omaha, NE 68198, USA
| |
|
14
|
Zhang W, Hou J, Liu B. iPiDA-LTR: Identifying piwi-interacting RNA-disease associations based on Learning to Rank. PLoS Comput Biol 2022; 18:e1010404. [PMID: 35969645 PMCID: PMC9410559 DOI: 10.1371/journal.pcbi.1010404] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/27/2022] [Revised: 08/25/2022] [Accepted: 07/18/2022] [Indexed: 12/01/2022] Open
Abstract
Piwi-interacting RNAs (piRNAs) are regarded as drug targets and biomarkers for the diagnosis and therapy of diseases. However, biological experiments cost substantial time and resources, and the existing computational methods only focus on identifying missing associations between known piRNAs and diseases. With the fast development of biological experiments, more and more piRNAs are being detected. Therefore, identifying piRNA-disease associations for newly detected piRNAs has both theoretical value and practical significance for understanding the pathogenesis of diseases. In this study, the iPiDA-LTR predictor is proposed to identify associations between piRNAs and diseases based on Learning to Rank. The iPiDA-LTR predictor not only identifies the missing associations between known piRNAs and diseases, but also detects diseases associated with newly detected piRNAs. Experimental results demonstrate that iPiDA-LTR effectively predicts piRNA-disease associations, outperforming the other related methods.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Jialu Hou
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
| |
|
15
|
Zhang W, Wei H, Liu B. idenMD-NRF: a ranking framework for miRNA-disease association identification. Brief Bioinform 2022; 23:6604995. [PMID: 35679537 DOI: 10.1093/bib/bbac224] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2022] [Revised: 04/18/2022] [Accepted: 05/11/2022] [Indexed: 11/12/2022] Open
Abstract
Identifying miRNA-disease associations is an important task for revealing the pathogenic mechanisms of complex diseases. Different computational methods have been proposed. Although these methods perform encouragingly at detecting missing associations between known miRNAs and diseases, accurately predicting the diseases associated with new miRNAs is still a difficult task. In this regard, a ranking framework named idenMD-NRF is proposed for miRNA-disease association identification. idenMD-NRF treats miRNA-disease association identification as an information retrieval task. Given a novel query miRNA, idenMD-NRF employs a Learning to Rank algorithm to rank associated diseases based on high-level association features and various predictors. The experimental results on two independent test datasets indicate that idenMD-NRF is superior to the other compared predictors. A user-friendly web server for the idenMD-NRF predictor is freely available at http://bliulab.net/idenMD-NRF/.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
| | - Hang Wei
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
| |
|
16
|
Identification of MiRNA–Disease Associations Based on Information of Multi-Module and Meta-Path. Molecules 2022; 27:molecules27144443. [PMID: 35889314 PMCID: PMC9321348 DOI: 10.3390/molecules27144443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 07/01/2022] [Accepted: 07/08/2022] [Indexed: 12/10/2022] Open
Abstract
Accumulating research reveals that microRNAs (miRNAs) are involved in many critical biological processes including cell proliferation, differentiation and apoptosis. It is of great significance to identify the associations between miRNAs and human diseases, which are the basis for finding biomarkers for diagnosis and targets for treatment. To overcome the time-consuming and labor-intensive nature of traditional experiments, a computational method was developed to identify potential associations between miRNAs and diseases based on the graph attention network (GAT) with different meta-path modes and a support vector machine (SVM). Firstly, we constructed a multi-module heterogeneous network based on the meta-path and learned the latent features of different modules by GAT. Secondly, we computed a weighted average of the latent features to obtain a final node representation. Finally, we characterized miRNA–disease-association pairs with the node representation and trained an SVM to recognize potential associations. Based on five-fold cross-validation and benchmark datasets, the proposed method achieved an area under the precision–recall curve (AUPR) of 0.9379 and an area under the receiver–operating characteristic curve (AUC) of 0.9472. The results demonstrate that our method performs well in practical application and can provide a reference for the discovery of new biomarkers and therapeutic targets.
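The second and third steps (a weighted average of per-module latent features, then a pair representation fed to a classifier) can be sketched as follows. The abstract does not specify how the weights are obtained or how a miRNA vector and a disease vector are combined into a pair feature; the fixed weights and the concatenation used here are assumptions, and `weighted_average` and `pair_feature` are hypothetical helper names.

```python
def weighted_average(embeddings, weights):
    """Weighted mean of several equally sized latent feature vectors.

    embeddings[m] is the vector learned from module m for one node;
    weights[m] is that module's weight (values here are hypothetical).
    """
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * e[k] for w, e in zip(weights, embeddings)) / total
        for k in range(dim)
    ]


def pair_feature(mirna_vec, disease_vec):
    """Represent a miRNA-disease pair by concatenation (an assumption)."""
    return list(mirna_vec) + list(disease_vec)


# Two hypothetical module-level embeddings for one miRNA and one disease.
mirna_repr = weighted_average([[1.0, 0.0], [3.0, 4.0]], [1.0, 3.0])
disease_repr = weighted_average([[2.0, 2.0], [4.0, 0.0]], [1.0, 1.0])
features = pair_feature(mirna_repr, disease_repr)
```

The resulting `features` vector is what a downstream classifier such as an SVM would receive for that pair.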
|
17
|
Zhong J, Zhou W, Kang J, Fang Z, Xie M, Xiao Q, Peng W. DNRLCNN: A CNN Framework for Identifying MiRNA-Disease Associations Using Latent Feature Matrix Extraction with Positive Samples. Interdiscip Sci 2022; 14:607-622. [PMID: 35428965 DOI: 10.1007/s12539-022-00509-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Revised: 02/24/2022] [Accepted: 03/01/2022] [Indexed: 06/14/2023]
Abstract
Emerging evidence indicates that miRNAs have strong relationships with many human diseases. Investigating these associations will contribute to elucidating the activities of miRNAs and pathogenesis mechanisms, and will provide new opportunities for disease diagnosis and drug discovery. Therefore, it is important to identify potential associations between miRNAs and diseases. The existing databases of miRNA-disease associations (MDAs) only provide the known MDAs, which can be regarded as positive samples. However, the unknown MDAs cannot be regarded as reliable negative samples. To deal with this uncertainty, we proposed a convolutional neural network (CNN) framework, named DNRLCNN, which predicts MDAs based on a latent feature matrix extracted from positive samples only. First, by including only the positive samples in the calculation, we captured the latent feature matrix for complex interactions between miRNAs and diseases in low-dimensional space. Then, we constructed a feature vector for each miRNA and disease pair based on the feature representation. Finally, we applied a modified CNN to the feature vector to predict MDAs. As a result, our model achieves better performance than other state-of-the-art CNN-based methods in fivefold cross-validation on both the miRNA-disease association prediction task (average AUC of 0.9030) and the miRNA-phenotype association prediction task (average AUC of 0.9442). In addition, we carried out case studies on two human diseases: all of the top-50 predicted miRNAs for lung neoplasms are confirmed by the HMDD v3.2 and dbDEMC 2.0 databases, and 98% of the top-50 predicted miRNAs for heart failure are confirmed. The experimental results show that our model is capable of inferring potential disease-related miRNAs.
Affiliation(s)
- Jiancheng Zhong
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Wubin Zhou
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Jiedong Kang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Zhuo Fang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China.
| | - Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China.
| |
|
18
|
Yan C, Duan G, Li N, Zhang L, Wu FX, Wang J. PDMDA: predicting deep-level miRNA-disease associations with graph neural networks and sequence features. Bioinformatics 2022; 38:2226-2234. [PMID: 35150255 DOI: 10.1093/bioinformatics/btac077] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 07/24/2021] [Revised: 01/18/2022] [Accepted: 02/05/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Many studies have shown that microRNAs (miRNAs) play a key role in human diseases. Meanwhile, traditional experimental methods for miRNA-disease association identification are extremely costly, time-consuming and challenging. Therefore, many computational methods have been developed to predict potential associations between miRNAs and diseases. However, those methods mainly predict the existence of miRNA-disease associations, and they cannot predict the deep-level miRNA-disease association types. RESULTS: In this study, we propose a new end-to-end deep learning method (called PDMDA) to predict deep-level miRNA-disease associations with graph neural networks (GNNs) and miRNA sequence features. Based on the sequence and structural features of miRNAs, PDMDA extracts the miRNA feature representations with a fully connected network (FCN). The disease feature representations are extracted from the disease-gene network and gene-gene interaction network by a GNN model. Finally, a multilayer network with three fully connected layers and a softmax layer is designed to predict the final miRNA-disease association scores based on the concatenated feature representations of miRNAs and diseases. Note that PDMDA does not take the miRNA-disease association matrix as input to compute the Gaussian interaction profile similarity. We conduct three experiments based on six association type samples (circulation, epigenetics, target, genetics, known associations whose types are unknown, and unknown association samples). We conduct fivefold cross-validation to assess the prediction performance of PDMDA. The area under the receiver operating characteristic curve is used as the metric. The experimental results show that PDMDA can accurately predict the deep-level miRNA-disease associations. AVAILABILITY AND IMPLEMENTATION: Data and source codes are available at https://github.com/27167199/PDMDA.
Affiliation(s)
- Cheng Yan
- School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, China
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Guihua Duan
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Na Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Lishen Zhang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon SK S7N5A9, Canada
| | - Jianxin Wang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| |
|
19
|
Gao Z, Wang YT, Wu QW, Li L, Ni JC, Zheng CH. A New Method Based on Matrix Completion and Non-Negative Matrix Factorization for Predicting Disease-Associated miRNAs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:763-772. [PMID: 32991287 DOI: 10.1109/tcbb.2020.3027444] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Numerous studies have shown that microRNAs (miRNAs) are associated with the occurrence and development of human diseases. Thus, studying disease-associated miRNAs is highly valuable for the prevention, diagnosis and treatment of diseases. In this paper, we proposed a novel method based on matrix completion and non-negative matrix factorization (MCNMF) for predicting disease-associated miRNAs. Due to the information inadequacy on miRNA similarities and disease similarities, we calculated the latter via two models, and introduced the Gaussian interaction profile kernel similarity. In addition, matrix completion (MC) was employed to further replenish the miRNA and disease similarities to improve the prediction performance. To reduce the sparsity of the miRNA-disease association matrix, the weighted K nearest known neighbors (WKNKN) method was used as a pre-processing step. We also utilized non-negative matrix factorization (NMF) with dual L2,1-norm, graph Laplacian regularization, and Tikhonov regularization to effectively avoid overfitting during the prediction. Finally, several experiments and a case study were carried out to evaluate the effectiveness and performance of the proposed MCNMF model. The results indicated that our method could reliably and effectively predict disease-associated miRNAs.
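The Gaussian interaction profile (GIP) kernel similarity mentioned in the abstract is commonly computed from the rows of the binary association matrix. The sketch below uses the widely used form K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to a bandwidth parameter divided by the mean squared norm of the profiles. The abstract gives no formula, so this form, the default bandwidth of 1.0, the function name `gip_kernel`, and the toy matrix are assumptions, not the authors' exact procedure.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of `assoc`.

    assoc[i] is the interaction profile of entity i (for example, one
    miRNA's 0/1 associations with every disease). gamma_prime is the
    bandwidth parameter (1.0 is an assumed default).
    """
    n = len(assoc)
    sq_norms = [sum(x * x for x in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq  # normalise by the average profile norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy miRNA-by-disease association matrix (hypothetical data).
toy_assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
similarity = gip_kernel(toy_assoc)
```

Rows with identical profiles get similarity 1, and the value decays toward 0 as two profiles share fewer known associations.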
|
20
|
Convalescing the Process of Ranking Metabolites for Diseases using Subcellular Localization. Arabian Journal for Science and Engineering 2022. [DOI: 10.1007/s13369-021-06023-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
21
|
Yan C, Duan G, Zhang Y, Wu FX, Pan Y, Wang J. Predicting Drug-Drug Interactions Based on Integrated Similarity and Semi-Supervised Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:168-179. [PMID: 32310779 DOI: 10.1109/tcbb.2020.2988018] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/11/2023]
Abstract
A drug-drug interaction (DDI) is defined as an association between two drugs where the pharmacological effects of one drug are influenced by another drug. Positive DDIs can usually improve the therapeutic effects for patients, but negative DDIs are a major cause of adverse drug reactions and can even result in drug withdrawal from the market and patient death. Therefore, identifying DDIs has become a key component of drug development and disease treatment. In this study, we propose a novel method to predict DDIs based on integrated similarity and semi-supervised learning (DDI-IS-SL). DDI-IS-SL integrates drug chemical, biological and phenotype data to calculate the feature similarity of drugs with the cosine similarity method. The Gaussian Interaction Profile kernel similarity of drugs is also calculated based on known DDIs. A semi-supervised learning method (the Regularized Least Squares classifier) is used to calculate the interaction possibility scores of drug-drug pairs. In 5-fold cross validation, 10-fold cross validation and de novo drug validation, DDI-IS-SL achieves better prediction performance than the other compared methods. In addition, the average computation time of DDI-IS-SL is shorter than that of the other compared methods. Finally, case studies further demonstrate the performance of DDI-IS-SL in practical applications.
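The cosine similarity between drug feature vectors named in the abstract can be sketched in a few lines. The feature vectors below are hypothetical binary indicators; how DDI-IS-SL actually encodes chemical, biological and phenotype data is not stated in the abstract, and returning 0.0 for an all-zero vector is a convention chosen here, not taken from the source.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for drugs with no recorded features
    return dot / (norm_u * norm_v)


# Hypothetical binary feature vectors for three drugs.
drug_a = [1, 0, 1, 1]
drug_b = [1, 0, 1, 0]
drug_c = [0, 1, 0, 0]

sim_ab = cosine_similarity(drug_a, drug_b)
sim_ac = cosine_similarity(drug_a, drug_c)
```

Here `drug_a` and `drug_b` share two of their features and score about 0.82, while `drug_a` and `drug_c` share none and score 0.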
|
22
|
Ji C, Wang Y, Ni J, Zheng C, Su Y. Predicting miRNA-Disease Associations Based on Heterogeneous Graph Attention Networks. Front Genet 2021; 12:727744. [PMID: 34512733 PMCID: PMC8424198 DOI: 10.3389/fgene.2021.727744] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/19/2021] [Accepted: 08/02/2021] [Indexed: 11/23/2022] Open
Abstract
In recent years, more and more evidence has shown that microRNAs (miRNAs) play an important role in the regulation of post-transcriptional gene expression, and are closely related to human diseases. Many studies have also revealed that miRNAs can serve as promising biomarkers for the potential diagnosis and treatment of human diseases. The interactions between miRNAs and human diseases have rarely been demonstrated, and the underlying mechanism of miRNAs is not clear. Therefore, computational approaches have attracted the attention of researchers, as they can not only save time and money, but also improve the efficiency and accuracy of biological experiments. In this work, we proposed a Heterogeneous Graph Attention Networks (GAT) based method for miRNA-disease association prediction, named HGATMDA. We constructed a heterogeneous graph for miRNAs and diseases, and introduced weighted DeepWalk and GAT methods to extract features of miRNAs and diseases from the graph. Moreover, a fully-connected neural network is used to predict correlation scores between miRNA-disease pairs. Experimental results under five-fold cross validation (five-fold CV) showed that HGATMDA achieved better prediction performance than other state-of-the-art methods. In addition, we performed three case studies on breast neoplasms, lung neoplasms and kidney neoplasms. The results showed that for the three diseases mentioned above, 50 out of the top 50 candidates were confirmed by the validation datasets. Therefore, HGATMDA is suitable as an effective tool to identify potential disease-related miRNAs.
Affiliation(s)
- Cunmei Ji
- School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
| | - Yutian Wang
- School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
| | - Jiancheng Ni
- School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
| | - Chunhou Zheng
- School of Artificial Intelligence, Anhui University, Hefei, China
| | - Yansen Su
- School of Artificial Intelligence, Anhui University, Hefei, China
| |
|
23
|
Dai Q, Chu Y, Li Z, Zhao Y, Mao X, Wang Y, Xiong Y, Wei DQ. MDA-CF: Predicting MiRNA-Disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med 2021; 136:104706. [PMID: 34371319 DOI: 10.1016/j.compbiomed.2021.104706] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/17/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, which provide a new perspective in diagnosis and treatment of multiple diseases. Since the experimental methods are always costly and resource-consuming, prediction of disease-related miRNAs using computational methods is in great need. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and the autoencoder was utilized for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0 with an average AUC of 0.9258. Moreover, MDA-CF was implemented to investigate colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, 88% of the top 50 potential miRNAs were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method to uncover disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
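An "average AUC in five-fold cross-validation", as reported above, is the mean of the AUC values measured on each held-out fold. The sketch below computes AUC from its pairwise definition (the fraction of positive-negative pairs in which the positive sample scores higher, counting ties as one half) and averages it over folds. The function names `auc` and `mean_auc`, the labels and the scores are hypothetical, and only two folds are shown to keep the example short.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over (labels, scores) pairs."""
    return sum(auc(y, s) for y, s in folds) / len(folds)


# Hypothetical held-out labels and predicted scores for two folds.
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
]
average = mean_auc(folds)
```

The pairwise form is quadratic in the number of samples, which is acceptable for illustration; a rank-based computation is the usual choice for large test sets.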
Affiliation(s)
- Qiuying Dai
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Zhiqi Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yusong Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China
| |
|
24
|
SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization. PLoS Comput Biol 2021; 17:e1009165. [PMID: 34252084 PMCID: PMC8345837 DOI: 10.1371/journal.pcbi.1009165] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/05/2021] [Revised: 08/06/2021] [Accepted: 06/08/2021] [Indexed: 11/21/2022] Open
Abstract
miRNAs are small non-coding RNAs that are involved in a number of complicated biological processes. Numerous studies have suggested that miRNAs are closely associated with many human diseases. In this study, we proposed a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). In order to effectively combine different disease and miRNA similarity data, we applied the similarity network fusion algorithm to obtain integrated disease similarity (composed of disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity) and integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity). In addition, L2 regularization terms and similarity constraint terms were added to the traditional Nonnegative Matrix Factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 based on global leave-one-out cross validation and five-fold cross validation, respectively. Furthermore, case studies on two common human diseases were implemented to demonstrate the prediction accuracy of SCMFMDA. Most of the top 50 predicted miRNAs were confirmed by experimental reports, which indicated that SCMFMDA was effective for predicting relationships between miRNAs and diseases.
Numerous studies have suggested that miRNAs are closely associated with many human diseases, so predicting potential associations between miRNAs and diseases can contribute to the diagnosis and treatment of diseases. Several models for discovering unknown miRNA-disease associations make the prediction more productive and effective. We proposed SCMFMDA to obtain more accurate prediction results by applying similarity network fusion to fuse multi-source disease and miRNA information and utilizing similarity constrained matrix factorization to make predictions based on biological information. Global leave-one-out cross validation and five-fold cross validation were applied to evaluate our model. SCMFMDA achieved AUCs of 0.9675 and 0.9447, which were clearly higher than those of previous computational models. Furthermore, we implemented case studies on significant human diseases including colon neoplasms and lung neoplasms; 47 and 46 of the top 50 predicted miRNAs, respectively, were confirmed by experimental reports. All results indicate that SCMFMDA can be regarded as an effective way to discover unverified miRNA-disease connections.
|
25
|
Zhu K, Ge J, He Y, Li P, Jiang X, Wang J, Mo Y, Huang W, Gong Z, Zeng Z, Xiong W, Yu J. Bioinformatics Analysis of the Signaling Pathways and Genes of Gossypol Induce Death of Nasopharyngeal Carcinoma Cells. DNA Cell Biol 2021; 40:1052-1063. [PMID: 34191589 DOI: 10.1089/dna.2020.6348] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022] Open
Abstract
Gossypol has been reported to exhibit antitumor effects against several human cancers. However, the anticancer effects of gossypol on nasopharyngeal carcinoma (NPC) have not been investigated. Against this backdrop, the present study was designed to evaluate the anticancer effects of gossypol against NPC cells and to identify the signaling pathways involved through bioinformatic analysis. Gossypol-induced death of NPC cells is concentration-dependent. To explore the underlying mechanism of gossypol's antitumor effect, microarray analysis of gossypol-treated and untreated NPC cells was performed. A total of 836 differentially expressed genes (DEGs) were identified in gossypol-treated NPC cells, of which 461 genes were upregulated and 375 genes were downregulated. The cellular components, molecular functions, biological processes, and signal pathways in which the DEGs were involved were identified by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Gene Set Enrichment Analysis (GSEA) predicted the upstream transcription factors (TFs) ETS2 and E2F1 that regulate DEGs. Weighted Gene Co-expression Network Analysis (WGCNA) was performed to identify a class of modules and genes related to DNA repair and the cell cycle. TNFRSF10B, a death receptor in NPC cells, was knocked down. The results suggested that the ability of NPC cells to resist gossypol killing was enhanced. In addition, to further investigate the possible molecular mechanisms, we constructed a transcriptional regulatory network of TNFRSF10B containing 109 miRNAs and 47 TFs. Taken together, our results demonstrated that gossypol triggered antitumor effects against NPC cells, indicating its applicability for the management of NPC.
Affiliation(s)
- Kunjie Zhu
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
- NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Junshang Ge
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
- NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Yi He
- NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Panchun Li
- Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Xianjie Jiang
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
- NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Jie Wang
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
- NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Yongzhen Mo
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Weilun Huang
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
| | - Zhaojian Gong
- Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Zhaoyang Zeng
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Wei Xiong
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,NHC Key Laboratory of Carcinogenesis, and Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Jianjun Yu
- Department of Head and Neck Surgery, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
| |
Collapse
|
26
|
Peng X, Li Y, Kong X, Zhu X, Ding X. Investigating Different DNA Methylation Patterns at the Resolution of Methylation Haplotypes. Front Genet 2021; 12:697279. [PMID: 34262601 PMCID: PMC8273290 DOI: 10.3389/fgene.2021.697279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2021] [Accepted: 06/01/2021] [Indexed: 11/15/2022] Open
Abstract
Different DNA methylation patterns across tissues or cell types are considered one of the main reasons for tissue-specific gene expression. In recent years, many methods have been proposed to identify differentially methylated regions (DMRs) based on the mixture of methylation signals from homologous chromosomes. To investigate the possible influence of homologous chromosomes on methylation analysis, this paper proposes a method (MHap) to construct methylation haplotypes for homologous chromosomes in CpG-dense regions. Comparing the methylation consistency between homologous chromosomes in different cell types shows that the majority of paired methylation haplotypes derived from homologous chromosomes are consistent, while a lower methylation consistency is observed in the breast cancer sample. It can also be observed that the hypomethylation consistency of differentiated cells is higher than that of the corresponding undifferentiated stem cells. Furthermore, based on the methylation haplotypes constructed on homologous chromosomes, a method (MHap_DMR) is developed to identify DMRs between differentiated cells and the corresponding undifferentiated stem cells, or between the breast cancer sample and the normal breast sample. Comparing the methylation haplotype modes of DMRs in two cell types reveals the direction of DNA methylation change on homologous chromosomes during cell differentiation and cancerization. The code is available at: https://github.com/xqpeng/MHap_DMR.
Affiliation(s)
- Xiaoqing Peng
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Yiming Li
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Xiangyan Kong
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xiaoshu Zhu
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Xiaojun Ding
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
|
27
|
Peng W, Du J, Dai W, Lan W. Predicting miRNA-Disease Association Based on Modularity Preserving Heterogeneous Network Embedding. Front Cell Dev Biol 2021; 9:603758. [PMID: 34178973 PMCID: PMC8223753 DOI: 10.3389/fcell.2021.603758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2020] [Accepted: 03/23/2021] [Indexed: 12/12/2022] Open
Abstract
MicroRNAs (miRNAs) are a category of small non-coding RNAs that profoundly impact various biological processes related to human disease. Inferring potential miRNA-disease associations benefits the study of human diseases, including disease prevention, disease diagnosis, and drug development. In this work, we propose a novel heterogeneous network embedding-based method called MDN-NMTF (Module-based Dynamic Neighborhood Non-negative Matrix Tri-Factorization) for predicting miRNA-disease associations. MDN-NMTF constructs a heterogeneous network from a disease similarity network, a miRNA similarity network, and a known miRNA-disease association network. It then learns latent vector representations for miRNAs and diseases in the heterogeneous network. Finally, the association probability is computed as the product of the latent miRNA and disease vectors. MDN-NMTF not only integrates diverse biological information about miRNAs and diseases to predict miRNA-disease associations, but also considers the module properties of miRNAs and diseases while learning the vector representations, which preserves as much of the heterogeneous network's structural information and network properties as possible. We also extend MDN-NMTF to a new version (called MDN-NMTF2) that uses modular information to improve the miRNA-disease association prediction ability. Our methods and four existing methods are applied to predict miRNA-disease associations in four databases. The prediction results show that our methods improve miRNA-disease association prediction compared with the four existing methods.
Affiliation(s)
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Jielin Du
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Wei Lan
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China
|
28
|
Chu Y, Wang X, Dai Q, Wang Y, Wang Q, Peng S, Wei X, Qiu J, Salahub DR, Xiong Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021; 22:6261915. [PMID: 34009265 DOI: 10.1093/bib/bbab165] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Revised: 04/02/2021] [Accepted: 04/08/2021] [Indexed: 11/13/2022] Open
Abstract
Accurate identification of miRNA-disease associations (MDAs) helps to understand the etiology and mechanisms of various diseases. However, experimental methods are costly and time-consuming. Thus, it is urgent to develop computational methods for the prediction of MDAs. Based on graph theory, MDA prediction is regarded as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve training efficiency and accuracy. This method models both the potential connections in feature space and the structural relationships of MDA data. The nodes of the graphs are represented by the disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity. Moreover, we consider six tasks simultaneously on the MDA prediction problem for the first time, which ensures that, under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that MDA-GCNFTG achieves satisfactory performance on all six tasks and is significantly superior to classic machine learning methods and state-of-the-art MDA prediction methods. Moreover, the effectiveness of GCNs via the graph sampling strategy and of the feature and topology graph in MDA-GCNFTG is also demonstrated. More importantly, case studies for two diseases and three miRNAs were conducted and achieved satisfactory performance.
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Xuhong Wang
- School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, China
- Qiuying Dai
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Yanjing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, China
- Dennis Russell Salahub
- Department of Chemistry, University of Calgary, Fellow Royal Society of Canada and Fellow of the American Association for the Advancement of Science, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
|
29
|
Xu Y, Li HD, Pan Y, Luo F, Wu FX, Wang J. A Gene Rank Based Approach for Single Cell Similarity Assessment and Clustering. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:431-442. [PMID: 31369384 DOI: 10.1109/tcbb.2019.2931582] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology provides quantitative gene expression profiles at single-cell resolution. As a result, researchers have established new ways to explore cell population heterogeneity and the genetic variability of cells. One of the current research directions for scRNA-seq data is to identify different cell types accurately through unsupervised clustering methods. However, scRNA-seq data analysis is challenging because of the data's high noise level, high dimensionality, and sparsity. Moreover, the impact of multiple latent factors on gene expression heterogeneity and on the ability to accurately identify cell types remains unclear. Overcoming these challenges to reveal the biological differences between cell types has become the key to analyzing scRNA-seq data. For these reasons, unsupervised learning for cell population discovery based on scRNA-seq data has become an important research area. A cell similarity assessment method plays a significant role in cell clustering. Here, we present BioRank, a new cell similarity assessment method based on annotated gene sets and gene ranks. To evaluate its performance, we cluster cells with two classical clustering algorithms based on the cell-to-cell similarity obtained by BioRank. In addition, BioRank can be used by any clustering algorithm that requires a similarity matrix. Applying BioRank to 12 public scRNA-seq datasets, we show that it is better than, or at least as good as, several popular similarity assessment methods for single-cell clustering.
|
30
|
Yan C, Duan G, Wu FX, Pan Y, Wang J. MCHMDA:Predicting Microbe-Disease Associations Based on Similarities and Low-Rank Matrix Completion. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:611-620. [PMID: 31295117 DOI: 10.1109/tcbb.2019.2926716] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
With the development of high-throughput sequencing technology and microbiology, many studies have shown that microbes are associated with human diseases such as obesity and liver cancer. Therefore, identifying the associations between microbes and diseases has become an important topic in current bioinformatics. The emergence of microbe-disease association databases has provided an unprecedented opportunity to develop computational methods for predicting microbe-disease associations. In this study, we propose a low-rank matrix completion method (called MCHMDA) to predict microbe-disease associations by integrating similarities of microbes and diseases and known microbe-disease associations into a heterogeneous network. The microbe similarity is computed from the Gaussian Interaction Profile (GIP) kernel similarity based on the known microbe-disease associations. We then further improve the microbe similarity by taking into account the organs these microbes inhabit in the human body. The disease similarity is computed as the average of the disease GIP similarity, the disease symptom-based similarity, and the disease functional similarity. Next, we construct a heterogeneous microbe-disease association network by integrating the microbe similarity network, the disease similarity network, and the known microbe-disease association network. Finally, a matrix completion method is used to calculate the association scores of unknown microbe-disease pairs with the fast Singular Value Thresholding (SVT) algorithm. Using 5-fold Cross Validation (5CV) and Leave-One-Out Cross Validation (LOOCV), we evaluate the prediction performance of MCHMDA and other state-of-the-art methods, namely BRWMDA, NGRHMDA, LRLSHMDA, and KATZHMDA. On the benchmark dataset HMDAD, the experimental results show that MCHMDA outperforms the other methods in terms of area under the receiver operating characteristic curve (AUC). MCHMDA achieves AUC values of 0.9251 and 0.9495 in 5CV and LOOCV, respectively, which are the highest values among the competing methods. In addition, we further demonstrate the generality of MCHMDA on an expanded microbe-disease association dataset (HMDAD-SUP). Finally, case studies illustrate its prediction ability in practical applications.
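The GIP kernel similarity mentioned in this abstract can be sketched as follows. This is a minimal illustration of the commonly used definition, K(i, j) = exp(-γ‖a_i − a_j‖²) with γ set to γ′ divided by the mean squared norm of the interaction profiles, and is not taken from the MCHMDA code. The function name `gip_kernel`, the default γ′ = 1, and the toy association matrix are assumptions made for the example.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary
    association matrix (one row per microbe, one column per disease)."""
    n = len(profiles)
    # Bandwidth: gamma' divided by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim

# Hypothetical toy data: 3 microbes x 3 diseases, 1 = known association.
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
microbe_sim = gip_kernel(associations)
```

Microbes 0 and 1 share a disease, so their similarity comes out higher than that of microbes 0 and 2, which share none. Transposing the matrix before the call would give the disease-side GIP similarity in the same way.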
|
31
|
Luo H, Wang J, Yan C, Li M, Wu FX, Pan Y. A Novel Drug Repositioning Approach Based on Collaborative Metric Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:463-471. [PMID: 31283509 DOI: 10.1109/tcbb.2019.2926453] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Computational drug repositioning, an efficient approach to finding potential indications for drugs, has been used to increase the efficiency of drug development. The drug repositioning problem is essentially a top-K recommendation task that recommends the most likely diseases for drugs based on drug- and disease-related information. Therefore, many recommendation methods can be adapted to drug repositioning. The collaborative metric learning (CML) algorithm can produce distance metrics that capture the important relationships among objects, and has been widely used in recommendation domains. By applying CML to drug repositioning, a joint metric space is learned to encode each drug's relationships with different diseases. In this study, we propose a novel computational drug repositioning method (CMLDR) that uses collaborative metric learning to predict novel drug-disease associations based on known drug- and disease-related information. Specifically, the proposed method learns latent vectors of drugs and diseases by applying metric learning, and then predicts the association probability of a drug-disease pair based on the learned vectors. The comprehensive experimental results show that CMLDR outperforms other state-of-the-art drug repositioning algorithms in terms of precision, recall, and AUPR.
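As a rough illustration of the metric-learning idea described above, the sketch below shows the triplet hinge loss used in the general collaborative metric learning formulation, together with distance-based ranking of candidate diseases. It is not the CMLDR implementation: the function names, the margin of 1.0, and the two-dimensional toy vectors are all assumptions, and the abstract does not say how distances are mapped to association probabilities.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two latent vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cml_hinge_loss(drug, known_disease, unknown_disease, margin=1.0):
    """Triplet hinge loss: zero once the known disease is closer to the drug
    than the unknown disease by at least `margin`."""
    return max(0.0, margin
               + squared_distance(drug, known_disease)
               - squared_distance(drug, unknown_disease))

def rank_diseases(drug, disease_vectors):
    """Return disease names ordered from nearest to farthest from the drug."""
    return sorted(disease_vectors,
                  key=lambda name: squared_distance(drug, disease_vectors[name]))

# Hypothetical learned vectors in a shared 2-D metric space.
drug_vec = [0.0, 0.0]
diseases = {"d1": [0.1, 0.0], "d2": [1.0, 1.0], "d3": [0.5, 0.0]}
top_k = rank_diseases(drug_vec, diseases)[:2]
```

Training would minimise the hinge loss over sampled (drug, known disease, unknown disease) triplets; a top-K recommendation is then read off as the K nearest diseases in the learned space.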
|
32
|
Wang MN, You ZH, Wang L, Li LP, Zheng K. LDGRNMF: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.02.062] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
|
33
|
Lei X, Mudiyanselage TB, Zhang Y, Bian C, Lan W, Yu N, Pan Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief Bioinform 2020; 22:6042241. [PMID: 33341893 DOI: 10.1093/bib/bbaa350] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 10/20/2020] [Accepted: 11/01/2020] [Indexed: 02/06/2023] Open
Abstract
Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years. A large number of experimental methods and technologies for producing biological data have also been developed. However, because of their high labor cost and long production time, computational methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs, and circRNAs, and the related computational methods for predicting their associations with diseases. First, the mainstream databases of these three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculations. Next, we investigate ncRNA-disease prediction methods in detail and classify them into five types: network propagation, recommendation systems, matrix completion, machine learning, and deep learning. Furthermore, we summarize the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs, and circRNAs, respectively. Finally, the advantages and limitations of the various methods are identified, and future research directions and challenges are discussed.
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Chen Bian
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Wei Lan
- School of Computer, Electronics and Information at Guangxi University, Nanning, China
- Ning Yu
- Department of Computing Sciences at the College at Brockport, State University of New York, Rochester, NY, USA
- Yi Pan
- Computer Science Department at Georgia State University, Atlanta, GA, USA
|
34
|
Li Z, Li J, Nie R, You ZH, Bao W. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2020; 22:5929824. [PMID: 34293850 DOI: 10.1093/bib/bbaa240] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2020] [Revised: 08/26/2020] [Accepted: 08/27/2020] [Indexed: 02/06/2023] Open
Abstract
Emerging evidence indicates that the abnormal expression of miRNAs is involved in the evolution and progression of various complex human diseases. Identifying disease-related miRNAs as new biomarkers can promote the development of disease pathology and clinical medicine. However, designing biological experiments to validate disease-related miRNAs is usually time-consuming and expensive. Therefore, it is urgent to design effective computational methods for predicting potential miRNA-disease associations. Inspired by the great progress of graph neural networks in link prediction, we propose a novel graph auto-encoder model, named GAEMDA, to identify potential miRNA-disease associations in an end-to-end manner. More specifically, the GAEMDA model applies a graph neural network-based encoder, which contains an aggregator function and a multi-layer perceptron for aggregating nodes' neighborhood information, to generate low-dimensional embeddings of miRNA and disease nodes and to fuse heterogeneous information effectively. The embeddings of miRNA and disease nodes are then fed into a bilinear decoder to identify potential links between miRNA and disease nodes. The experimental results indicate that GAEMDA achieves an average area under the curve of $93.56\pm 0.44\%$ under 5-fold cross-validation. In addition, we carried out case studies on colon neoplasms, esophageal neoplasms, and kidney neoplasms. As a result, 48 of the top 50 predicted miRNAs associated with these diseases are confirmed by the database of differentially expressed miRNAs in human cancers and the microRNA deregulation in human disease database, respectively. The satisfactory prediction performance suggests that the GAEMDA model could serve as a reliable tool to guide subsequent research on the regulatory role of miRNAs. The source code is available at https://github.com/chimianbuhetang/GAEMDA.
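The bilinear decoding step described above can be illustrated with a short sketch. It assumes the common form score = sigmoid(z_m^T W z_d), where z_m and z_d are the miRNA and disease embeddings and W is a learnable weight matrix; the function name, the identity weight matrix, and the toy embeddings are hypothetical and are not drawn from the GAEMDA repository.

```python
import math

def bilinear_score(z_mirna, weight, z_disease):
    """Link probability sigmoid(z_m^T W z_d) for one miRNA-disease pair."""
    # W z_d: one dot product per row of the weight matrix.
    w_zd = [sum(w * d for w, d in zip(row, z_disease)) for row in weight]
    # z_m^T (W z_d)
    logit = sum(m * x for m, x in zip(z_mirna, w_zd))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical 2-D embeddings; in a trained model W would be learned.
z_m = [1.0, 0.5]
z_d = [0.2, -0.4]
identity = [[1.0, 0.0], [0.0, 1.0]]
score = bilinear_score(z_m, identity, z_d)
```

With an identity weight matrix the decoder reduces to a sigmoid over the plain dot product of the two embeddings; a learned W lets the model weight interactions between different embedding dimensions.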
Affiliation(s)
- Zhengwei Li
- Engineering Research Center of Mine Digitalization of Ministry of Education and School of Computer Science and Technology, China University of Mining and Technology
- Jiashu Li
- School of Computer Science and Technology, China University of Mining and Technology
- Ru Nie
- School of Computer Science and Technology, China University of Mining and Technology
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science
- Wenzheng Bao
- School of Information Engineering, Xuzhou University of Technology
|
35
|
Yan C, Duan G, Wu FX, Pan Y, Wang J. BRWMDA:Predicting Microbe-Disease Associations Based on Similarities and Bi-Random Walk on Disease and Microbe Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1595-1604. [PMID: 30932846 DOI: 10.1109/tcbb.2019.2907626] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Many recent studies have shown that microbes play important roles in human diseases. Therefore, discovering the associations between microbes and diseases is beneficial for systematically understanding disease mechanisms and for diagnosing and treating complex diseases. It is well known that finding new potential microbe-disease associations via biological experiments is a time-consuming and expensive process. Computational methods, however, provide an opportunity to predict microbe-disease associations effectively. In recent years, efforts toward predicting microbe-disease associations have not been proportional to the importance of microbes to human diseases. In this study, we develop a method (called BRWMDA) to predict new microbe-disease associations based on similarity and an improved bi-random walk on the disease and microbe networks. BRWMDA integrates the microbe network, the disease network, and known microbe-disease associations into a single network. After calculating the Gaussian Interaction Profile (GIP) kernel similarity of microbes based on known microbe-disease associations, the microbe network is obtained by adjusting the similarity with the logistic function. In addition, the disease network is computed by the similarity network fusion (SNF) method from the symptom-based similarity and the GIP kernel similarity based on known microbe-disease associations. These two networks of microbes and diseases are then connected by known microbe-disease associations. Based on the assumption that similar microbes are normally associated with similar diseases and vice versa, BRWMDA predicts new potential microbe-disease associations via random walks with different numbers of steps on the microbe and disease networks, which makes reasonable use of the similarity in both networks. The 5-fold cross validation and Leave One Out Cross Validation (LOOCV) are adopted to assess the prediction performance of our BRWMDA algorithm, as well as other competing methods for comparison. The 5-fold cross validation experiments show that BRWMDA obtains the highest AUC value of 0.9087, compared with 0.9025 (NGRHMDA), 0.8797 (LRLSHMDA), 0.8571 (KATZHMDA), 0.7782 (HGBI), and 0.5629 (NBI) for the other methods. BRWMDA also outperforms the other methods in LOOCV, with an AUC value of 0.9397 compared with 0.9111 (NGRHMDA), 0.8909 (LRLSHMDA), 0.8644 (KATZHMDA), 0.7866 (HGBI), and 0.5553 (NBI). Case studies also illustrate that BRWMDA is an effective method for predicting microbe-disease associations.
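The bi-random walk with different step counts on the two networks can be sketched as below. This follows the generic bi-random walk scheme (a left walk over the microbe network, a right walk over the disease network, each blended with the known associations and then averaged) and is only an approximation of the idea: BRWMDA's improved update rule, its normalisation, and its parameter values are not given in the abstract. The row normalisation, `alpha`, the step counts, and the toy matrices are assumptions.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def row_normalize(m):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def blend(walked, prior, alpha):
    """Element-wise alpha * walked + (1 - alpha) * prior."""
    return [[alpha * w + (1 - alpha) * p for w, p in zip(wr, pr)]
            for wr, pr in zip(walked, prior)]

def bi_random_walk(microbe_sim, disease_sim, assoc,
                   alpha=0.5, left_steps=2, right_steps=2):
    """Score microbe-disease pairs; assoc is microbes x diseases and must
    contain at least one known association."""
    m_norm = row_normalize(microbe_sim)
    d_norm = row_normalize(disease_sim)
    total = sum(sum(row) for row in assoc)
    prior = [[v / total for v in row] for row in assoc]
    scores = prior
    for step in range(1, max(left_steps, right_steps) + 1):
        updates = []
        if step <= left_steps:   # walk on the microbe network
            updates.append(blend(matmul(m_norm, scores), prior, alpha))
        if step <= right_steps:  # walk on the disease network
            updates.append(blend(matmul(scores, d_norm), prior, alpha))
        scores = [[sum(u[i][j] for u in updates) / len(updates)
                   for j in range(len(prior[0]))] for i in range(len(prior))]
    return scores

# Hypothetical toy data: microbe 1 has no known disease but resembles microbe 0.
microbe_sim = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
disease_sim = [[1.0, 0.0], [0.0, 1.0]]
assoc = [[1, 0], [0, 0], [0, 1]]
scores = bi_random_walk(microbe_sim, disease_sim, assoc)
```

In the toy data, the unannotated microbe 1 receives a positive score for disease 0 because it is similar to microbe 0, which illustrates the assumption that similar microbes tend to be associated with similar diseases.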
|
36
|
Ni P, Wang J, Zhong P, Li Y, Wu FX, Pan Y. Constructing Disease Similarity Networks Based on Disease Module Theory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:906-915. [PMID: 29993782 DOI: 10.1109/tcbb.2018.2817624] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Quantifying the associations between diseases now plays an important role in modern biology and medicine. Discovering associations between diseases could help us gain deeper insights into the pathogenic mechanisms of complex diseases, and thus could lead to improvements in disease diagnosis, drug repositioning, and drug development. Due to the growing body of high-throughput biological data, a number of methods have been developed for computing similarity between diseases during the past decade. However, these methods rarely consider the interconnections of the genes related to each disease in the protein-protein interaction network (PPIN). Recently, the disease module theory has been proposed, which states that disease-related genes or proteins tend to interact with each other in the same neighborhood of a PPIN. In this study, we propose a new method called ModuleSim to measure associations between diseases by using disease-gene association data and PPIN data based on disease module theory. The experimental results show that, by considering the interactions between disease modules and their modularity, the disease similarity calculated by ModuleSim has a significant correlation with the disease classification of Disease Ontology (DO). Furthermore, ModuleSim outperforms four other popular methods, all of which use disease-gene association data and PPIN data to measure disease-disease associations. In addition, the disease similarity network constructed by ModuleSim suggests that ModuleSim is capable of finding potential associations between diseases.
|
37
|
Jiang H, Wang J, Li M, Lan W, Wu FX, Pan Y. miRTRS: A Recommendation Algorithm for Predicting miRNA Targets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1032-1041. [PMID: 30281478 DOI: 10.1109/tcbb.2018.2873299] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
MicroRNAs (miRNAs) are small and important non-coding RNAs that regulate gene expression at the transcriptional and post-transcriptional levels by binding to their targets (genes). Predicting miRNA targets is an important problem in biological research. Identifying miRNA targets through biological experiments is expensive and time-consuming, and many computational methods have been proposed to predict miRNA targets. In this study, we develop a novel method, named miRTRS, for predicting miRNA targets based on a recommendation algorithm. miRTRS can predict targets for an isolated (new) miRNA using miRNA sequence similarity, as well as isolated (new) targets for a miRNA using gene sequence similarity. Furthermore, unlike supervised machine learning methods, miRTRS does not need to select negative samples. We use 10-fold cross validation and independent datasets to evaluate the performance of our method. We compared miRTRS with the two most recently published methods for miRNA target prediction. The experimental results show that miRTRS outperforms the competing prediction methods in terms of AUC and other evaluation metrics.
|
38
|
Wang J, Kuang Z, Ma Z, Han G. GBDTL2E: Predicting lncRNA-EF Associations Using Diffusion and HeteSim Features Based on a Heterogeneous Network. Front Genet 2020; 11:272. [PMID: 32351537 PMCID: PMC7174746 DOI: 10.3389/fgene.2020.00272] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2019] [Accepted: 03/06/2020] [Indexed: 12/02/2022] Open
Abstract
Interactions between genetic factors and environmental factors (EFs) play an important role in many diseases, and many diseases result from these interactions. Long non-coding RNAs (lncRNAs) are important non-coding RNAs that regulate life processes. The ability to predict the associations between lncRNAs and EFs is of practical significance. However, recent methods for predicting lncRNA-EF associations rarely use the topological information of heterogeneous biological networks, or they simply treat all objects as the same type without considering the different and subtle semantic meanings of the various paths in the heterogeneous network. To address this issue, this paper proposes GBDTL2E, a method based on the Gradient Boosting Decision Tree (GBDT) for predicting the associations between lncRNAs and EFs. The innovation of GBDTL2E is that it integrates structural information and heterogeneous networks, combines HeteSim features and diffusion features through multi-feature fusion, and uses the machine learning algorithm GBDT to predict lncRNA-EF associations based on heterogeneous networks. The experimental results demonstrate that the proposed algorithm achieves high performance.
Affiliation(s)
- Jiaqi Wang
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
- Zhufang Kuang
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
- Zhihao Ma
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
- Genwei Han
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
|
39
|
Yan C, Wu FX, Wang J, Duan G. PESM: predicting the essentiality of miRNAs based on gradient boosting machines and sequences. BMC Bioinformatics 2020; 21:111. [PMID: 32183740 PMCID: PMC7079416 DOI: 10.1186/s12859-020-3426-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2019] [Accepted: 02/21/2020] [Indexed: 11/16/2022]
Abstract
Background MicroRNAs (miRNAs) are small noncoding RNA molecules that directly regulate their mRNA targets at the post-transcriptional level. Studies have indicated that miRNAs play key roles in complex diseases by taking part in many biological processes, such as cell growth and cell death. Therefore, to improve the effectiveness of disease diagnosis and treatment, it is appealing to develop advanced computational methods for predicting the essentiality of miRNAs. Result In this study, we propose a method (PESM) to predict miRNA essentiality based on gradient boosting machines and miRNA sequences. First, PESM extracts the sequence and structural features of miRNAs. Then it uses gradient boosting machines to predict the essentiality of miRNAs. We conduct 5-fold cross-validation to assess the prediction performance of our method. The area under the receiver operating characteristic curve (AUC), F-measure and accuracy (ACC) are used as the metrics to evaluate the prediction performance. We also compare PESM with three competing methods: miES, Gaussian Naive Bayes and Support Vector Machine. Conclusion The experimental results show that PESM achieves better prediction performance (AUC: 0.9117, F-measure: 0.8572, ACC: 0.8516) than the three competing methods. In addition, the relative importance of all features further shows that the newly added features help to improve prediction performance.
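The three metrics named in this abstract (AUC, F-measure, ACC) can be illustrated with a minimal sketch. The toy labels, scores, the 0.5 decision threshold, and the function names `auc_score` and `f_measure_and_accuracy` are assumptions made for illustration; they are not taken from the PESM paper.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney form); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f_measure_and_accuracy(labels, scores, threshold=0.5):
    """F-measure (F1) and accuracy after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    acc = (tp + tn) / len(labels)
    return f1, acc


# Hypothetical toy data: 1 = essential miRNA, 0 = non-essential.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = auc_score(labels, scores)
f1, acc = f_measure_and_accuracy(labels, scores)
```

AUC is threshold-free, whereas F-measure and ACC depend on the chosen cut-off, which is one reason abstracts like this one report all three.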
Affiliation(s)
- Cheng Yan
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
  - School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000, China
- Fang-Xiang Wu
  - Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
- Guihua Duan
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
|
40
|
Zhong J, Sun Y, Xie M, Peng W, Zhang C, Wu FX, Wang J. Proteoform characterization based on top-down mass spectrometry. Brief Bioinform 2020; 22:1729-1750. [PMID: 32118252 DOI: 10.1093/bib/bbaa015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/01/2019] [Revised: 01/23/2020] [Indexed: 12/16/2022]
Abstract
Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.
Affiliation(s)
- Jiancheng Zhong
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Yusui Sun
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Wei Peng
  - Kunming University of Science and Technology, Kunming, Yunnan, China
- Chushu Zhang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Fang-Xiang Wu
  - College of Engineering and the Department of Computer Science at University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, Hunan, China
|
41
|
Yan C, Duan G, Wu FX, Wang J. IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning. BMC Bioinformatics 2019; 20:651. [PMID: 31881820 PMCID: PMC6933616 DOI: 10.1186/s12859-019-3278-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/29/2022]
Abstract
Background Viral infectious diseases are a serious threat to human health. Receptor binding is the first step in the viral infection of hosts. To treat human viral infectious diseases more effectively, hidden virus-receptor interactions must be discovered. However, current computational methods for predicting virus-receptor interactions are limited. Result In this study, we propose a new computational method (IILLS) to predict virus-receptor interactions, based on an initial interaction score computed from neighbors and the Laplacian regularized least squares algorithm. IILLS integrates the known virus-receptor interactions and the amino acid sequences of receptors. The similarity of viruses is calculated with the Gaussian Interaction Profile (GIP) kernel. We also compute the receptor GIP similarity and the receptor sequence similarity; based on the prediction results, the sequence similarity is used as the final similarity of receptors. 10-fold cross validation (10CV) and leave-one-out cross validation (LOOCV) are used to assess the prediction performance of our method. We also compare our method with three competing methods (BRWH, LapRLS, CMF). Conclusion The experimental results show that IILLS achieves AUC values of 0.8675 and 0.9061 with 10CV and LOOCV, respectively, which illustrates that IILLS is superior to the competing methods. In addition, the case studies further indicate that IILLS is effective for virus-receptor interaction prediction.
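The Gaussian Interaction Profile (GIP) kernel mentioned in this abstract can be sketched as follows. The sketch assumes the commonly used form K(i, j) = exp(-gamma * ||y_i - y_j||^2), with gamma set to a bandwidth parameter divided by the mean squared norm of the interaction profiles; the abstract does not state the exact normalization used by IILLS, and the function name `gip_kernel`, the parameter `gamma_prime`, and the toy profiles are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP similarity between rows of a binary interaction matrix.

    profiles[i] is the interaction profile of entity i (for example, one
    virus against all receptors). The bandwidth normalization is an assumption.
    """
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Hypothetical toy data: 3 viruses x 4 receptors.
virus_profiles = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
K = gip_kernel(virus_profiles)
```

Entities with identical interaction profiles receive similarity 1, and the similarity decays as the profiles diverge. An entity with no known interactions has an all-zero profile, which is why methods of this kind pair GIP similarity with sequence similarity or neighbor-based initial scores.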
Affiliation(s)
- Cheng Yan
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
  - School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000, China
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
- Fang-Xiang Wu
  - Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
|
42
|
Yan C, Duan G, Pan Y, Wu FX, Wang J. DDIGIP: predicting drug-drug interactions based on Gaussian interaction profile kernels. BMC Bioinformatics 2019; 20:538. [PMID: 31874609 PMCID: PMC6929542 DOI: 10.1186/s12859-019-3093-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/31/2019] [Accepted: 09/10/2019] [Indexed: 12/23/2022]
Abstract
BACKGROUND A drug-drug interaction (DDI) is defined as a drug effect modified by another drug, which is very common in treating complex diseases such as cancer. Many studies have shown that DDIs can increase or decrease a drug's effect. However, adverse DDIs may result in severe morbidity and even mortality, and have caused some drugs to be withdrawn from the market. As multi-drug treatment becomes more and more common, identifying potential DDIs has become a key issue in drug development and disease treatment. However, traditional biological experimental methods, including in vitro and in vivo experiments, are very time-consuming and expensive for validating new DDIs. With the development of high-throughput sequencing technology, many pharmaceutical studies and various bioinformatics data provide unprecedented opportunities to study DDIs. RESULT In this study, we propose a method to predict new DDIs, namely DDIGIP, which is based on the Gaussian Interaction Profile (GIP) kernel on the drug-drug interaction profiles and the Regularized Least Squares (RLS) classifier. In addition, we use k-nearest neighbors (KNN) to calculate the initial relational score for new drugs from the chemical, biological and phenotypic data of drugs. We compare the prediction performance of DDIGIP with other competing methods via 5-fold cross validation, 10-fold cross validation and de novo drug validation. CONCLUSION In 5-fold and 10-fold cross validation, DDIGIP achieves areas under the ROC curve (AUC) of 0.9600 and 0.9636, which are better than the 0.9570 and 0.9599 of the state-of-the-art method (L1 Classifier ensemble method). Furthermore, for new drugs, the AUC of DDIGIP in de novo drug validation reaches 0.9262, which also outperforms the 0.9073 of the other state-of-the-art method (Weighted average ensemble method). Case studies and these results demonstrate that DDIGIP is an effective method for predicting DDIs and is beneficial to drug development and disease treatment.
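The KNN step for new drugs described in this abstract can be sketched as a similarity-weighted average over the k most similar known drugs. The abstract does not give the weighting scheme, so the weighting below, the function name `knn_initial_profile`, and the toy data are assumptions for illustration rather than the published DDIGIP procedure.

```python
def knn_initial_profile(similarity_to_known, known_profiles, k=2):
    """Initial interaction scores for a new drug with no known DDIs.

    similarity_to_known[i] is the similarity between the new drug and known
    drug i (for example, derived from chemical, biological or phenotypic data).
    known_profiles[i] is the binary interaction profile of known drug i.
    """
    ranked = sorted(
        range(len(similarity_to_known)),
        key=lambda i: similarity_to_known[i],
        reverse=True,
    )[:k]
    total = sum(similarity_to_known[i] for i in ranked)
    n_cols = len(known_profiles[0])
    if total == 0:
        return [0.0] * n_cols
    return [
        sum(similarity_to_known[i] * known_profiles[i][c] for i in ranked) / total
        for c in range(n_cols)
    ]


# Hypothetical toy data: three known drugs, interactions against three partners.
known_profiles = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
similarity_to_known = [0.6, 0.2, 0.9]
initial = knn_initial_profile(similarity_to_known, known_profiles, k=2)
```

The resulting profile gives a new drug a non-empty starting point, so that a profile-based kernel such as GIP can be computed for it before the RLS classifier is applied.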
Affiliation(s)
- Cheng Yan
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
  - School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000 China
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
- Yi Pan
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302 USA
- Fang-Xiang Wu
  - Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9 Canada
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
|
43
|
Abstract
BACKGROUND Collections of disease-associated data support the study of associations between diseases. Discovering closely related diseases plays a crucial role in revealing their common pathogenic mechanisms, which may in turn suggest treatments that can be transferred from one disease to another. During the past decades, a number of approaches for calculating disease similarity have been developed. However, most of them are designed to take advantage of a single data source or only a few, which results in low accuracy. METHODS In this paper, we propose a novel method, called MultiSourcDSim, to calculate disease similarity by integrating multiple data sources, namely gene-disease associations, GO biological process-disease associations and symptom-disease associations. Firstly, we establish three disease similarity networks, one for each of the three disease-related data sources. Secondly, the representation of each node is obtained by integrating the three disease similarity networks. Finally, the learned representations are used to calculate the similarity between diseases. RESULTS Our approach shows the best performance compared to three other popular methods. Besides, the similarity network built by MultiSourcDSim suggests that our method can also uncover latent relationships between diseases. CONCLUSIONS MultiSourcDSim is an efficient approach for predicting similarity between diseases.
Affiliation(s)
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha, 410075 China
- Danyi Ye
  - School of Computer Science and Engineering, Central South University, Changsha, 410075 China
- Junmin Zhao
  - School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000 China
- Jingpu Zhang
  - School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000 China
|
44
|
Lei X, Tie J. Prediction of disease-related metabolites using bi-random walks. PLoS One 2019; 14:e0225380. [PMID: 31730648 PMCID: PMC6857945 DOI: 10.1371/journal.pone.0225380] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/07/2019] [Accepted: 11/04/2019] [Indexed: 12/25/2022]
Abstract
Metabolites play a significant role in various complex human diseases. Exploring the relationship between metabolites and diseases can help us better understand the underlying pathogenesis. Several network-based methods have been used to predict associations between metabolites and diseases. However, some methods ignore hierarchical differences in the disease network and fail to work in the absence of known metabolite-disease associations. This paper presents a method based on bi-random walks for predicting disease-related metabolites, called MDBIRW. First, we reconstruct the disease similarity network and the metabolite functional similarity network by integrating the Gaussian Interaction Profile (GIP) kernel similarity of diseases and the GIP kernel similarity of metabolites, respectively. Then, the bi-random walk algorithm is executed on the reconstructed disease similarity network and metabolite functional similarity network to predict potential disease-metabolite associations. MDBIRW achieves reliable performance in leave-one-out cross validation (AUC of 0.910) and 5-fold cross validation (AUC of 0.924). The experimental results show that our method outperforms other existing methods for predicting disease-related metabolites.
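The bi-random walk idea in this abstract can be sketched as an iteration that walks on the metabolite similarity network from the left and on the disease similarity network from the right, then mixes the result with the known associations. The update rule, the row normalization, the restart weight `alpha`, the step count, and all names and toy matrices below are assumptions for illustration; the abstract does not specify the exact formulation used by MDBIRW.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_normalize(m):
    """Scale each row to sum to 1 (rows of zeros stay zero)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def bi_random_walk(sim_metabolite, sim_disease, assoc, alpha=0.5, steps=3):
    """Score metabolite-disease pairs; a sketch, not the published algorithm."""
    sm = row_normalize(sim_metabolite)
    sd = row_normalize(sim_disease)
    total = sum(sum(row) for row in assoc)
    a0 = [[v / total for v in row] for row in assoc]
    r = [row[:] for row in a0]
    for _ in range(steps):
        left = matmul(sm, r)    # walk on the metabolite network
        right = matmul(r, sd)   # walk on the disease network
        r = [
            [
                alpha * 0.5 * (left[i][j] + right[i][j]) + (1 - alpha) * a0[i][j]
                for j in range(len(a0[0]))
            ]
            for i in range(len(a0))
        ]
    return r


# Hypothetical toy data: 3 metabolites x 2 diseases.
sim_metabolite = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]]
sim_disease = [[1.0, 0.3], [0.3, 1.0]]
assoc = [[1, 0], [0, 0], [0, 1]]
scores = bi_random_walk(sim_metabolite, sim_disease, assoc)
```

In this toy example, metabolite 1 has no known associations, but it is highly similar to metabolite 0, which is linked to disease 0, so the walk assigns it a higher score for disease 0 than for disease 1.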
Affiliation(s)
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi’an China
- Jiaojiao Tie
  - School of Computer Science, Shaanxi Normal University, Xi’an China
|
45
|
Identifying MiRNA-disease association based on integrating miRNA topological similarity and functional similarity. QUANTITATIVE BIOLOGY 2019. [DOI: 10.1007/s40484-019-0176-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
|
46
|
Li Y, Li J, Bian N. DNILMF-LDA: Prediction of lncRNA-Disease Associations by Dual-Network Integrated Logistic Matrix Factorization and Bayesian Optimization. Genes (Basel) 2019; 10:E608. [PMID: 31409034 PMCID: PMC6722840 DOI: 10.3390/genes10080608] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/27/2019] [Revised: 07/22/2019] [Accepted: 08/07/2019] [Indexed: 12/15/2022]
Abstract
Identifying associations between lncRNAs and diseases can help understand disease-related lncRNAs and facilitate disease diagnosis and treatment. The dual-network integrated logistic matrix factorization (DNILMF) model has been used for drug-target interaction prediction with good results. Here we apply DNILMF to lncRNA-disease association prediction for the first time (DNILMF-LDA). We combined different similarity kernel matrices of lncRNAs and diseases by using nonlinear fusion to extract the most important information in the fused matrices. Then, lncRNA-disease association networks and similarity networks were built simultaneously. Finally, the Gaussian process mutual information (GP-MI) algorithm of Bayesian optimization was adopted to optimize the model parameters. In 10-fold cross-validation, the area under the receiver operating characteristic (ROC) curve (AUC) of DNILMF-LDA was 0.9202, and the area under the precision-recall (PR) curve (AUPR) was 0.5610. Compared with LRLSLDA, SIMCLDA, BiwalkLDA, and TPGLDA, the AUC of our method increased by 38.81%, 13.07%, 8.35%, and 6.75%, respectively, and the AUPR increased by 52.66%, 40.05%, 37.01%, and 44.25%, respectively. These results indicate that DNILMF-LDA is an effective method for predicting associations between lncRNAs and diseases.
Affiliation(s)
- Yan Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Junyi Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Naizheng Bian
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
|
47
|
Automatic ICD code assignment of Chinese clinical notes based on multilayer attention BiRNN. J Biomed Inform 2019; 91:103114. [DOI: 10.1016/j.jbi.2019.103114] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022]
|
48
|
Shen C, Ding Y, Tang J, Guo F. Multivariate Information Fusion With Fast Kernel Learning to Kernel Ridge Regression in Predicting LncRNA-Protein Interactions. Front Genet 2019; 9:716. [PMID: 30697228 PMCID: PMC6340980 DOI: 10.3389/fgene.2018.00716] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/22/2018] [Accepted: 12/21/2018] [Indexed: 12/31/2022]
Abstract
Long non-coding RNAs (lncRNAs) constitute a large class of transcribed RNA molecules. They are longer than 200 nucleotides and do not encode proteins. They play an important role in regulating gene expression by interacting with the homologous RNA-binding proteins. Because wet-lab experimental methods are laborious and time-consuming, computational approaches for predicting lncRNA-protein interactions (LPIs) deserve more attention. An in-depth review of state-of-the-art in silico investigations leads to the conclusion that there is still room for improving accuracy and speed. This paper proposes a novel method for identifying LPIs that employs Kernel Ridge Regression based on Fast Kernel Learning (LPI-FKLKRR). The approach uses four distinct similarity measures each for the lncRNA space and the protein space. Notably, we extract Gene Ontology (GO) information for proteins in order to improve the quality of information in the protein space. To integrate the heterogeneous kernels, Fast Kernel Learning (FastKL) is applied to optimize the kernel weights. Kernel Ridge Regression (KRR) is then used to obtain the final predicted associations. Experimental outcomes show that LPI-FKLKRR outperforms existing LPI prediction schemes. On a benchmark dataset, our proposed model LPI-FKLKRR obtains the best Area Under Precision Recall Curve (AUPR) of 0.6950, outperforming the integrated LPLNP (AUPR: 0.4584), RWR (AUPR: 0.2827), CF (AUPR: 0.2357), LPIHN (AUPR: 0.2299), and LPBNI (AUPR: 0.3302). Combined with the experimental results of a case study on a novel dataset, this suggests that LPI-FKLKRR will be a useful tool for LPI prediction.
Affiliation(s)
- Cong Shen
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
49
|
Abstract
BACKGROUND Much evidence has demonstrated that circRNAs (circular RNAs) play important roles in controlling gene expression in human, mouse and nematode. More importantly, circRNAs are also involved in many diseases through fine-tuning of post-transcriptional gene expression by sequestering disease-associated miRNAs. Therefore, identifying circRNA-disease associations is very appealing for comprehensively understanding the mechanisms, treatment and diagnosis of diseases, yet it is challenging. Because the mechanisms linking circRNAs and diseases are complex, discovering novel circRNA-disease associations with wet-lab experiments is expensive and time-consuming. There is therefore a pressing need for computational methods to discover novel circRNA-disease associations. RESULT In this study, we develop a method (DWNN-RLS) to predict circRNA-disease associations based on Regularized Least Squares with a Kronecker product kernel. The similarity of circRNAs is computed from the Gaussian Interaction Profile (GIP) based on known circRNA-disease associations. In addition, the similarity of diseases is obtained as the mean of the GIP similarity and the semantic similarity, which is computed from the directed acyclic graph (DAG) representation of diseases. The kernels of circRNA-disease pairs are constructed from the Kronecker product of the kernels of circRNAs and diseases. The DWNN (decreasing weight k-nearest neighbor) method is adopted to calculate the initial relational score for new circRNAs and diseases. The Kronecker product kernel based regularized least squares approach is used to predict new circRNA-disease associations. We adopt 5-fold cross validation (5CV), 10-fold cross validation (10CV) and leave-one-out cross validation (LOOCV) to assess the prediction performance of our method, and compare it with six competing methods (RLS-avg, RLS-Kron, NetLapRLS, KATZ, NBI, WP). CONCLUSION The experimental results show that DWNN-RLS reaches AUC values of 0.8854, 0.9205 and 0.9701 in 5CV, 10CV and LOOCV, respectively, which illustrates that DWNN-RLS is superior to the competing methods RLS-avg, RLS-Kron, NetLapRLS, KATZ, NBI and WP. In addition, case studies also show that DWNN-RLS is an effective method for predicting new circRNA-disease associations.
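The pair kernel described in this abstract, built as the Kronecker product of a circRNA kernel and a disease kernel, can be sketched as below. The pair indexing convention, the function name `kronecker_kernel`, and the toy kernel values are assumptions for illustration; the DWNN weighting and the regularized least squares solve are not shown.

```python
def kronecker_kernel(k_circ, k_dis):
    """Kernel between (circRNA, disease) pairs as a Kronecker product.

    Pair (c, d) is mapped to row/column index c * n_d + d, so the entry for
    pairs (c1, d1) and (c2, d2) equals k_circ[c1][c2] * k_dis[d1][d2].
    """
    n_c, n_d = len(k_circ), len(k_dis)
    size = n_c * n_d
    k_pairs = [[0.0] * size for _ in range(size)]
    for c1 in range(n_c):
        for d1 in range(n_d):
            for c2 in range(n_c):
                for d2 in range(n_d):
                    k_pairs[c1 * n_d + d1][c2 * n_d + d2] = (
                        k_circ[c1][c2] * k_dis[d1][d2]
                    )
    return k_pairs


# Hypothetical toy kernels: 2 circRNAs and 3 diseases.
k_circ = [[1.0, 0.5], [0.5, 1.0]]
k_dis = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.4], [0.0, 0.4, 1.0]]
k_pairs = kronecker_kernel(k_circ, k_dis)
```

Two pairs are similar under this kernel only when both their circRNAs and their diseases are similar. The full pair kernel grows as (number of circRNAs x number of diseases) squared, so practical implementations usually avoid forming it explicitly; that remark is a general observation about Kronecker kernels, not a claim about this paper's implementation.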
Affiliation(s)
- Cheng Yan
  - School of Information Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
  - School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000 China
- Jianxin Wang
  - School of Information Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
- Fang-Xiang Wu
  - Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9 Canada
|