1
Zhang D, Yu N, Yang X, De Marinis Y, Liu ZP, Gao R. SRPNet: stroke risk prediction based on two-level feature selection and deep fusion network. Front Physiol 2024; 15:1357123. [PMID: 39588269 PMCID: PMC11586342 DOI: 10.3389/fphys.2024.1357123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 10/23/2024] [Indexed: 11/27/2024] Open
Abstract
Background: Stroke is one of the major chronic non-communicable diseases (NCDs), with high morbidity, disability and mortality. The key to preventing stroke lies in controlling risk factors. However, screening risk factors and quantifying stroke risk levels remain challenging. Methods: A novel stroke risk prediction model based on two-level feature selection and a deep fusion network (SRPNet) is proposed to address these problems. First, the two-level feature selection method is used to screen comprehensive features related to stroke risk, enabling accurate identification of significant risk factors while eliminating redundant information. Next, a deep fusion network integrating a Transformer and a fully connected neural network (FCN) is used to establish the risk prediction model SRPNet for stroke patients. Results: We evaluate the performance of SRPNet using screening data from the China Stroke Data Center (CSDC), and further validate its effectiveness with census data on stroke collected at the Affiliated Hospital of Jining Medical University. The experimental results demonstrate that the SRPNet model selects features closely related to stroke and achieves superior risk prediction performance over benchmark methods. Conclusions: SRPNet can rapidly identify high-quality stroke risk factors, improve the accuracy of stroke prediction, and provide a powerful tool for clinical diagnosis.
Affiliation(s)
- Daoliang Zhang
  - School of Control Science and Engineering, Shandong University, Jinan, China
- Na Yu
  - School of Control Science and Engineering, Shandong University, Jinan, China
- Xiaodan Yang
  - Department of Rehabilitation Medicine, Affiliated Hospital of Jining Medical University, Jining, China
- Yang De Marinis
  - School of Control Science and Engineering, Shandong University, Jinan, China
  - Department of Clinical Sciences, Lund University, Malmö, Sweden
- Zhi-Ping Liu
  - School of Control Science and Engineering, Shandong University, Jinan, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan, China
2
Lu D, Zhang Q, Zheng C, Li J, Yin Z. DGNMDA: Dual Heterogeneous Graph Neural Network Encoder for miRNA-Disease Association Prediction. Bioengineering (Basel) 2024; 11:1132. [PMID: 39593792 PMCID: PMC11591469 DOI: 10.3390/bioengineering11111132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 11/03/2024] [Accepted: 11/09/2024] [Indexed: 11/28/2024] Open
Abstract
In recent years, numerous studies have highlighted the pivotal importance of miRNAs in personalized healthcare, showcasing broad application prospects. miRNAs hold significant potential in disease diagnosis, prognosis assessment, and therapeutic target discovery, making them an integral part of precision medicine. They are expected to enable precise disease subtyping and risk prediction, thereby advancing the development of precision medicine. GNNs, a class of deep learning architectures tailored for graph data analysis, have greatly facilitated the advancement of miRNA-disease association prediction algorithms. However, current methods often fall short in leveraging network node information, particularly in utilizing global information while neglecting the importance of local information. Effectively harnessing both local and global information remains a pressing challenge. To tackle this challenge, we propose an innovative model named DGNMDA. First, we construct various miRNA and disease similarity networks based on authoritative databases. Subsequently, we design a dual heterogeneous graph neural network encoder capable of efficiently learning feature information between adjacent nodes and similarity information across the entire graph. Additionally, we develop a specialized fine-grained multi-layer feature interaction gating mechanism to integrate outputs from the neural network encoders to identify novel associations connecting miRNAs with diseases. We evaluate our model using 5-fold cross-validation and real-world disease case studies, based on the HMDD V3.2 dataset. Our method demonstrates superior performance compared to existing approaches in various tasks, confirming the effectiveness and potential of DGNMDA as a robust method for predicting miRNA-disease associations.
Affiliation(s)
- Daying Lu
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Qi Zhang
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Chunhou Zheng
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
  - Artificial Intelligence Academy, Anhui University, Hefei 230039, China
- Jian Li
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Zhe Yin
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
3
Sun W, Zhang P, Zhang W, Xu J, Huang Y, Li L. Synchronous Mutual Learning Network and Asynchronous Multi-Scale Embedding Network for miRNA-Disease Association Prediction. Interdiscip Sci 2024; 16:532-553. [PMID: 38310628 DOI: 10.1007/s12539-023-00602-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 02/06/2024]
Abstract
MicroRNA (miRNA) serves as a pivotal regulator of numerous cellular processes, and the identification of miRNA-disease associations (MDAs) is crucial for comprehending complex diseases. Recently, graph neural networks (GNN) have made significant advancements in MDA prediction. However, these methods tend to learn one type of node representation from a single heterogeneous network, ignoring the importance of multiple network topologies and node attributes. Here, we propose SMDAP (Sequence hierarchical modeling-based Mirna-Disease Association Prediction framework), a novel GNN-based framework that incorporates multiple network topologies and various node attributes including miRNA seed and full-length sequences to predict potential MDAs. Specifically, SMDAP consists of two types of MDA representation: following a heterogeneous pattern, we construct a transfer learning-like synchronous mutual learning network to learn the first MDA representation in conjunction with the miRNA seed sequence. Meanwhile, following a homogeneous pattern, we design a subgraph-inspired asynchronous multi-scale embedding network to obtain the second MDA representation based on the miRNA full-length sequence. Subsequently, an adaptive fusion approach is designed to combine the two branches such that we can score the MDAs by the downstream classifier and infer novel MDAs. Comprehensive experiments demonstrate that SMDAP integrates the advantages of multiple network topologies and node attributes into two branch representations. Moreover, the area under the receiver operating characteristic curve is 0.9622 on DB1, which is a 5.06% increase from the baselines. The area under the precision-recall curve is 0.9777, which is a 7.33% increase from the baselines. In addition, case studies on three human cancers validated the predictive performance of SMDAP. Overall, SMDAP represents a powerful tool for MDA prediction.
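The area under the ROC curve quoted in this abstract can be read as the probability that a randomly chosen known association is scored higher than a randomly chosen non-association. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `auc_score` and the toy labels and scores are hypothetical.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two known associations, two non-associations.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
```

The quadratic loop is only practical for small examples; rank-based formulas give the same value more efficiently on large benchmark sets.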
Affiliation(s)
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Ping Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Li Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
4
Zhang Y, Wang Z, Wei H, Chen M. Exploring potential circRNA biomarkers for cancers based on double-line heterogeneous graph representation learning. BMC Med Inform Decis Mak 2024; 24:159. [PMID: 38844961 PMCID: PMC11157868 DOI: 10.1186/s12911-024-02564-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND Compared with time-consuming and labor-intensive biological validation in vitro or in vivo, computational models can rapidly provide high-quality, targeted candidates. Existing computational models face limitations in effectively utilizing sparse local structural information for accurate prediction of circRNA-disease associations. This study addresses this challenge with a proposed method, CDA-DGRL (Prediction of CircRNA-Disease Association based on Double-line Graph Representation Learning), which employs a deep learning framework leveraging graph networks and a dual-line representation model integrating graph node features. METHOD CDA-DGRL comprises several key steps. First, diverse biological information is integrated to compute integrated similarities among circRNAs and diseases, leading to the construction of a heterogeneous network specific to circRNA-disease associations. Second, circRNA and disease node features are derived using sparse autoencoders. Third, a graph convolutional neural network captures the local graph network structure, taking the circRNA-disease heterogeneous network and the node features as input. Fourth, node2vec performs depth-first sampling of the circRNA-disease heterogeneous network to capture the global graph network structure, addressing issues associated with sparse raw data. Finally, the fused local and global graph network structures are fed into an extra trees classifier to identify potential circRNA-disease associations. RESULTS The results, obtained through five-fold cross-validation on the circR2Disease dataset, demonstrate the superiority of CDA-DGRL over existing state-of-the-art models, with an AUC value of 0.9866 and an AUPR value of 0.9897. Notably, the extra trees classifier employed in this model outperforms other machine learning classifiers.
CONCLUSION CDA-DGRL is therefore a promising method for reliably identifying circRNA-disease associations, potentially reducing the need for extensive traditional biological experiments. The source code and data for this study are available at https://github.com/zywait/CDA-DGRL.
Affiliation(s)
- Yi Zhang
  - School of Computer Science and Engineering, Guilin University of Technology, Guilin, 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, 541004, China
- ZhenMei Wang
  - School of Big Data, Guangxi Vocational and Technical College, Nanning, 530003, China
- Hanyan Wei
  - Pharmacy School, Guilin Medical University, Guilin, 541004, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421010, China
5
Lu P, Jiang J. AE-RW: Predicting miRNA-disease associations by using autoencoder and random walk on miRNA-gene-disease heterogeneous network. Comput Biol Chem 2024; 110:108085. [PMID: 38754260 DOI: 10.1016/j.compbiolchem.2024.108085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/04/2024] [Accepted: 04/23/2024] [Indexed: 05/18/2024]
Abstract
Scientific investigations have demonstrated that aberrant expression of miRNAs contributes to the incidence of numerous complex diseases, so precise determination of miRNA-disease relationships greatly contributes to medical progress. To address the inefficiency of conventional experimental approaches, numerous computational methods have been proposed to predict miRNA-disease associations with enhanced accuracy. However, constructing a miRNA-gene-disease heterogeneous network by incorporating gene information has been relatively under-explored in existing computational techniques. Accordingly, this paper puts forward a technique to predict miRNA-disease associations by applying an autoencoder and a random walk on a miRNA-gene-disease heterogeneous network (AE-RW). First, we integrate association information and similarities between miRNAs, genes, and diseases to construct a miRNA-gene-disease heterogeneous network. Subsequently, we consolidate two network feature representations extracted independently via an autoencoder and a random walk procedure. Finally, a deep neural network (DNN) is used to conduct association prediction. The experimental results demonstrate that the AE-RW model achieved an AUC of 0.9478 in 5-fold CV on the HMDD v3.2 dataset, outperforming the five most advanced existing models. Additionally, case studies on breast and lung cancer further validated the superior predictive capabilities of our model.
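The abstract does not say which random walk variant AE-RW uses. As an illustration only, the sketch below assumes a random walk with restart on a small adjacency matrix, a variant commonly applied to heterogeneous biological networks; the function name, the restart probability, and the three-node toy path (miRNA, gene, disease) are all hypothetical.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walker that restarts at `seed`.

    adj is a square list-of-lists adjacency matrix (assumed layout, not from the paper).
    """
    n = len(adj)
    # Column-normalise so that each non-empty column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # One step: follow an edge with prob. (1 - restart), jump back to seed otherwise.
        nxt = [(1.0 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy path graph: node 0 (miRNA) - node 1 (gene) - node 2 (disease).
toy_adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
toy_probs = random_walk_with_restart(toy_adj, seed=0)
```

In this toy graph the disease node is reachable from the miRNA only through the gene node, which is the kind of indirect evidence a miRNA-gene-disease network is meant to expose.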
Affiliation(s)
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- Jicheng Jiang
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
6
Sun SL, Zhou BW, Liu SZ, Xiu YH, Bilal A, Long HX. Prediction of miRNAs and diseases association based on sparse autoencoder and MLP. Front Genet 2024; 15:1369811. [PMID: 38873111 PMCID: PMC11169787 DOI: 10.3389/fgene.2024.1369811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 05/07/2024] [Indexed: 06/15/2024] Open
Abstract
Introduction: MicroRNAs (miRNAs) are small non-coding RNA molecules that have multiple important regulatory roles within cells. As research on miRNAs deepens, a growing number of studies show that abnormal expression of miRNAs is closely related to various diseases. The relationship between miRNAs and diseases is crucial for discovering the pathogenesis of diseases and exploring new treatment methods. Methods: We therefore propose a new sparse autoencoder and MLP method (SPALP) to predict associations between miRNAs and diseases. In this study, we adopt deep learning technologies, including a sparse autoencoder and a multi-layer perceptron (MLP), to improve the accuracy of predicting miRNA-disease associations. First, the SPALP model uses a sparse autoencoder to perform feature learning on the initial features of miRNAs and diseases separately, obtaining the latent features of miRNAs and diseases. Then, the latent features are combined with miRNA functional similarity data and disease semantic similarity data to construct comprehensive miRNA-disease datasets. Subsequently, the MLP model predicts unknown associations between miRNAs and diseases. Result: To verify the performance of our model, we set up several comparative experiments. The experimental results show that, compared with traditional methods and other deep learning prediction methods, our method significantly improves the accuracy of predicting miRNA-disease associations, with 94.61% accuracy and an AUC value of 0.9859. Finally, we conducted a case study of the SPALP model. We predicted the top 30 miRNAs that might be related to five diseases common in the elderly (Lupus Erythematosus, Acute Myeloid Leukemia, Cardiovascular Disease, Stroke, and Diabetes Mellitus) and validated that 27, 29, 29, 30, and 30 of the top 30, respectively, are indeed associated.
Discussion: The SPALP approach introduced in this study is adept at forecasting the links between miRNAs and diseases, addressing the complexities of analyzing extensive bioinformatics datasets and enriching the understanding of how miRNAs contribute to disease progression.
Affiliation(s)
- Si-Lin Sun
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Bing-Wei Zhou
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Sheng-Zheng Liu
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Yu-Han Xiu
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Anas Bilal
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
- Hai-Xia Long
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
7
Wu J, Zhao X, He Y, Pan B, Lai J, Ji M, Li S, Huang J, Han J. IDMIR: identification of dysregulated miRNAs associated with disease based on a miRNA-miRNA interaction network constructed through gene expression data. Brief Bioinform 2024; 25:bbae258. [PMID: 38801703 PMCID: PMC11129766 DOI: 10.1093/bib/bbae258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 05/10/2024] [Accepted: 05/15/2024] [Indexed: 05/29/2024] Open
Abstract
Micro ribonucleic acids (miRNAs) play a pivotal role in governing the human transcriptome in various biological phenomena. Hence, the accumulation of miRNA expression dysregulation frequently assumes a noteworthy role in the initiation and progression of complex diseases. However, accurate identification of dysregulated miRNAs still faces challenges at the current stage. Several bioinformatics tools have recently emerged for forecasting the associations between miRNAs and diseases. Nonetheless, the existing reference tools mainly identify the miRNA-disease associations in a general state and fall short of pinpointing dysregulated miRNAs within a specific disease state. Additionally, no studies adequately consider miRNA-miRNA interactions (MMIs) when analyzing the miRNA-disease associations. Here, we introduced a systematic approach, called IDMIR, which enabled the identification of expression dysregulated miRNAs through an MMI network under the gene expression context, where the network's architecture was designed to implicitly connect miRNAs based on their shared biological functions within a particular disease context. The advantage of IDMIR is that it uses gene expression data for the identification of dysregulated miRNAs by analyzing variations in MMIs. We illustrated the excellent predictive power for dysregulated miRNAs of the IDMIR approach through data analysis on breast cancer and bladder urothelial cancer. IDMIR could surpass several existing miRNA-disease association prediction approaches through comparison. We believe the approach complements the deficiencies in predicting miRNA-disease association and may provide new insights and possibilities for diagnosing and treating diseases. The IDMIR approach is now available as a free R package on CRAN (https://CRAN.R-project.org/package=IDMIR).
Affiliation(s)
- Jiashuo Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Xilong Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Yalan He
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Bingyue Pan
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Jiyin Lai
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Miao Ji
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Siyuan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Junling Huang
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
- Junwei Han
  - College of Bioinformatics Science and Technology, Harbin Medical University, No. 157 Baojian Road, Nangang District, Harbin, Heilongjiang Province, China
8
Bing P, Liu W, Zhai Z, Li J, Guo Z, Xiang Y, He B, Zhu L. A novel approach for denoising electrocardiogram signals to detect cardiovascular diseases using an efficient hybrid scheme. Front Cardiovasc Med 2024; 11:1277123. [PMID: 38699582 PMCID: PMC11064874 DOI: 10.3389/fcvm.2024.1277123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 03/25/2024] [Indexed: 05/05/2024] Open
Abstract
Background: Electrocardiogram (ECG) signals are inevitably contaminated with various kinds of noise during acquisition and transmission. This noise may yield misleading information about cardiac health, preventing specialists from making a correct analysis. Methods: In this paper, an efficient strategy is proposed to denoise ECG signals. It employs a time-frequency framework based on the S-transform (ST) and combines bi-dimensional empirical mode decomposition (BEMD) with non-local means (NLM). In this method, the ST maps an ECG signal into a subspace in the time-frequency domain, the BEMD then decomposes the ST-based time-frequency representation (TFR) into a series of sub-TFRs at different scales, and finally the NLM removes noise and restores ECG signal characteristics based on structural self-similarity. Results: The proposed method is validated using numerous ECG signals from the MIT-BIH arrhythmia database, and several different types of noise with varying signal-to-noise ratios (SNR) are taken into account. The experimental results show that the proposed technique is superior to the existing wavelet-based approach and NLM filtering, achieving higher SNR and structure similarity index measure (SSIM) and lower root mean squared error (RMSE) and percent root mean square difference (PRD). Conclusions: The proposed method not only significantly suppresses the noise present in ECG signals, but also better preserves the characteristics of ECG signals, making it more suitable for ECG signal processing.
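Three of the metrics named in this abstract (RMSE, PRD, and output SNR) have widely used textbook definitions, sketched below for a clean reference signal and its denoised estimate. This is a minimal illustration that assumes those common definitions, which the abstract itself does not spell out; the function names and the four-sample toy signal are hypothetical, and SSIM is omitted.

```python
import math


def _error_and_reference_energy(clean, denoised):
    """Sum of squared errors and sum of squared reference samples."""
    if len(clean) != len(denoised) or not clean:
        raise ValueError("signals must be non-empty and of equal length")
    err = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    ref = sum(c ** 2 for c in clean)
    return err, ref


def rmse(clean, denoised):
    """Root mean squared error between reference and estimate."""
    err, _ = _error_and_reference_energy(clean, denoised)
    return math.sqrt(err / len(clean))


def prd(clean, denoised):
    """Percent root mean square difference (in percent)."""
    err, ref = _error_and_reference_energy(clean, denoised)
    return 100.0 * math.sqrt(err / ref)


def snr_db(clean, denoised):
    """Output signal-to-noise ratio in decibels (requires a non-zero error)."""
    err, ref = _error_and_reference_energy(clean, denoised)
    return 10.0 * math.log10(ref / err)


# Hypothetical toy signal: every sample is off by 0.1 after denoising.
toy_clean = [1.0, -1.0, 1.0, -1.0]
toy_denoised = [0.9, -1.1, 1.1, -0.9]
toy_metrics = (rmse(toy_clean, toy_denoised),
               prd(toy_clean, toy_denoised),
               snr_db(toy_clean, toy_denoised))
```

Lower RMSE and PRD and higher SNR all indicate that the estimate is closer to the reference, which matches the direction of improvement the abstract reports.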
Affiliation(s)
- Pingping Bing
  - Hunan Provincial Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha, China
- Wei Liu
  - College of Mechanical and Electrical Engineering, Beijing University of Chemical Technology, Beijing, China
- Zhixing Zhai
  - College of Mechanical and Electrical Engineering, Beijing University of Chemical Technology, Beijing, China
- Jianghao Li
  - Hunan Provincial Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha, China
- Zhiqun Guo
  - Hunan Provincial Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha, China
- Yanrui Xiang
  - Hunan Provincial Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha, China
- Binsheng He
  - Hunan Provincial Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha, China
- Lemei Zhu
  - Hunan Provincial Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha, China
9
Liu Y, Zhang R, Dong X, Yang H, Li J, Cao H, Tian J, Zhang Y. DAE-CFR: detecting microRNA-disease associations using deep autoencoder and combined feature representation. BMC Bioinformatics 2024; 25:139. [PMID: 38553698 PMCID: PMC10981315 DOI: 10.1186/s12859-024-05757-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 03/20/2024] [Indexed: 04/01/2024] Open
Abstract
BACKGROUND MicroRNA (miRNA) has been shown to play a key role in the occurrence and progression of diseases, making the discovery of miRNA-disease associations vital for disease prevention and therapy. However, traditional laboratory methods for detecting these associations are slow, strenuous, expensive, and uncertain. Although numerous advanced algorithms have emerged, developing more effective methods to explore underlying miRNA-disease associations remains a challenge. RESULTS In this study, we designed a novel approach based on a deep autoencoder and combined feature representation (DAE-CFR) to predict possible miRNA-disease associations. We began by creating integrated similarity matrices of miRNAs and diseases, performing a logistic function transformation, balancing positive and negative samples with k-means clustering, and constructing training samples. Then, a deep autoencoder was used to extract low-dimensional features from two kinds of feature representations for miRNAs and diseases, namely, one based on the original association information and one based on similarity information. Next, we combined the resulting features for each miRNA-disease pair and used a logistic regression (LR) classifier to infer all unknown miRNA-disease interactions. Under five- and tenfold cross-validation (CV) frameworks, DAE-CFR not only outperformed six popular algorithms and nine classifiers, but also demonstrated superior performance on an additional dataset. Furthermore, case studies on three diseases (myocardial infarction, hypertension and stroke) confirmed the validity of DAE-CFR in practice. CONCLUSIONS DAE-CFR achieved outstanding performance in predicting miRNA-disease associations and can provide evidence to inform biological experiments and clinical therapy.
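The five- and tenfold cross-validation mentioned in this abstract partitions the labelled miRNA-disease pairs into k disjoint folds, holding out each fold once for testing. The sketch below shows one simple way to generate such splits; it is not taken from the paper, and the function name, the fixed seed, and the sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled, disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]     # round-robin assignment to folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical usage: 10 labelled pairs split into 5 folds.
toy_splits = list(k_fold_indices(10, k=5))
```

Each sample appears in exactly one test fold, so averaging a metric over the k runs uses every labelled pair for evaluation once.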
Affiliation(s)
- Yanling Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Department of Mathematics, Changzhi Medical College, Changzhi, China
- Ruiyan Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Xiaojing Dong
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hong Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Jing Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hongyan Cao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Jing Tian
  - Department of Cardiology, First Hospital of Shanxi Medical University, Taiyuan, China
- Yanbo Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
  - School of Health and Service Management, Shanxi University of Chinese Medicine, Jinzhong, China
10
Zou H, Ji B, Zhang M, Liu F, Xie X, Peng S. MHGTMDA: Molecular heterogeneous graph transformer based on biological entity graph for miRNA-disease associations prediction. Mol Ther Nucleic Acids 2024; 35:102139. [PMID: 38384447 PMCID: PMC10879798 DOI: 10.1016/j.omtn.2024.102139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 01/31/2024] [Indexed: 02/23/2024]
Abstract
MicroRNAs (miRNAs) play a crucial role in the prevention, prognosis, diagnosis, and treatment of complex diseases. Existing computational methods primarily focus on biologically relevant molecules directly associated with miRNA or disease, overlooking the fact that the human body is a highly complex system where miRNA or disease may indirectly correlate with various types of biomolecules. To address this, we propose a novel prediction model named MHGTMDA (miRNA and disease association prediction using heterogeneous graph transformer based on molecular heterogeneous graph). MHGTMDA integrates biological entity relationships of eight biomolecules, constructing a relatively comprehensive heterogeneous biological entity graph. MHGTMDA serves as a powerful molecular heterogeneous graph transformer, capturing structural elements and properties of miRNAs and diseases and revealing potential associations. In a 5-fold cross-validation study, MHGTMDA achieved an area under the receiver operating characteristic curve of 0.9569, surpassing state-of-the-art methods by at least 3%. Feature ablation experiments suggest that considering features among multiple biomolecules is more effective in uncovering miRNA-disease correlations. Furthermore, we conducted differential expression analyses on breast cancer and lung cancer, using MHGTMDA to further validate differentially expressed miRNAs. The results demonstrate MHGTMDA's capability to identify novel MDAs.
Affiliation(s)
- Haitao Zou
  - Guilin University of Technology, College of Information Science and Engineering, Guilin 541006, China
  - Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
- Boya Ji
  - Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
- Meng Zhang
  - Xiangya Hospital, The Department of Thoracic Surgery, Changsha 410082, China
- Fen Liu
  - Hunan Provincial People’s Hospital, Institute of Cardiovascular Epidemiology, Changsha 410082, China
- Xiaolan Xie
  - Guilin University of Technology, College of Information Science and Engineering, Guilin 541006, China
- Shaoliang Peng
  - Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
11
Jiao CN, Zhou F, Liu BM, Zheng CH, Liu JX, Gao YL. Multi-Kernel Graph Attention Deep Autoencoder for MiRNA-Disease Association Prediction. IEEE J Biomed Health Inform 2024; 28:1110-1121. [PMID: 38055359 DOI: 10.1109/jbhi.2023.3336247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
Accumulating evidence indicates that microRNAs (miRNAs) can control and coordinate various biological processes. Consequently, abnormal expression of miRNAs has been linked to various complex diseases. Identifying miRNA-disease associations (MDAs) will contribute to the diagnosis and treatment of human diseases. Nevertheless, traditional experimental verification of MDAs is laborious and limited to small scale. Therefore, it is necessary to develop reliable and effective computational methods to predict novel MDAs. In this work, a multi-kernel graph attention deep autoencoder (MGADAE) method is proposed to predict potential MDAs. In detail, MGADAE first employs the multiple kernel learning (MKL) algorithm to construct an integrated miRNA similarity and disease similarity, providing more biological information for further feature learning. Second, MGADAE combines the known MDAs, disease similarity, and miRNA similarity into a heterogeneous network, then learns the representations of miRNAs and diseases through graph convolution operations. After that, an attention mechanism is introduced into MGADAE to integrate the representations from multiple graph convolutional network (GCN) layers. Lastly, the integrated representations of miRNAs and diseases are input into a bilinear decoder to obtain the final predicted association scores. Experiments show that the proposed method outperforms existing advanced approaches in MDA prediction. Furthermore, case studies on two human cancers provide further confirmation of the reliability of MGADAE in practice.
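As a rough illustration of the final decoding step described in this abstract, the sketch below scores one miRNA-disease pair with a bilinear form. It is a minimal sketch, not the authors' implementation: the function name `bilinear_score`, the toy embeddings, the weight matrix, and the sigmoid used to map the raw score into (0, 1) are all assumptions made for illustration.

```python
import math


def bilinear_score(mirna_vec, weight, disease_vec):
    """Return sigmoid(m^T W d) for one miRNA embedding m and one disease embedding d.

    `weight` is a list of rows with shape len(mirna_vec) x len(disease_vec).
    The sigmoid is an assumption; the abstract only mentions a bilinear decoder.
    """
    if len(weight) != len(mirna_vec) or any(len(row) != len(disease_vec) for row in weight):
        raise ValueError("weight must have shape len(mirna_vec) x len(disease_vec)")
    # Bilinear form: sum over i, j of m[i] * W[i][j] * d[j]
    raw = sum(
        mirna_vec[i] * weight[i][j] * disease_vec[j]
        for i in range(len(mirna_vec))
        for j in range(len(disease_vec))
    )
    return 1.0 / (1.0 + math.exp(-raw))


# Hypothetical 2-dimensional embeddings and a hypothetical learned weight matrix.
example_score = bilinear_score([1.0, 0.0], [[0.5, 0.0], [0.0, 0.5]], [1.0, 0.0])
```

In a trained model the weight matrix would be learned jointly with the encoder; here it is fixed by hand only to show the shape of the computation.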
12
Yao D, Deng Y, Zhan X, Zhan X. Predicting lncRNA-disease associations using multiple metapaths in hierarchical graph attention networks. BMC Bioinformatics 2024; 25:46. [PMID: 38287236 PMCID: PMC11271052 DOI: 10.1186/s12859-024-05672-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 01/23/2024] [Indexed: 01/31/2024] Open
Abstract
BACKGROUND Many biological studies have shown that lncRNAs regulate the expression of epigenetically related genes. The study of lncRNAs has helped to deepen our understanding of the pathogenesis of complex diseases at the molecular level. Due to the large number of lncRNAs and the complex and time-consuming nature of biological experiments, applying computational techniques to predict potential lncRNA-disease associations is very effective. To explore information between complex network structures, existing methods rely mainly on lncRNA and disease information. Metapaths have been applied to network models as an effective method for exploring information in heterogeneous graphs. However, existing methods are dominated by lncRNA or disease nodes and tend to ignore the paths provided by intermediate nodes. METHODS We propose a deep learning model based on hierarchical graph attention networks to predict unknown lncRNA-disease associations using multiple types of metapaths to extract features. We have named this model MMHGAN. First, the model constructs a lncRNA-disease-miRNA heterogeneous graph based on known associations and two homogeneous graphs of lncRNAs and diseases. Second, for the homogeneous graphs, the features of neighboring nodes are aggregated using a multihead attention mechanism. Third, for the heterogeneous graph, metapaths of different intermediate nodes are selected to construct subgraphs, and the importance of different types of metapaths is calculated and aggregated to obtain the final embedded features. Finally, the features are reconstructed using a fully connected layer to obtain the prediction results. RESULTS We used a fivefold cross-validation method and obtained an average AUC value of 96.07% and an average AUPR value of 93.23%. Additionally, ablation experiments demonstrated the role of homogeneous graphs and different intermediate node path weights. In addition, we studied lung cancer, esophageal carcinoma, and breast cancer. Among the 15 lncRNAs associated with these diseases, 15, 12, and 14 lncRNAs were validated by the lncRNA Disease Database and the Lnc2Cancer Database, respectively. CONCLUSION We compared the MMHGAN model with six well-performing existing models, and the case study demonstrated that the model was effective in predicting potential associations between lncRNAs and diseases.
Affiliation(s)
- Dengju Yao
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- Yuexiao Deng
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- Xiaojuan Zhan
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050, China
- Xiaorong Zhan
  - Department of Endocrinology and Metabolism, Hospital of South, University of Science and Technology, Shenzhen, 518055, China
13
Yao D, Li B, Zhan X, Zhan X, Yu L. GCNFORMER: graph convolutional network and transformer for predicting lncRNA-disease associations. BMC Bioinformatics 2024; 25:5. [PMID: 38166659 PMCID: PMC10763317 DOI: 10.1186/s12859-023-05625-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND A growing body of research indicates that the disrupted expression of long non-coding RNA (lncRNA) is linked to a range of human disorders. Therefore, the effective prediction of lncRNA-disease associations (LDAs) can not only suggest solutions for diagnosing a condition but also save significant time and labor costs. METHOD In this work, we proposed a novel LDA prediction algorithm based on a graph convolutional network and a transformer, named GCNFORMER. First, we integrated the intraclass similarities and interclass connections between miRNAs, lncRNAs and diseases, and built a graph adjacency matrix. Second, to fully capture the features between various nodes, we employed a graph convolutional network for feature extraction. Finally, to obtain the global dependencies between inputs and outputs, we used a transformer encoder with a multihead attention mechanism to predict lncRNA-disease associations. RESULTS The results of a fivefold cross-validation experiment on the public dataset showed that the AUC and AUPR of GCNFORMER reached 0.9739 and 0.9812, respectively. We compared GCNFORMER with six advanced LDA prediction models, and the results indicated its superiority over all six. Furthermore, GCNFORMER's effectiveness in predicting potential LDAs is underscored by case studies on breast cancer, colon cancer and lung cancer. CONCLUSIONS The combination of a graph convolutional network and a transformer can effectively improve the performance of LDA prediction models and promote the in-depth development of this research field.
Affiliation(s)
- Dengju Yao
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- Bailin Li
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- Xiaojuan Zhan
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050, China
- Xiaorong Zhan
  - Department of Endocrinology and Metabolism, Hospital of South, University of Science and Technology, Shenzhen, 518055, China
- Liyang Yu
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
14
Yuan Z, Shu Z, Peng J, Wang W, Hou J, Han L, Zheng G, Wei Y, Zhong J. Prediction of postoperative liver metastasis in pancreatic ductal adenocarcinoma based on multiparametric magnetic resonance radiomics combined with serological markers: a cohort study of machine learning. Abdom Radiol (NY) 2024; 49:117-130. [PMID: 37819438 DOI: 10.1007/s00261-023-04047-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 08/31/2023] [Accepted: 09/03/2023] [Indexed: 10/13/2023]
Abstract
OBJECTIVE To construct and validate a multi-dimensional model based on multiple machine learning algorithms to predict PCLM using multiparametric magnetic resonance imaging (MRI) sequences together with clinical and imaging parameters. METHODS A total of 148 retrospectively examined PDAC patients were classified as metastatic or non-metastatic based on results at 3 months after surgery. The radiomics features of the primary tumor were extracted from T2WI images, followed by dimension reduction. Then, multiple machine learning methods were used to construct models. Independent predictors were also screened using multifactor logistic regression, and a nomogram was constructed in combination with the radiomics model. Area under the receiver operating characteristic curve (AUC) and decision curve analysis (DCA) were used to assess the accuracy and reliability of the nomogram. RESULTS The diagnostic efficacy of the radiomics model in the training and test sets was 0.822 and 0.803, sensitivity was 0.742 and 0.692, and specificity was 0.792 and 0.875, respectively. The diagnostic efficacy of the nomogram in the training and test sets was 0.866 and 0.832. CONCLUSION A radiomics nomogram based on machine learning improved the accuracy of predicting PCLM and may be useful for early preoperative diagnosis.
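The sensitivity and specificity figures reported in this abstract are ratios of confusion-matrix counts. The following is a minimal sketch of that arithmetic; the function name and the counts in the example are hypothetical, since the abstract does not report the underlying counts.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true positives (here, metastatic cases) detected
    specificity = TN / (TN + FP): share of true negatives correctly ruled out
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(tp=3, fn=1, tn=4, fp=1)
```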
Affiliation(s)
- Zhongyu Yuan
  - Jinzhou Medical University, Jinzhou, Liaoning Province, China
- Zhenyu Shu
  - Cancer Center, Department of Radiology, Zhejiang Provincial Hospital, Affiliated People's Hospital, Hangzhou Medical College, Hangzhou, Zhejiang, China
- Jiaxuan Peng
  - Jinzhou Medical University, Jinzhou, Liaoning Province, China
- Wei Wang
  - Jinzhou Medical University, Jinzhou, Liaoning Province, China
- Jie Hou
  - Jinzhou Medical University, Jinzhou, Liaoning Province, China
- Lu Han
  - Jinzhou Medical University, Jinzhou, Liaoning Province, China
- Guangying Zheng
  - Jinzhou Medical University, Jinzhou, Liaoning Province, China
- Yuguo Wei
  - Advanced Analytics, Global Medical Service, GE Healthcare, China, Xihu District, Hangzhou, 310000, China
- Jianguo Zhong
  - Cancer Center, Department of Radiology, Zhejiang Provincial Hospital, Affiliated People's Hospital, Hangzhou Medical College, Hangzhou, Zhejiang, China
15
Liao Q, Fu X, Zhuo L, Chen H. An efficient model for predicting human diseases through miRNA based on multiple-types of contrastive learning. Front Microbiol 2023; 14:1325001. [PMID: 38163075 PMCID: PMC10755968 DOI: 10.3389/fmicb.2023.1325001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 11/16/2023] [Indexed: 01/03/2024] Open
Abstract
Multiple studies have demonstrated that microRNA (miRNA) can be deeply involved in the regulatory mechanism of human microbiota, thereby inducing disease. Developing effective methods to infer potential associations between microRNAs (miRNAs) and diseases can aid early diagnosis and treatment. Recent methods utilize machine learning or deep learning to predict miRNA-disease associations (MDAs), achieving state-of-the-art performance. However, the problem of sparse neighborhoods of nodes due to lack of data has not been well solved. To this end, we propose a new model named MTCL-MDA, which integrates multiple-types of contrastive learning strategies into a graph collaborative filtering model to predict potential MDAs. The model adopts a contrastive learning strategy based on topology, which alleviates the damage to model performance caused by sparse neighborhoods. In addition, the model also adopts a semantic-based contrastive learning strategy, which not only reduces the impact of noise introduced by topology-based contrastive learning, but also enhances the semantic information of nodes. Experimental results show that our model outperforms existing models on all evaluation metrics. Case analysis shows that our model can more accurately identify potential MDA, which is of great significance for the screening and diagnosis of real-life diseases. Our data and code are publicly available at: https://github.com/Lqingquan/MTCL-MDA.
Affiliation(s)
- Qingquan Liao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Hao Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
16
Wang S, Li Y, Zhang Y, Pang S, Qiao S, Zhang Y, Wang F. Generative Adversarial Matrix Completion Network based on Multi-Source Data Fusion for miRNA-Disease Associations Prediction. Brief Bioinform 2023; 24:bbad270. [PMID: 37482409 DOI: 10.1093/bib/bbad270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/19/2023] [Revised: 06/16/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023] Open
Abstract
Numerous biological studies have shown that considering disease-associated micro RNAs (miRNAs) as potential biomarkers or therapeutic targets offers new avenues for the diagnosis of complex diseases. Computational methods have gradually been introduced to reveal disease-related miRNAs. Considering that previous models have not fused sufficiently diverse similarities, that their inappropriate fusion methods may lead to poor quality of the comprehensive similarity network and that their results are often limited by insufficiently known associations, we propose a computational model called Generative Adversarial Matrix Completion Network based on Multi-source Data Fusion (GAMCNMDF) for miRNA-disease association prediction. We create a diverse network connecting miRNAs and diseases, which is then represented using a matrix. The main task of GAMCNMDF is to complete the matrix and obtain the predicted results. The main innovations of GAMCNMDF are reflected in two aspects: GAMCNMDF integrates diverse data sources and employs a nonlinear fusion approach to update the similarity networks of miRNAs and diseases. Also, some additional information is provided to GAMCNMDF in the form of a 'hint' so that GAMCNMDF can work successfully even when complete data are not available. Compared with other methods, the outcomes of 10-fold cross-validation on two distinct databases validate the superior performance of GAMCNMDF with statistically significant results. It is worth mentioning that we apply GAMCNMDF in the identification of underlying small molecule-related miRNAs, yielding outstanding performance results in this specific domain. In addition, two case studies about two important neoplasms show that GAMCNMDF is a promising prediction method.
Affiliation(s)
- ShuDong Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- YunYin Li
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- YuanYuan Zhang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- ShanChen Pang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- SiBo Qiao
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- Yu Zhang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- FuYu Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
17
Sheng N, Wang Y, Huang L, Gao L, Cao Y, Xie X, Fu Y. Multi-task prediction-based graph contrastive learning for inferring the relationship among lncRNAs, miRNAs and diseases. Brief Bioinform 2023; 24:bbad276. [PMID: 37529914 DOI: 10.1093/bib/bbad276] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2023] [Revised: 07/09/2023] [Accepted: 07/11/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION Identifying the relationships among long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and diseases is highly valuable for diagnosing, preventing, treating and prognosing diseases. The development of effective computational prediction methods can reduce experimental costs. While numerous methods have been proposed, they often treat the prediction of lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs) and lncRNA-miRNA interactions (LMIs) as separate tasks. Models capable of predicting all three relationships simultaneously remain relatively scarce. Our aim is to perform multi-task predictions, which not only construct a unified framework, but also facilitate mutual complementarity of information among lncRNAs, miRNAs and diseases. RESULTS In this work, we propose a novel unsupervised embedding method called graph contrastive learning for multi-task prediction (GCLMTP). Our approach aims to predict LDAs, MDAs and LMIs by simultaneously extracting embedding representations of lncRNAs, miRNAs and diseases. To achieve this, we first construct a triple-layer lncRNA-miRNA-disease heterogeneous graph (LMDHG) that integrates the complex relationships between these entities based on their similarities and correlations. Next, we employ an unsupervised embedding model based on graph contrastive learning to extract potential topological features of lncRNAs, miRNAs and diseases from the LMDHG. The graph contrastive learning leverages graph convolutional network architectures to maximize the mutual information between patch representations and corresponding high-level summaries of the LMDHG. Subsequently, for the three prediction tasks, multiple classifiers are explored to predict LDA, MDA and LMI scores. Comprehensive experiments are conducted on two datasets (from older and newer versions of the database, respectively). The results show that GCLMTP outperforms other state-of-the-art methods for the disease-related lncRNA and miRNA prediction tasks. Additionally, case studies on two datasets further demonstrate the ability of GCLMTP to accurately discover new associations. To ensure reproducibility of this work, we have made the datasets and source code publicly available at https://github.com/sheng-n/GCLMTP.
Affiliation(s)
- Nan Sheng
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Yan Wang
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
  - School of Artificial Intelligence, Jilin University, 130012 Changchun, China
- Lan Huang
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Ling Gao
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Yangkun Cao
  - School of Artificial Intelligence, Jilin University, 130012 Changchun, China
- Xuping Xie
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Yuan Fu
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, UK
18
Zhang W, Liu B. iSnoDi-MDRF: Identifying snoRNA-Disease Associations Based on Multiple Biological Data by Ranking Framework. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3013-3019. [PMID: 37030816 DOI: 10.1109/tcbb.2023.3258448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Accumulating evidence indicates that the dysregulation of small nucleolar RNAs (snoRNAs) is associated with diseases. Identifying snoRNA-disease associations by computational methods is desirable for biologists because it can save considerable cost and time compared with biological experiments. However, it still faces the following challenges: (i) many snoRNAs have been detected in recent years, but only a few have been proved to be associated with diseases; (ii) computational predictors trained with only a few known snoRNA-disease associations fail to accurately identify snoRNA-disease associations. In this study, we propose a ranking framework, called iSnoDi-MDRF, to identify potential snoRNA-disease associations based on multiple biological data. It has the following highlights: (i) iSnoDi-MDRF integrates a ranking framework, which can identify not only potential associations between known snoRNAs and diseases but also diseases associated with new snoRNAs; (ii) known gene-disease associations are employed to help train a mature model for predicting snoRNA-disease associations. Experimental results illustrate that iSnoDi-MDRF is well suited to identifying potential snoRNA-disease associations. The web server of the iSnoDi-MDRF predictor is freely available at http://bliulab.net/iSnoDi-MDRF/.
19
Hu X, Yin Z, Zeng Z, Peng Y. Prediction of miRNA-Disease Associations by Cascade Forest Model Based on Stacked Autoencoder. Molecules 2023; 28:5013. [PMID: 37446675 DOI: 10.3390/molecules28135013] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/30/2023] [Revised: 06/23/2023] [Accepted: 06/24/2023] [Indexed: 07/15/2023] Open
Abstract
Numerous pieces of evidence have indicated that microRNA (miRNA) plays a crucial role in a series of significant biological processes and is closely related to complex diseases. However, the traditional biological experimental methods used to verify disease-related miRNAs are inefficient and expensive. Thus, it is necessary to design more efficient approaches. In this work, a novel method (CFSAEMDA) is proposed for the prediction of unknown miRNA-disease associations (MDAs). Specifically, we first capture the interactive features of miRNAs and diseases by integrating multi-source information. Then, a stacked autoencoder is applied to obtain the underlying feature representation. Finally, a modified cascade forest model is employed to complete the final prediction. The experimental results show that the AUC value obtained by our method is 97.67%. The performance of CFSAEMDA is superior to that of several of the latest methods. In addition, case studies conducted on lung neoplasms, breast neoplasms and hepatocellular carcinoma further show that the CFSAEMDA method may be regarded as a useful approach for inferring unknown disease-miRNA relationships.
Affiliation(s)
- Xiang Hu
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
- Zhixiang Yin
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
- Zhiliang Zeng
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
- Yu Peng
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
20
Chen L, Chen K, Zhou B. Inferring drug-disease associations by a deep analysis on drug and disease networks. Math Biosci Eng 2023; 20:14136-14157. [PMID: 37679129 DOI: 10.3934/mbe.2023632] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 09/09/2023]
Abstract
Drugs, which treat various diseases, are essential for human health. However, developing new drugs is laborious, time-consuming, and expensive. Although investment in drug development has greatly increased over the years, the number of drug approvals each year remains quite low. Drug repositioning is deemed an effective means to accelerate drug development because it can discover novel effects of existing drugs. Numerous computational methods have been proposed for drug repositioning, some of which were designed as binary classifiers that predict drug-disease associations (DDAs). Negative sample selection is a common weakness of such methods. In this study, a novel reliable negative sample selection scheme, named RNSS, is presented, which can screen out reliable pairs of drugs and diseases with low probabilities of being actual DDAs. This scheme considers information from the k-neighbors of a drug in a drug network, including their associations to diseases and the drug. Then, a scoring system is set up to evaluate pairs of drugs and diseases. To test the utility of RNSS, three classic classification algorithms (random forest, Bayes network, and nearest neighbor algorithm) were employed to build classifiers using negative samples selected by RNSS. The cross-validation results suggested that such classifiers provided nearly perfect performance and were significantly superior to those using some traditional and previous negative sample selection schemes.
Affiliation(s)
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Kaiyu Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Bo Zhou
  - Shanghai University of Medicine & Health Sciences, Shanghai 201318, China
21
Fan C, Ding M. Inferring pseudogene-MiRNA associations based on an ensemble learning framework with similarity kernel fusion. Sci Rep 2023; 13:8833. [PMID: 37258695 DOI: 10.1038/s41598-023-36054-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/29/2023] [Accepted: 05/28/2023] [Indexed: 06/02/2023] Open
Abstract
Accumulating evidence shows that pseudogenes can function as microRNA (miRNA) sponges and regulate gene expression. Mining potential interactions between pseudogenes and miRNAs will facilitate the clinical diagnosis and treatment of complex diseases. However, identifying their interactions through biological experiments is time-consuming and labor-intensive. In this study, an ensemble learning framework with similarity kernel fusion, named ELPMA, is proposed to predict pseudogene-miRNA associations. First, four pseudogene similarity profiles and five miRNA similarity profiles are measured based on biological and topological properties. Subsequently, a similarity kernel fusion method is used to integrate the similarity profiles. Then, the feature representation for pseudogenes and miRNAs is obtained by combining the pseudogene-pseudogene similarities and the miRNA-miRNA similarities. Lastly, individual learners are trained on each training subset, and soft voting is used to yield the final decision based on the prediction results of the individual learners. k-fold cross-validation is used to evaluate the prediction performance of the ELPMA method. In addition, case studies are conducted on three pseudogenes to validate the performance of ELPMA in predicting pseudogene-miRNA interactions. Overall, the experimental results show that ELPMA is a feasible and effective tool for predicting interactions between pseudogenes and miRNAs.
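The soft-voting step mentioned in this abstract can be sketched as averaging the positive-class probabilities produced by several learners. This is a minimal sketch under stated assumptions: the function name `soft_vote`, the equal weighting of learners, the 0.5 decision threshold, and the toy probabilities are all hypothetical and are not taken from the ELPMA paper.

```python
def soft_vote(learner_probs, threshold=0.5):
    """Average per-sample positive-class probabilities across learners.

    learner_probs: one list of probabilities per learner, all of equal length.
    Returns (averaged_probabilities, predicted_labels).
    Equal weights and the 0.5 threshold are illustrative assumptions.
    """
    if not learner_probs:
        raise ValueError("need at least one learner")
    n_samples = len(learner_probs[0])
    if any(len(probs) != n_samples for probs in learner_probs):
        raise ValueError("every learner must score the same samples")
    n_learners = len(learner_probs)
    averaged = [
        sum(probs[i] for probs in learner_probs) / n_learners
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Three hypothetical learners scoring two candidate pseudogene-miRNA pairs.
averaged, labels = soft_vote([[0.9, 0.2], [0.6, 0.4], [0.3, 0.3]])
```

Unlike hard (majority) voting, soft voting lets a confident learner outweigh two lukewarm ones, which is why the averaged probability rather than the individual label is thresholded.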
Affiliation(s)
- Chunyan Fan
  - School of Computer Science and Engineering, Xi'an Technological University, Xi'an, 710021, China
- Mingchao Ding
  - School of Computer Science, Hubei University of Technology, Wuhan, 430068, China
22
Gu C, Li X. Prediction of disease-related miRNAs by voting with multiple classifiers. BMC Bioinformatics 2023; 24:177. [PMID: 37122001 PMCID: PMC10150488 DOI: 10.1186/s12859-023-05308-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/26/2023] [Indexed: 05/02/2023] Open
Abstract
There is strong evidence to support that mutations and dysregulation of miRNAs are associated with a variety of diseases, including cancer. However, the experimental methods used to identify disease-related miRNAs are expensive and time-consuming. Effective computational approaches to identify disease-related miRNAs are in high demand and would aid in the detection of lncRNA biomarkers for disease diagnosis, treatment, and prevention. In this study, we develop an ensemble learning framework to reveal the potential associations between miRNAs and diseases (ELMDA). The ELMDA framework does not rely on the known associations when calculating miRNA and disease similarities and uses multi-classifiers voting to predict disease-related miRNAs. As a result, the average AUC of the ELMDA framework was 0.9229 for the HMDD v2.0 database in a fivefold cross-validation. All potential associations in the HMDD V2.0 database were predicted, and 90% of the top 50 results were verified with the updated HMDD V3.2 database. The ELMDA framework was implemented to investigate gastric neoplasms, prostate neoplasms and colon neoplasms, and 100%, 94%, and 90%, respectively, of the top 50 potential miRNAs were validated by the HMDD V3.2 database. Moreover, the ELMDA framework can predict isolated disease-related miRNAs. In conclusion, ELMDA appears to be a reliable method to uncover disease-associated miRNAs.
Collapse
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
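The multi-classifier voting mentioned in the ELMDA abstract can be illustrated with a minimal hard-voting sketch. The abstract does not say which classifiers are combined or how ties are resolved, so the three label lists and the first-seen tie rule below are assumptions made purely for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_all(per_classifier_labels):
    """Combine one list of 0/1 labels per classifier into a single voted list."""
    return [majority_vote(list(column)) for column in zip(*per_classifier_labels)]


# Three hypothetical classifiers labelling four candidate miRNA-disease pairs.
votes = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
print(vote_all(votes))
```

Each column of `votes` holds the labels that the classifiers assign to one candidate pair, and the pair is kept as a predicted association only when most classifiers agree.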
|
23
|
Zou Z, Chen J, Wu W, Luo J, Long T, Wu Q, Wang Q, Zhen J, Zhao Y, Wang Y, Chen Y, Zhou M, Xu L. Detection of peanut seed vigor based on hyperspectral imaging and chemometrics. Front Plant Sci 2023; 14:1127108. [PMID: 36923124 PMCID: PMC10010490 DOI: 10.3389/fpls.2023.1127108] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/20/2022] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
Rapid nondestructive testing of peanut seed vigor is of great significance in current research. Before seeds are sown, effective screening of high-quality seeds for planting is crucial to improve the quality of crop yield. Seed vitality is one of the important indicators of seed quality and represents the potential ability of seeds to germinate quickly and completely and to grow into normal seedlings or plants. Meanwhile, the advantage of nondestructive testing technology is that the seeds themselves are not damaged. In this study, hyperspectral technology and superoxide dismutase activity were used to detect peanut seed vigor. To investigate peanut seed vigor and predict superoxide dismutase activity, spectral characteristics of peanut seeds in the wavelength range of 400-1000 nm were analyzed. The spectral data were processed with a variety of popular algorithms. Spectral data were preprocessed with Savitzky-Golay (SG), multivariate scatter correction (MSC), and median filtering (MF), which can effectively reduce the effects of baseline drift and tilt. CatBoost and Gradient Boosted Decision Tree were used for feature band extraction; the five most heavily weighted characteristic bands for peanut seed vigor classification were 425.48 nm, 930.8 nm, 965.32 nm, 984.0 nm, and 994.7 nm. XGBoost, LightGBM, Support Vector Machine and Random Forest were used for modeling of seed vitality classification. XGBoost and partial least squares regression (PLSR) were used to establish a regression model for the superoxide dismutase activity value. The results indicated that MF-CatBoost-LightGBM was the best model for peanut seed vigor classification, with an accuracy of 90.83%. MSC-CatBoost-PLSR was the optimal regression model for the superoxide dismutase activity value, with an R² of 0.9787 and an RMSE of 0.0566. The results suggested that hyperspectral technology could correlate the external manifestation of effective peanut seed vigor.
Affiliation(s)
- Zhiyong Zou
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Jie Chen
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Weijia Wu
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Jinghao Luo
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Tao Long
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Qingsong Wu
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Qianlong Wang
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Jiangbo Zhen
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Yongpeng Zhao
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Yuchao Wang
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Yongming Chen
- School of Electrical Engineering and Automation, Hubei Normal University, Huangshi, Hubei, China
- Man Zhou
- Food Academy, Sichuan Agricultural University, Yaan, China
- Lijia Xu
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
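Two of the preprocessing steps named in the abstract above, Savitzky-Golay smoothing and median filtering, can be sketched in a few lines. The abstract does not give window lengths or polynomial order, so the 5-point quadratic Savitzky-Golay kernel and the 3-point median window used here are assumptions chosen only to keep the example short.

```python
def savgol5(y):
    """Smooth with the 5-point quadratic Savitzky-Golay kernel (-3, 12, 17, 12, -3) / 35.

    The two points at each edge are left unchanged.
    """
    coeffs = (-3, 12, 17, 12, -3)
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(coeffs)) / 35
    return out


def median3(y):
    """Replace each interior point by the median of itself and its two neighbours."""
    out = list(y)
    for i in range(1, len(y) - 1):
        out[i] = sorted(y[i - 1:i + 2])[1]
    return out


# Hypothetical reflectance values with one spike at index 4.
spectrum = [1, 9, 2, 3, 100, 4]
print(median3(spectrum))
print(savgol5([x * x for x in range(7)]))
```

A quadratic Savitzky-Golay kernel reproduces a quadratic signal exactly, which is why the second call returns the squares unchanged; the median filter removes the isolated spike without averaging it into its neighbours.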
|
24
|
Feng H, Jin D, Li J, Li Y, Zou Q, Liu T. Matrix reconstruction with reliable neighbors for predicting potential MiRNA-disease associations. Brief Bioinform 2023; 24:6960615. [PMID: 36567252 DOI: 10.1093/bib/bbac571] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/12/2022] [Revised: 10/16/2022] [Accepted: 11/23/2022] [Indexed: 12/27/2022]
Abstract
Numerous experimental studies have indicated that alteration and dysregulation in microRNAs (miRNAs) are associated with serious diseases. Identifying disease-related miRNAs is therefore an essential and challenging task in bioinformatics research. Computational methods are an efficient and economical alternative to conventional biomedical studies and can reveal underlying miRNA-disease associations for subsequent experimental confirmation with reasonable confidence. Despite the success of existing computational approaches, most of them only rely on the known miRNA-disease associations to predict associations without adding other data to increase the prediction accuracy, and they are affected by issues of data sparsity. In this paper, we present MRRN, a model that combines matrix reconstruction with node reliability to predict probable miRNA-disease associations. In MRRN, the most reliable neighbors of miRNA and disease are used to update the original miRNA-disease association matrix, which significantly reduces data sparsity. Unknown miRNA-disease associations are reconstructed by aggregating the most reliable first-order neighbors to increase prediction accuracy by representing the local and global structure of the heterogeneous network. Five-fold cross-validation of MRRN produced an area under the curve (AUC) of 0.9355 and area under the precision-recall curve (AUPR) of 0.2646, values that were greater than those produced by comparable models. Two different types of case studies using three diseases were conducted to demonstrate the accuracy of MRRN, and all top 30 predicted miRNAs were verified.
Affiliation(s)
- Hailin Feng
- School of Mathematics and Computer Science, Zhejiang A&F University, No. 666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Dongdong Jin
- School of Mathematics and Computer Science, Zhejiang A&F University, No. 666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Jian Li
- School of Mathematics and Computer Science, Zhejiang A&F University, No. 666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Yane Li
- School of Mathematics and Computer Science, Zhejiang A&F University, No. 666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006, Xiyuan Avenue, West District, high tech Zone, 611731, Chengdu, China
- Tongcun Liu
- School of Mathematics and Computer Science, Zhejiang A&F University, No. 666 Wusu Street, Lin'an District, 311300, Hangzhou, China
|
25
|
DeepCF-PPI: improved prediction of protein-protein interactions by combining learned and handcrafted features based on attention mechanisms. Appl Intell 2023. [DOI: 10.1007/s10489-022-04387-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
|
26
|
Jabeer A, Temiz M, Bakir-Gungor B, Yousef M. miRdisNET: Discovering microRNA biomarkers that are associated with diseases utilizing biological knowledge-based machine learning. Front Genet 2023; 13:1076554. [PMID: 36712859 PMCID: PMC9877296 DOI: 10.3389/fgene.2022.1076554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 12/30/2022] [Indexed: 01/14/2023]
Abstract
During recent years, biological experiments and increasing evidence have shown that microRNAs play an important role in the diagnosis and treatment of human complex diseases. Therefore, to diagnose and treat human complex diseases, it is necessary to reveal the associations between a specific disease and related miRNAs. Although current computational models based on machine learning attempt to determine miRNA-disease associations, the accuracy of these models needs to be improved, and candidate miRNA-disease relations need to be evaluated from a biological perspective. In this paper, we propose a computational model named miRdisNET to predict potential miRNA-disease associations. Specifically, miRdisNET requires two types of data as input files, i.e., miRNA expression profiles and known disease-miRNA associations. First, we generate subsets of specific diseases by applying the grouping component. These subsets contain miRNA expressions with class labels associated with each specific disease. Then, we assign an importance score to each group by using a machine learning method for classification. Finally, we apply a modeling component and obtain outputs. One of the most important outputs of miRdisNET is the performance of miRNA-disease prediction. Compared with the existing methods, miRdisNET obtained the highest AUC value of 0.9998. Another output of miRdisNET is a list of significant miRNAs for the disease under study. The miRNAs identified by miRdisNET are validated against the gold-standard databases that hold information on experimentally verified microRNA-disease associations. miRdisNET has been developed to predict candidate miRNAs for new diseases, where the miRNA-disease relation is not yet known. In addition, miRdisNET presents candidate disease-disease associations based on shared miRNA knowledge. The miRdisNET tool and other supplementary files are publicly available at: https://github.com/malikyousef/miRdisNET.
Affiliation(s)
- Amhar Jabeer
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Mustafa Temiz
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
|
27
|
Liao Q, Ye Y, Li Z, Chen H, Zhuo L. Prediction of miRNA-disease associations in microbes based on graph convolutional networks and autoencoders. Front Microbiol 2023; 14:1170559. [PMID: 37187536 PMCID: PMC10175670 DOI: 10.3389/fmicb.2023.1170559] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/21/2023] [Accepted: 03/21/2023] [Indexed: 05/17/2023]
Abstract
MicroRNAs (miRNAs) are short RNA molecular fragments that regulate gene expression by targeting and inhibiting the expression of specific RNAs. Because microRNAs affect many diseases in microbial ecology, it is necessary to predict microRNAs' association with diseases at the microbial level. To this end, we propose a novel model, termed GCNA-MDA, in which a dual autoencoder and a graph convolutional network (GCN) are integrated to predict miRNA-disease associations. The proposed method leverages autoencoders to extract robust representations of miRNAs and diseases and meanwhile exploits the GCN to capture the topological information of miRNA-disease networks. To alleviate the impact of insufficient information in the original data, the association similarity and feature similarity data are combined to calculate a more complete initial basic vector of nodes. The experimental results on the benchmark datasets demonstrate that, compared with the existing representative methods, the proposed method achieves superior performance, and its precision reaches up to 0.8982. These results demonstrate that the proposed method can serve as a tool for exploring miRNA-disease associations in microbial environments.
Affiliation(s)
- Qingquan Liao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yuxiang Ye
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Zihang Li
- School of Computing and Data Science, Xiamen University Malaysia, Sepang, Selangor, Malaysia
- Hao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- *Correspondence: Hao Chen
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Linlin Zhuo
|
28
|
Pang S, Zhuang Y, Qiao S, Wang F, Wang S, Lv Z. DCTGM: A Novel Dual-channel Transformer Graph Model for miRNA-disease Association Prediction. Cognit Comput 2022. [DOI: 10.1007/s12559-022-10092-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
29
|
SGAEMDA: Predicting miRNA-Disease Associations Based on Stacked Graph Autoencoder. Cells 2022; 11:cells11243984. [PMID: 36552748 PMCID: PMC9776508 DOI: 10.3390/cells11243984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 11/30/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]
Abstract
MicroRNA (miRNA)-disease association (MDA) prediction is critical for disease prevention, diagnosis, and treatment. Traditional MDA wet experiments, on the other hand, are inefficient and costly. Therefore, we proposed a multi-layer collaborative unsupervised training base model called SGAEMDA (Stacked Graph Autoencoder-Based Prediction of Potential miRNA-Disease Associations). First, from the original miRNA and disease data, we defined two types of initial features: similarity features and association features. Second, a stacked graph autoencoder is used to learn unsupervised low-dimensional representations of meaningful higher-order similarity features, and we concatenate the association features with the learned low-dimensional representations to obtain the final miRNA-disease pair features. Finally, we used a multilayer perceptron (MLP) to predict scores for unknown miRNA-disease associations. SGAEMDA achieved a mean area under the ROC curve of 0.9585 and 0.9516 in 5-fold and 10-fold cross-validation, respectively, which is significantly higher than the other baseline methods. Furthermore, case studies have shown that SGAEMDA can accurately predict candidate miRNAs for brain, breast, colon, and kidney neoplasms.
|
30
|
Guo R, Chen H, Wang W, Wu G, Lv F. Predicting potential miRNA-disease associations based on more reliable negative sample selection. BMC Bioinformatics 2022; 23:432. [PMID: 36253735 PMCID: PMC9575264 DOI: 10.1186/s12859-022-04978-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 10/06/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND A growing number of biomedical studies have shown that the dysfunction of miRNAs is closely related to many human diseases. Identifying disease-associated miRNAs would contribute to the understanding of pathological mechanisms of diseases. Supervised learning-based computational methods have continuously been developed for miRNA-disease association predictions. Negative samples of experimentally validated uncorrelated miRNA-disease pairs are required for these approaches, but they are not available due to lack of biomedical research interest. Existing methods mainly choose negative samples from the unlabelled ones randomly. Therefore, the selection of more reliable negative samples is of great importance for these methods to achieve satisfactory prediction results. RESULTS In this study, we propose a computational method termed KR-NSSM, which integrates two semi-supervised algorithms to select more reliable negative samples for miRNA-disease association predictions. Our method uses a refined K-means algorithm for preliminary screening of likely negative and positive miRNA-disease samples. A Rocchio classification-based method is applied for further screening to obtain more reliable negative and positive samples. We implement ablation tests in KR-NSSM and find that the combination of the two selection procedures obtains more reliable negative samples for miRNA-disease association predictions. Comprehensive experiments based on fivefold cross-validations demonstrate improvements in prediction accuracy on six classic classifiers and five known miRNA-disease association prediction models when using negative samples chosen by our method rather than by previous negative sample selection strategies. Moreover, 469 out of 1123 positive miRNA-disease associations selected by our method are confirmed by existing databases.
CONCLUSIONS Our experiments show that KR-NSSM can screen out more reliable negative samples from the unlabelled ones, which greatly improves the performance of supervised machine learning methods in miRNA-disease association predictions. We expect that KR-NSSM would be a useful tool in negative sample selection in biomedical research.
Affiliation(s)
- Ruiyu Guo
- School of Software, East China Jiaotong University, Nanchang, 330013, China
- Hailin Chen
- School of Software, East China Jiaotong University, Nanchang, 330013, China
- Wengang Wang
- School of Software, East China Jiaotong University, Nanchang, 330013, China
- Guangsheng Wu
- School of Mathematics and Computer Science, Xinyu University, Xinyu, 338004, China
- Fangliang Lv
- School of Software, East China Jiaotong University, Nanchang, 330013, China
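The Rocchio-style screening step described in the KR-NSSM abstract can be pictured as a nearest-prototype test: an unlabelled sample is kept as a reliable negative when it is more similar to the negative prototype than to the positive one. The sketch below is a simplified illustration and not the authors' procedure; the plain centroids, the cosine similarity, and the toy feature vectors are all assumptions.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def reliable_negatives(likely_pos, likely_neg, unlabeled):
    """Keep unlabelled samples that sit closer to the negative prototype."""
    c_pos = centroid(likely_pos)
    c_neg = centroid(likely_neg)
    return [x for x in unlabeled if cosine(x, c_neg) > cosine(x, c_pos)]


# Hypothetical 2-D feature vectors for miRNA-disease pairs.
pos = [[1.0, 0.0], [0.9, 0.1]]
neg = [[0.0, 1.0], [0.1, 0.9]]
pool = [[0.8, 0.2], [0.2, 0.8], [0.1, 1.0]]
print(reliable_negatives(pos, neg, pool))
```

In this toy setup the `pos` and `neg` lists stand in for the output of a preliminary clustering stage, and only the pool samples that resemble the negative prototype survive the second screen.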
|
31
|
Li M, Fan Y, Zhang Y, Lv Z. Using Sequence Similarity Based on CKSNP Features and a Graph Neural Network Model to Identify miRNA-Disease Associations. Genes (Basel) 2022; 13:1759. [PMID: 36292644 PMCID: PMC9602123 DOI: 10.3390/genes13101759] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 01/12/2024]
Abstract
Among many machine learning models for analyzing the relationship between miRNAs and diseases, the prediction results are optimized by establishing different machine learning models, and less attention is paid to the feature information contained in the miRNA sequence itself. This study focused on the impact of the different feature information of miRNA sequences on the relationship between miRNA and disease. It was found that when the graph neural network used was the same and the miRNA features based on the K-spacer nucleic acid pair composition (CKSNAP) feature were adopted, a better graph neural network prediction model of miRNA-disease relationship could be built (AUC = 93.71%), which was 0.15% greater than the best model in the literature based on the same benchmark dataset. The optimized model was also used to predict miRNAs related to lung tumors, esophageal tumors, and kidney tumors, and 47, 47, and 37 of the top 50 miRNAs related to three diseases predicted separately by the model were consistent with descriptions in the wet experiment validation database (dbDEMC).
Affiliation(s)
- Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Yu Fan
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Yiting Zhang
- College of Biology, Southwest Jiaotong University, Chengdu 611756, China
- College of Biology, Georgia State University, Atlanta, GA 30302-3965, USA
- Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
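The k-spaced nucleic acid pair composition (CKSNAP) feature named in the abstract above counts how often each ordered nucleotide pair occurs with a fixed number of positions between its two members. The sketch below follows the commonly used definition, in which counts are normalised by the number of pairs at each gap; the maximum gap `k_max`, the RNA alphabet, and the example string are assumptions rather than settings reported in the abstract.

```python
from itertools import product


def cksnap(seq, k_max=2, alphabet="ACGU"):
    """Return pair frequencies for gaps 0..k_max (16 values per gap for a 4-letter alphabet)."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing symbols outside the alphabet
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


# A short example RNA string, used only to exercise the function.
vector = cksnap("UGAGGUAGUAGGUUGUAUAGUU", k_max=2)
print(len(vector))
```

Each block of 16 values sums to 1 whenever at least one valid pair exists at that gap, so sequences of different lengths map to vectors on the same scale.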
|
32
|
Wei Z, Yao D, Zhan X, Zhang S. A clustering-based sampling method for miRNA-disease association prediction. Front Genet 2022; 13:995535. [PMID: 36176298 PMCID: PMC9513605 DOI: 10.3389/fgene.2022.995535] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/16/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022]
Abstract
More and more studies have shown that microRNAs (miRNAs) play a critical role in gene expression regulation, and the irregular expression of miRNAs tends to be associated with a variety of complex human diseases. Because of the high cost and low efficiency of identifying disease-associated miRNAs through biological experiments, scholars have focused on predicting potential disease-associated miRNAs by computational methods. Considering that the existing methods are flawed in constructing the negative sample set, we proposed a clustering-based sampling method for miRNA-disease association prediction (CSMDA). Firstly, we integrated multiple similarity information of miRNA and disease to represent miRNA-disease pairs. Secondly, we performed a clustering-based sampling method to avoid introducing potential positive samples when constructing the negative sample set. Thirdly, we employed a random forest-based feature selection method to reduce noise and redundant information in the high-dimensional feature space. Finally, we implemented an ensemble learning framework for predicting miRNA-disease associations by soft voting. The Precision, Recall, F1-score, AUROC and AUPR of the CSMDA reached 0.9676, 0.9545, 0.9610, 0.9928, and 0.9940, respectively, under five-fold cross-validation. In addition, case studies on three cancers showed that the top 20 potentially associated miRNAs predicted by the CSMDA were confirmed by the dbDEMC database or the literature. The above results demonstrate that the CSMDA can predict potential disease-associated miRNAs more accurately.
Affiliation(s)
- Zheng Wei
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Dengju Yao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- *Correspondence: Dengju Yao
- Xiaojuan Zhan
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
- Shuli Zhang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
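Soft voting, the ensemble rule mentioned in the CSMDA abstract, averages the class probabilities produced by the base models instead of counting their hard labels. A minimal sketch follows; the base learners, the equal default weights, and the 0.5 decision threshold are assumptions, since the abstract gives none of these details.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Average positive-class probabilities across models and threshold the result.

    prob_lists holds one list of probabilities per model, all of the same length.
    """
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    averaged = [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*prob_lists)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Three hypothetical models scoring three candidate miRNA-disease pairs.
probs = [
    [0.90, 0.40, 0.20],
    [0.60, 0.45, 0.10],
    [0.30, 0.80, 0.30],
]
averaged, labels = soft_vote(probs)
print(labels)
```

The second pair shows how this differs from hard voting: two of the three models score it below 0.5, yet the one confident model lifts the average to about 0.55, so soft voting labels it positive.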
|
33
|
Ma M, Na S, Zhang X, Chen C, Xu J. SFGAE: a self-feature-based graph autoencoder model for miRNA-disease associations prediction. Brief Bioinform 2022; 23:6678419. [PMID: 36037084 DOI: 10.1093/bib/bbac340] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/14/2022] [Revised: 07/21/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022]
Abstract
Increasing evidence has suggested that microRNAs (miRNAs) are important biomarkers of various diseases. Numerous graph neural network (GNN) models have been proposed for predicting miRNA-disease associations. However, the existing GNN-based methods suffer from an over-smoothing issue: the learned feature embeddings of miRNA nodes and disease nodes become indistinguishable when multiple GNN layers are stacked. This issue makes the performance of the methods sensitive to the number of layers, and significantly hurts the performance when more layers are employed. In this study, we resolve this issue with a novel self-feature-based graph autoencoder model, abbreviated as SFGAE. The key novelty of SFGAE is to construct miRNA-self embeddings and disease-self embeddings, and let them be independent of graph interactions between the two types of nodes. The novel self-feature embeddings enrich the information of typical aggregated feature embeddings, which aggregate the information from direct neighbors and hence rely heavily on graph interactions. SFGAE adopts a graph encoder with an attention mechanism to concatenate aggregated feature embeddings and self-feature embeddings, and adopts a bilinear decoder to predict links. Our experiments show that SFGAE achieves state-of-the-art performance. In particular, SFGAE improves the average AUC upon recent GAEMDA [1] on the benchmark datasets HMDD v2.0 and HMDD v3.2, and consistently performs better when fewer (e.g. 10%) training samples are used. Furthermore, SFGAE effectively overcomes the over-smoothing issue and performs stably well on deeper models (e.g. eight layers). Finally, we carry out case studies on three human diseases, colon neoplasms, esophageal neoplasms and kidney neoplasms, and perform a survival analysis using kidney neoplasm as an example. The results suggest that SFGAE is a reliable tool for predicting potential miRNA-disease associations.
Affiliation(s)
- Mingyuan Ma
- Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
- Sen Na
- International Computer Science Institute and Department of Statistics, University of California, Berkeley, Berkeley CA, USA
- Xiaolu Zhang
- Department of Information Systems, City University of Hong Kong, Hong Kong, China
- Congzhou Chen
- Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
- Jin Xu
- Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
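A bilinear decoder of the kind mentioned in the SFGAE abstract scores a candidate link from two node embeddings and a learned weight matrix. The sketch below computes sigmoid(z_m^T W z_d); the sigmoid output, the two-dimensional embeddings, and the values in `W` are illustrative assumptions and do not come from the paper.

```python
import math


def bilinear_score(z_m, W, z_d):
    """Return sigmoid(z_m^T W z_d) for a miRNA embedding z_m and a disease embedding z_d."""
    Wz = [sum(w * d for w, d in zip(row, z_d)) for row in W]
    logit = sum(m * v for m, v in zip(z_m, Wz))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical embeddings and weight matrix.
z_mirna = [1.0, 0.5]
z_disease = [0.5, 1.0]
W = [
    [0.2, -0.1],
    [0.0, 0.3],
]
print(round(bilinear_score(z_mirna, W, z_disease), 4))
```

Because `W` is a full matrix rather than the identity, the decoder can weight interactions between different embedding dimensions, which a plain dot product cannot do.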
|
34
|
Yang M, Huang ZA, Gu W, Han K, Pan W, Yang X, Zhu Z. Prediction of biomarker-disease associations based on graph attention network and text representation. Brief Bioinform 2022; 23:6651308. [PMID: 35901464 DOI: 10.1093/bib/bbac298] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/03/2022] [Revised: 06/28/2022] [Accepted: 06/30/2022] [Indexed: 02/06/2023]
Abstract
MOTIVATION The associations between biomarkers and human diseases play a key role in understanding complex pathology and developing targeted therapies. Wet lab experiments for biomarker discovery are costly, laborious and time-consuming. Computational prediction methods can be used to greatly expedite the identification of candidate biomarkers. RESULTS Here, we present a novel computational model named GTGenie for predicting the biomarker-disease associations based on graph and text features. In GTGenie, a graph attention network is utilized to characterize diverse similarities of biomarkers and diseases from heterogeneous information resources. Meanwhile, a pretrained BERT-based model is applied to learn the text-based representation of biomarker-disease relation from biomedical literature. The captured graph and text features are then integrated in a bimodal fusion network to model the hybrid entity representation. Finally, inductive matrix completion is adopted to infer the missing entries for reconstructing relation matrix, with which the unknown biomarker-disease associations are predicted. Experimental results on HMDD, HMDAD and LncRNADisease data sets showed that GTGenie can obtain competitive prediction performance with other state-of-the-art methods. AVAILABILITY The source code of GTGenie and the test data are available at: https://github.com/Wolverinerine/GTGenie.
Affiliation(s)
- Minghao Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
- Zhi-An Huang
- Center for Computer Science and Information Technology, City University of Hong Kong Dongguan Research Institute, Dongguan, China
- Wenhao Gu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Kun Han
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Wenying Pan
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Xiao Yang
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
|
35
|
Zhang W, Wei H, Liu B. idenMD-NRF: a ranking framework for miRNA-disease association identification. Brief Bioinform 2022; 23:6604995. [PMID: 35679537 DOI: 10.1093/bib/bbac224] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2022] [Revised: 04/18/2022] [Accepted: 05/11/2022] [Indexed: 11/12/2022]
Abstract
Identifying miRNA-disease associations is an important task for revealing pathogenic mechanism of complicated diseases. Different computational methods have been proposed. Although these methods obtained encouraging performance for detecting missing associations between known miRNAs and diseases, how to accurately predict associated diseases for new miRNAs is still a difficult task. In this regard, a ranking framework named idenMD-NRF is proposed for miRNA-disease association identification. idenMD-NRF treats the miRNA-disease association identification as an information retrieval task. Given a novel query miRNA, idenMD-NRF employs Learning to Rank algorithm to rank associated diseases based on high-level association features and various predictors. The experimental results on two independent test datasets indicate that idenMD-NRF is superior to other compared predictors. A user-friendly web server of idenMD-NRF predictor is freely available at http://bliulab.net/idenMD-NRF/.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Hang Wei
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
|
36
|
Xu L, Li X, Yang Q, Tan L, Liu Q, Liu Y. Application of Bidirectional Generative Adversarial Networks to Predict Potential miRNAs Associated With Diseases. Front Genet 2022; 13:936823. [PMID: 35903359 PMCID: PMC9314862 DOI: 10.3389/fgene.2022.936823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 06/08/2022] [Indexed: 11/18/2022]
Abstract
Substantial evidence has shown that microRNAs are crucial for biological processes within complex human diseases. Identifying the association of miRNA–disease pairs will contribute to accelerating the discovery of potential biomarkers and pathogenesis. Researchers began to focus on constructing computational models to facilitate the progress of disease pathology and clinical medicine by identifying the potential disease-related miRNAs. However, most existing computational methods are expensive, and their use is limited to unobserved relationships for unknown miRNAs (diseases) without association information. In this manuscript, we proposed a creative semi-supervised model named bidirectional generative adversarial network for miRNA-disease association prediction (BGANMDA). First, we constructed a microRNA similarity network, a disease similarity network, and Gaussian interaction profile kernel similarity based on the known miRNA–disease associations and the comprehensive similarity of miRNAs (diseases). Next, an integrated similarity feature network with the full underlying relationships of miRNA–disease pairs was obtained. Then, the similarity feature network was fed into the BGANMDA model to learn advanced traits in latent space. Finally, we ranked an association score list and predicted the associations between miRNA and disease. In our experiment, five-fold cross validation was applied to estimate BGANMDA’s performance, and an area under the curve (AUC) of 0.9319 and a standard deviation of 0.00021 were obtained. At the same time, in the global and local leave-one-out cross validation (LOOCV), the AUC value and standard deviation of BGANMDA were 0.9116 ± 0.0025 and 0.8928 ± 0.0022, respectively. Furthermore, BGANMDA was employed in three different case studies to validate its prediction capability and accuracy. The experimental results of the case studies showed that 46, 46, and 48 of the top 50 prediction lists had been identified in previous studies.
Affiliation(s)
- Long Xu
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Xiaokun Li
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Heilongjiang University, Harbin, China
- *Correspondence: Xiaokun Li, ; Yong Liu,
- Qiang Yang
- School of Electronic Engineering, Heilongjiang University, Harbin, China
- Long Tan
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Qingyuan Liu
- Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Heilongjiang University, Harbin, China
- Yong Liu
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- *Correspondence: Xiaokun Li, ; Yong Liu,

37
Liu YF, Shu X, Qiao XF, Ai GY, Liu L, Liao J, Qian S, He XJ. Radiomics-Based Machine Learning Models for Predicting P504s/P63 Immunohistochemical Expression: A Noninvasive Diagnostic Tool for Prostate Cancer. Front Oncol 2022; 12:911426. [PMID: 35795067 PMCID: PMC9252170 DOI: 10.3389/fonc.2022.911426] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/02/2022] [Accepted: 05/19/2022] [Indexed: 01/31/2023] Open
Abstract
Objective To develop and validate a noninvasive radiomic-based machine learning (ML) model to identify P504s/P63 status and thereby support the diagnosis of prostate cancer (PCa). Methods A retrospective dataset was collected of patients with preoperative prostate MRI examinations and P504s/P63 pathological immunohistochemical results between June 2016 and February 2021. According to P504s/P63 expression, the patients were divided into label 0 (atypical prostatic hyperplasia), label 1 (benign prostatic hyperplasia, BPH) and label 2 (PCa) groups. This study employed T2WI, DWI and ADC sequences to assess prostate diseases, and regions of interest (ROIs) were manually segmented with Artificial Intelligence Kit software for radiomics feature acquisition. Feature dimensionality reduction and selection were performed using a mutual information algorithm. Based on the screened features, P504s/P63 prediction models were established with random forest (RF), gradient boosting decision tree (GBDT), logistic regression (LR), adaptive boosting (AdaBoost) and k-nearest neighbor (KNN) algorithms. Performance was evaluated by the area under the ROC curve (AUC) and accuracy. Results A total of 315 patients were enrolled. Among the 851 radiomic features, the 32 top features were derived from T2WI, in which the gray-level run length matrix (GLRLM) and gray-level cooccurrence matrix (GLCM) features accounted for the largest proportion. Among the five models, the RF algorithm performed best in general evaluations (microaverage AUC=0.920, macroaverage AUC=0.870) and provided the most accurate result in further sublabel prediction (the accuracies of label 0, 1, and 2 were 0.831, 0.831, and 0.932, respectively). In comparative sequence analyses, T2WI was the best single-sequence candidate (microaverage AUC=0.94 and macroaverage AUC=0.78). The merged datasets of T2WI, DWI, and ADC yielded optimal AUCs (microaverage AUC=0.930 and macroaverage AUC=0.900).
Conclusions The radiomic-based RF classifier has the potential to be used to evaluate the presurgical P504s/P63 status and further diagnose PCa noninvasively and accurately.
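The abstract reports both microaverage and macroaverage AUC for a three-label problem. The sketch below shows how the two averages can differ under the common one-vs-rest convention: the macro average is the mean of the per-class AUCs, while the micro average pools every (sample, class) pair before computing a single AUC. The abstract does not say how the authors computed these values, so the helper names and toy scores here are assumptions for illustration only.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def micro_macro_auc(y_true, y_score, n_classes):
    """One-vs-rest micro- and macro-averaged AUC for multi-class scores."""
    per_class, flat_labels, flat_scores = [], [], []
    for c in range(n_classes):
        labels_c = [1 if y == c else 0 for y in y_true]
        scores_c = [row[c] for row in y_score]
        per_class.append(binary_auc(labels_c, scores_c))
        flat_labels.extend(labels_c)   # pooled pairs feed the micro average
        flat_scores.extend(scores_c)
    macro = sum(per_class) / n_classes
    micro = binary_auc(flat_labels, flat_scores)
    return micro, macro

# Hypothetical scores for four patients over labels 0, 1 and 2.
y_true = [0, 1, 2, 2]
y_score = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],  # a label-2 patient scored as label 0
]
micro_auc, macro_auc = micro_macro_auc(y_true, y_score, n_classes=3)
```

Because the micro average weights every pooled decision equally while the macro average weights every class equally, the two can diverge when classes are imbalanced or when errors concentrate in one class.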
Affiliation(s)
- Yun-Fan Liu
- Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xin Shu
- Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xiao-Feng Qiao
- Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Guang-Yong Ai
- Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Li Liu
- Big Data and Software Engineering College, Chongqing University, Chongqing, China
- Jun Liao
- Big Data and Software Engineering College, Chongqing University, Chongqing, China
- Shuang Qian
- Big Data and Software Engineering College, Chongqing University, Chongqing, China
- Xiao-Jing He
- Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- *Correspondence: Xiao-Jing He,

38
Lou Z, Cheng Z, Li H, Teng Z, Liu Y, Tian Z. Predicting miRNA-disease associations via learning multimodal networks and fusing mixed neighborhood information. Brief Bioinform 2022; 23:6582005. [PMID: 35524503 DOI: 10.1093/bib/bbac159] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 02/10/2022] [Revised: 03/29/2022] [Accepted: 04/10/2022] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION In recent years, a large number of biological experiments have shown that miRNAs play an important role in disease pathogenesis. The discovery of miRNA-disease associations is beneficial for disease diagnosis and treatment. Since inferring these associations through biological experiments is time-consuming and expensive, researchers have sought to identify them with computational approaches. Graph Convolutional Networks (GCNs), which exhibit excellent performance in link prediction problems, have been successfully used in miRNA-disease association prediction. However, GCNs consider only first-order neighborhood information at each layer and fail to capture information from high-order neighbors when learning miRNA and disease representations through information propagation. Therefore, how to explicitly and effectively aggregate information from high-order neighborhoods remains challenging. RESULTS To address this challenge, we propose a novel method called mixed neighborhood information for miRNA-disease association (MINIMDA), which fuses mixed high-order neighborhood information of miRNAs and diseases in multimodal networks. First, MINIMDA constructs the integrated miRNA similarity network and the integrated disease similarity network from their respective multisource information. Then, the embedding representations of miRNAs and diseases are obtained by fusing mixed high-order neighborhood information from the multimodal networks, namely the integrated miRNA similarity network, the integrated disease similarity network and the miRNA-disease association network. Finally, we concatenate the multimodal embedding representations of miRNAs and diseases and feed them into a multilayer perceptron (MLP) to predict their underlying associations. Extensive experimental results show that MINIMDA is superior to other state-of-the-art methods overall.
Moreover, the outstanding performance on case studies for esophageal cancer, colon tumor and lung cancer further demonstrates the effectiveness of MINIMDA. AVAILABILITY AND IMPLEMENTATION https://github.com/chengxu123/MINIMDA and http://120.79.173.96/.
Affiliation(s)
- Zhengzheng Lou
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Zhaoxu Cheng
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Hui Li
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Zhixia Teng
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Yang Liu
- Departments of Cerebrovascular Diseases, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou 450000, China
- Zhen Tian
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China

39
Ding Y, Lei X, Liao B, Wu FX. MLRDFM: a multi-view Laplacian regularized DeepFM model for predicting miRNA-disease associations. Brief Bioinform 2022; 23:6552270. [PMID: 35323901 DOI: 10.1093/bib/bbac079] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/16/2021] [Revised: 02/07/2022] [Accepted: 02/15/2022] [Indexed: 01/20/2023] Open
Abstract
MOTIVATION MicroRNAs (miRNAs), as critical regulators, are involved in various fundamental and vital biological processes, and their abnormalities are closely related to human diseases. Predicting disease-related miRNAs is beneficial to uncovering new biomarkers for the prevention, detection, prognosis, diagnosis and treatment of complex diseases. RESULTS In this study, we propose a multi-view Laplacian regularized deep factorization machine (DeepFM) model, MLRDFM, to predict novel miRNA-disease associations while improving the standard DeepFM. Specifically, MLRDFM improves DeepFM from two aspects: first, MLRDFM takes the relationships among items into consideration by regularizing their embedding features via their similarity-based Laplacians. In this study, miRNA Laplacian regularization integrates four types of miRNA similarity, while disease Laplacian regularization integrates two types of disease similarity. Second, to judiciously train our model, Laplacian eigenmaps are utilized to initialize the weights in the dense embedding layer. The experimental results on the latest HMDD v3.2 dataset show that MLRDFM improves the performance and reduces the overfitting phenomenon of DeepFM. Besides, MLRDFM is greatly superior to the state-of-the-art models in miRNA-disease association prediction in terms of different evaluation metrics with the 5-fold cross-validation. Furthermore, case studies further demonstrate the effectiveness of MLRDFM.
Affiliation(s)
- Yulian Ding
- Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Drive, S7N 5A9, Saskatchewan, Canada
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, 620 West Chang'an Avenue, 710119, Shaanxi, China
- Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, 99 Longkun South Road, 571158, Hainan, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Drive, S7N 5A9, Saskatchewan, Canada
- Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, 57 Campus Drive, S7N 5A9, Saskatchewan, Canada

40
A miRNA-Disease Association Identification Method Based on Reliable Negative Sample Selection and Improved Single-Hidden Layer Feedforward Neural Network. INFORMATION 2022. [DOI: 10.3390/info13030108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022] Open
Abstract
miRNAs are a category of important endogenous non-coding small RNAs and are ubiquitous in eukaryotes. They are widely involved in the post-transcriptional regulation of gene expression and play a critical part in the development of human diseases. With recent advancements in big data technology, using bioinformatics methods to identify causative miRNAs has become a research hot spot. In this paper, a method called RNSSLFN is proposed to identify miRNA-disease associations through reliable negative sample selection and an improved single-hidden layer feedforward neural network (SLFN). It involves, first, obtaining integrated similarity for miRNAs and diseases; next, selecting reliable negative samples from unknown miRNA-disease associations by distinguishing up-regulated or down-regulated miRNAs; and then introducing an improved SLFN to solve the prediction task. Experimental results on the latest data set, HMDD v3.2, under 5-fold cross-validation (CV) show that the average AUC and AUPR of RNSSLFN reach 0.9316 and 0.9065, respectively, which are superior to those of three other state-of-the-art methods. Furthermore, in case studies of 10 common cancers, more than 70% of the top 30 predicted miRNA-disease association pairs are verified in the databases, which further confirms the reliability and effectiveness of the RNSSLFN model. Overall, RNSSLFN shows great potential and broad prospects for predicting miRNA-disease associations.

41
Dai Q, Wang Z, Liu Z, Duan X, Song J, Guo M. Predicting miRNA-disease associations using an ensemble learning framework with resampling method. Brief Bioinform 2021; 23:6470964. [PMID: 34929742 DOI: 10.1093/bib/bbab543] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/03/2021] [Revised: 11/05/2021] [Accepted: 11/25/2021] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION Accumulating evidence indicates that microRNAs (miRNAs) play a crucial role in the pathogenesis and progression of various complex diseases. Inferring disease-associated miRNAs is important for exploring the etiology, diagnosis and treatment of human diseases. As biological experiments are time-consuming and labor-intensive, developing effective computational methods has become indispensable for identifying associations between miRNAs and diseases. RESULTS We present an Ensemble learning framework with Resampling method for MiRNA-Disease Association (ERMDA) prediction to discover potential disease-related miRNAs. First, a resampling strategy is proposed for building multiple different balanced training subsets to address the challenge of sample imbalance within the database. Then, ERMDA extracts miRNA and disease feature representations by integrating miRNA-miRNA similarities, disease-disease similarities and experimentally verified miRNA-disease association information. Next, a feature selection approach is applied to reduce redundant information and increase the diversity among these subsets. Lastly, ERMDA constructs an individual learner on each subset to yield preliminary outcomes, and a soft voting method makes the final decision based on the prediction results of the individual learners. A series of experimental results demonstrates that ERMDA outperforms other state-of-the-art methods on both balanced and unbalanced testing sets. In addition, case studies conducted on three human diseases further confirm ERMDA's capability to identify potential disease-related miRNAs. In conclusion, these experimental results demonstrate that our method can serve as an effective and reliable tool for researchers to explore the regulatory role of miRNAs in complex diseases.
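Two of the steps described above, building balanced training subsets by resampling and combining individual learners by soft voting, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the base learner, so `train_midpoint_learner` is a hypothetical one-dimensional stand-in, the feature extraction and feature selection steps are omitted, and all data values are invented.

```python
import math
import random

def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Pair all positives with an equally sized random draw of negatives."""
    rng = random.Random(seed)
    return [
        (list(positives), rng.sample(negatives, len(positives)))
        for _ in range(n_subsets)
    ]

def train_midpoint_learner(pos, neg):
    """Hypothetical 1-D learner: logistic score centred between the class means."""
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-(x - midpoint)))

def soft_vote(probabilities, threshold=0.5):
    """Average the learners' probabilities, then threshold the mean."""
    mean_prob = sum(probabilities) / len(probabilities)
    return mean_prob, int(mean_prob >= threshold)

# Invented 1-D feature values: few positives, many negatives (imbalanced).
positives = [2.0, 2.5, 3.0]
negatives = [0.0, 0.2, 0.5, 0.8, 1.0, 1.2, -0.5, 0.1, 0.3]

subsets = balanced_subsets(positives, negatives, n_subsets=5)
learners = [train_midpoint_learner(pos, neg) for pos, neg in subsets]
prob_high, label_high = soft_vote([learn(2.8) for learn in learners])
prob_low, label_low = soft_vote([learn(0.0) for learn in learners])
```

Soft voting averages probabilities rather than counting hard class votes, so a few confident learners can outweigh several uncertain ones.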
Affiliation(s)
- Qiguo Dai
- School of Computer Science and Engineering, Dalian Minzu University, 116600, Dalian, China
- SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Zhaowei Wang
- School of Computer Science and Engineering, Dalian Minzu University, 116600, Dalian, China
- SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Ziqiang Liu
- School of Computer Science and Engineering, Dalian Minzu University, 116600, Dalian, China
- SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Xiaodong Duan
- SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Jinmiao Song
- SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, 100044, Beijing, China

42
Uthayopas K, de Sá AGC, Alavi A, Pires DEV, Ascher DB. TSMDA: Target and symptom-based computational model for miRNA-disease-association prediction. MOLECULAR THERAPY. NUCLEIC ACIDS 2021; 26:536-546. [PMID: 34631283 PMCID: PMC8479276 DOI: 10.1016/j.omtn.2021.08.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/14/2021] [Accepted: 08/19/2021] [Indexed: 02/06/2023]
Abstract
The emergence of high-throughput sequencing techniques has revealed a primary role of microRNAs (miRNAs) in a wide range of diseases, including cancers and neurodegenerative disorders. Understanding novel relationships between miRNAs and diseases can potentially unveil complex pathogenesis mechanisms, leading to effective diagnosis and treatment. The investigation of novel miRNA-disease associations, however, is currently costly and time consuming. Over the years, several computational models have been proposed to prioritize potential miRNA-disease associations, but with limited usability or predictive capability. In order to fill this gap, we introduce TSMDA, a novel machine-learning method that leverages target and symptom information and negative sample selection to predict miRNA-disease association. TSMDA significantly outperforms similar methods, achieving an area under the receiver operating characteristic (ROC) curve (AUC) of 0.989 and 0.982 under 5-fold cross-validation and blind test, respectively. We also demonstrate the capability of the method to uncover potential miRNA-disease associations in breast, prostate, and lung cancers, as case studies. We believe TSMDA will be an invaluable tool for the community to explore and prioritize potentially new miRNA-disease associations for further experimental characterization. The method was made available as a freely accessible and user-friendly web interface at http://biosig.unimelb.edu.au/tsmda/.
Affiliation(s)
- Korawich Uthayopas
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Alex G C de Sá
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, VIC, Australia
- Azadeh Alavi
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, VIC, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, VIC, Australia
- Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK

43
Ramesh P, Veerappapillai S. Prediction of Micronucleus Assay Outcome Using In Vivo Activity Data and Molecular Structure Features. Appl Biochem Biotechnol 2021; 193:4018-4034. [PMID: 34669110 DOI: 10.1007/s12010-021-03720-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2021] [Accepted: 10/08/2021] [Indexed: 11/28/2022]
Abstract
The in vivo micronucleus assay is a widely used genotoxicity test for determining the extent of chromosomal aberrations caused by chemicals in human beings, and it plays a significant role in the drug discovery paradigm. To reduce the uncertainties and expense of in vivo experiments, we set out to develop novel machine learning-based tools that predict the toxicity of compounds with high precision. A total of 372 compounds with known toxicity information were retrieved from the PubChem Bioassay database and the literature. The fingerprints and descriptors of the compounds were generated for the analysis using PaDEL and ChemSAR, respectively. The performance of the models was assessed using three tiers of evaluation: fivefold cross-validation, tenfold cross-validation, and validation on an external dataset. Further, structural alerts causing genotoxicity of the compounds were identified using the SARpy method. Of note, the fingerprint-based random forest model built in our analysis achieved the highest accuracy, about 0.97, during tenfold cross-validation. In essence, our study highlights that structural alerts such as chlorocyclohexane and trimethylamine are likely to be the leading cause of toxicity in humans. We believe that the random forest model generated in this study is appropriate for reducing the number of test animals and should be considered in the future for the good practice of animal welfare.
Affiliation(s)
- Priyanka Ramesh
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Shanthi Veerappapillai
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India.

44
Wang YT, Li L, Ji CM, Zheng CH, Ni JC. ILPMDA: Predicting miRNA-Disease Association Based on Improved Label Propagation. Front Genet 2021; 12:743665. [PMID: 34659364 PMCID: PMC8514753 DOI: 10.3389/fgene.2021.743665] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Accepted: 08/30/2021] [Indexed: 12/21/2022] Open
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that have been shown to be related to numerous complex human diseases. Many studies have suggested that miRNAs affect a wide range of complicated biological processes. Hence, the investigation of disease-related miRNAs with computational methods is warranted. In this study, we present an improved label propagation for miRNA-disease association prediction (ILPMDA) method to identify disease-related miRNAs. First, we utilized similarity kernel fusion to integrate different types of biological information for generating miRNA and disease similarity networks. Second, we applied the weighted k-nearest known neighbor algorithm to update the verified miRNA-disease association data. Third, we utilized improved label propagation in the disease and miRNA similarity networks to make association predictions. Finally, we obtained the final prediction scores by adopting an average ensemble method to integrate the two kinds of prediction results. To evaluate the prediction performance of ILPMDA, two types of cross-validation and case studies on three significant human diseases were carried out. All results demonstrated that ILPMDA is able to discover potential miRNA-disease associations.
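For orientation, the classical label propagation update that methods of this kind build on can be written as F ← αSF + (1 − α)Y, where S is a normalised similarity matrix and Y holds the known associations. The sketch below implements that classical form only. The abstract does not describe the specific improvements made in ILPMDA, so the value of `alpha`, the row normalisation, and the toy matrices are assumptions made for illustration.

```python
def row_normalize(sim):
    """Scale each row of a similarity matrix so that it sums to 1."""
    out = []
    for row in sim:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def label_propagation(sim, labels, alpha=0.5, n_iter=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y with row-normalised S."""
    s = row_normalize(sim)
    n, m = len(labels), len(labels[0])
    f = [row[:] for row in labels]
    for _ in range(n_iter):
        f = [
            [
                alpha * sum(s[i][k] * f[k][j] for k in range(n))
                + (1 - alpha) * labels[i][j]
                for j in range(m)
            ]
            for i in range(n)
        ]
    return f

# Invented example: miRNA 1 has no known associations but resembles miRNA 0.
mirna_sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
known = [  # rows: miRNAs, columns: diseases
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
]
scores = label_propagation(mirna_sim, known)
```

In this toy case the unlabelled miRNA inherits a higher score for the disease linked to its most similar neighbour. Running the same update over a disease similarity network and averaging the two score matrices would mirror the ensemble step the abstract mentions.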
Affiliation(s)
- Yu-Tian Wang
- School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Lei Li
- School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Cun-Mei Ji
- School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, Hefei, China
- Jian-Cheng Ni
- School of Cyber Science and Engineering, Qufu Normal University, Qufu, China

45
Dai Q, Chu Y, Li Z, Zhao Y, Mao X, Wang Y, Xiong Y, Wei DQ. MDA-CF: Predicting MiRNA-Disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med 2021; 136:104706. [PMID: 34371319 DOI: 10.1016/j.compbiomed.2021.104706] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/17/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, which provide a new perspective in diagnosis and treatment of multiple diseases. Since the experimental methods are always costly and resource-consuming, prediction of disease-related miRNAs using computational methods is in great need. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and the autoencoder was utilized for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0 with an average AUC of 0.9258. Moreover, MDA-CF was implemented to investigate colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, 88% of the top 50 potential miRNAs were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method to uncover disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
Affiliation(s)
- Qiuying Dai
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhiqi Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yusong Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China; Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China.

46
Chen XJ, Hua XY, Jiang ZR. ANMDA: anti-noise based computational model for predicting potential miRNA-disease associations. BMC Bioinformatics 2021; 22:358. [PMID: 34215183 PMCID: PMC8254275 DOI: 10.1186/s12859-021-04266-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/24/2020] [Accepted: 06/11/2021] [Indexed: 11/24/2022] Open
Abstract
Background A growing body of research has shown that microRNAs (miRNAs) can regulate the function of target genes and are closely related to various diseases. Developing computational methods to uncover more potential miRNA-disease associations can provide clues for further functional research. Results Building on previous work, we find that noise hidden in the data can affect prediction performance, and we propose an anti-noise algorithm (ANMDA) to predict potential miRNA-disease associations. First, we calculate the similarity among miRNAs and among diseases to construct features and obtain positive samples from the Human MicroRNA Disease Database version 2.0 (HMDD v2.0). Then, we apply k-means to the undetected miRNA-disease associations and sample the negative examples equally from the k clusters. Further, we construct several data subsets through sampling with replacement and feed them into the light gradient boosting machine (LightGBM) method. Finally, a voting method is applied to predict potential miRNA-disease relationships. As a result, ANMDA achieves an area under the receiver operating characteristic curve (AUROC) of 0.9373 ± 0.0005 in five-fold cross-validation, which is superior to several published methods. In addition, in the case study we analyze the miRNA-disease associations predicted with high probability and compare them with the data in HMDD v3.0. The results show that ANMDA is a novel and practical algorithm that can be used to infer potential miRNA-disease associations. Conclusion The results indicate that noise hidden in the data has an obvious impact on predicting potential miRNA-disease associations. We believe ANMDA can achieve better results on this task as more methods for handling data noise are applied. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04266-6.
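The negative-sampling step described above, clustering the unlabelled pairs with k-means and then drawing the same number of negatives from every cluster, can be sketched in a few lines. This is a simplified illustration: the abstract gives neither the value of k nor the feature construction, so the points below are invented two-dimensional feature vectors, the k-means is a plain Lloyd's iteration with a deterministic start, and all function names are assumptions.

```python
import random

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; centroids start at the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        for idx, p in enumerate(points):
            assign[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def sample_negatives_per_cluster(points, k, per_cluster, seed=0):
    """Draw the same number of candidate negatives from each k-means cluster."""
    rng = random.Random(seed)
    assign = kmeans(points, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return sorted(chosen), assign

# Invented feature vectors for six unlabelled miRNA-disease pairs.
unlabelled = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (5.1, 4.9), (0.2, 0.1), (4.8, 5.2)]
negative_idx, clusters = sample_negatives_per_cluster(unlabelled, k=2, per_cluster=2)
```

Sampling evenly across clusters keeps the negative set from being dominated by one dense region of the feature space, which is one plausible reading of why the abstract pairs clustering with equal sampling.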
Affiliation(s)
- Xue-Jun Chen
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Xin-Yun Hua
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Zhen-Ran Jiang
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China.

47
Li J, Peng D, Xie Y, Dai Z, Zou X, Li Z. Novel Potential Small Molecule-MiRNA-Cancer Associations Prediction Model Based on Fingerprint, Sequence, and Clinical Symptoms. J Chem Inf Model 2021; 61:2208-2219. [PMID: 33899462 DOI: 10.1021/acs.jcim.0c01458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Abstract
As an important biomarker in organisms, miRNA is closely related to various small molecules and diseases. Research on small molecule-miRNA-cancer associations is helpful for the development of cancer treatment drugs and the discovery of pathogenesis. Theoretical methods for identifying potential small molecule-miRNA-cancer associations are urgently needed because experimental approaches are usually time-consuming, laborious, and expensive. To overcome this problem, we developed a new computational method in which features derived from structure, sequence, and symptoms were used to characterize small molecules, miRNAs, and cancers, respectively. A feature vector was constructed to characterize each small molecule-miRNA-cancer association by concatenating these features, and a random forest algorithm was used to build a model for recognizing potential associations. Based on 5-fold cross-validation and a benchmark data set, the model achieved an accuracy of 93.20 ± 0.52%, a precision of 93.22 ± 0.51%, a recall of 93.20 ± 0.53%, and an F1-measure of 93.20 ± 0.52%. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.9873 and 0.9870, respectively. The real prediction ability and application performance of the developed method were further evaluated and verified through an independent data set test and a case study. Some potential small molecules and miRNAs related to cancer have been identified and are worthy of further experimental research. It is anticipated that our model could serve as a useful high-throughput virtual screening tool for drug research and development. All source code can be downloaded from https://github.com/LeeKamlong/Multi-class-SMMCA.
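The four threshold-based metrics reported above (accuracy, precision, recall, F1-measure) follow directly from the confusion-matrix counts. The sketch below shows the binary case with invented labels. The paper's repository name suggests a multi-class setting, where these would typically be averaged over classes, but the abstract does not state the averaging scheme, so this is only the basic building block.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented labels: 3 true positives, 2 false positives, 1 false negative, 2 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

With these toy labels precision (0.6) and recall (0.75) differ, which makes it easier to see that F1 sits between them rather than equalling accuracy.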
Affiliation(s)
- Jinlong Li
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Dongdong Peng
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Yun Xie
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Zong Dai
- School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Xiaoyong Zou
- School of Chemistry, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Zhanchao Li
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou 510006, People's Republic of China
|
48
|
Chu Y, Wang X, Dai Q, Wang Y, Wang Q, Peng S, Wei X, Qiu J, Salahub DR, Xiong Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021; 22:6261915. [PMID: 34009265 DOI: 10.1093/bib/bbab165] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 02/10/2021] [Revised: 04/02/2021] [Accepted: 04/08/2021] [Indexed: 11/13/2022] Open
Abstract
Accurate identification of miRNA-disease associations (MDAs) helps to understand the etiology and mechanisms of various diseases. However, experimental methods are costly and time-consuming. Thus, it is urgent to develop computational methods for the prediction of MDAs. Based on graph theory, MDA prediction is regarded as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve training efficiency and accuracy. This method models both the potential connections of the feature space and the structural relationships of MDA data. The nodes of the graphs are represented by the disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity. Moreover, we considered six tasks simultaneously on the MDA prediction problem for the first time, which ensures that under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that the MDA-GCNFTG method achieved satisfactory performance on all six tasks and is significantly superior to the classic machine learning methods and the state-of-the-art MDA prediction methods. Moreover, the effectiveness of GCNs via the graph sampling strategy and the feature and topology graph in MDA-GCNFTG has also been demonstrated. More importantly, case studies for two diseases and three miRNAs were conducted and achieved satisfactory performance.
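The Gaussian interaction profile kernel similarity named in this abstract is commonly computed from a binary association matrix as K(i, j) = exp(-γ‖p_i − p_j‖²), with γ set to a bandwidth parameter divided by the mean squared norm of the profiles. The sketch below assumes that common definition and a bandwidth parameter of 1; the abstract itself does not state the formula, and the function name and toy matrix are hypothetical.

```python
import math
from typing import List, Sequence


def gip_kernel_similarity(
    profiles: Sequence[Sequence[float]], gamma_prime: float = 1.0
) -> List[List[float]]:
    """Gaussian interaction profile kernel similarity between rows.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), where
    gamma = gamma_prime / (mean over rows of ||p_i||^2).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are zero")
    gamma = gamma_prime / mean_sq_norm
    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            similarity[i][j] = math.exp(-gamma * sq_dist)
    return similarity


# Toy binary association matrix: rows are miRNAs, columns are diseases
# (made-up values for illustration only).
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
mirna_similarity = gip_kernel_similarity(associations)
```

Applying the same function to the transposed matrix would give the corresponding disease-side similarity.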
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Xuhong Wang
- School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, China
- Qiuying Dai
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Yanjing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, China
- Dennis Russell Salahub
- Department of Chemistry, University of Calgary, Fellow Royal Society of Canada and Fellow of the American Association for the Advancement of Science, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
|
49
|
Liu D, Huang Y, Nie W, Zhang J, Deng L. SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost. BMC Bioinformatics 2021; 22:219. [PMID: 33910505 PMCID: PMC8082881 DOI: 10.1186/s12859-021-04135-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/12/2021] [Accepted: 04/14/2021] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Identifying miRNA and disease associations helps us understand disease mechanisms of action at the molecular level. However, identification based on biological experiments is usually blind, time-consuming, and small-scale. Hence, developing computational methods to predict unknown miRNA and disease associations is becoming increasingly important. RESULTS In this work, we develop a computational framework called SMALF to predict unknown miRNA-disease associations. SMALF first utilizes a stacked autoencoder to learn miRNA latent features and disease latent features from the original miRNA-disease association matrix. Then, SMALF obtains the feature vector representing each miRNA-disease pair by integrating miRNA functional similarity, miRNA latent features, disease semantic similarity, and disease latent features. Finally, XGBoost is utilized to predict unknown miRNA-disease associations. We implement cross-validation experiments. Compared with other state-of-the-art methods, SMALF achieved the best AUC value. We also construct three case studies, including hepatocellular carcinoma, colon cancer, and breast cancer. The results show that 10, 10, and 9 out of the top ten predicted miRNAs are verified in MNDR v3.0 or miRCancer, respectively. CONCLUSION The comprehensive experimental results demonstrate that SMALF is effective in identifying unknown miRNA-disease associations.
Affiliation(s)
- Dayun Liu
- School of Computer Science and Engineering, Central South University, Hunan, 410083, China
- Yibiao Huang
- School of Computer Science and Engineering, Central South University, Hunan, 410083, China
- Wenjuan Nie
- School of Computer Science and Engineering, Central South University, Hunan, 410083, China
- Jiaxuan Zhang
- Department of Cognitive Science, University of California San Diego, La Jolla, 92093, USA
- Lei Deng
- School of Computer Science and Engineering, Central South University, Hunan, 410083, China
|
50
|
Tang M, Liu C, Liu D, Liu J, Liu J, Deng L. PMDFI: Predicting miRNA-Disease Associations Based on High-Order Feature Interaction. Front Genet 2021; 12:656107. [PMID: 33897768 PMCID: PMC8063614 DOI: 10.3389/fgene.2021.656107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/20/2021] [Accepted: 02/18/2021] [Indexed: 12/23/2022] Open
Abstract
MicroRNAs (miRNAs) are non-coding RNA molecules that make a significant contribution to diverse biological processes, and their mutations and dysregulations are closely related to the occurrence, development, and treatment of human diseases. Therefore, identification of potential miRNA–disease associations contributes to elucidating the pathogenesis of tumorigenesis and seeking effective treatments for diseases. Because traditional biological experiments for determining associations between miRNAs and diseases are expensive, increasing numbers of effective computational models are being used to compensate for this limitation. In this study, we propose a novel computational method, named PMDFI, which is an ensemble learning method to predict potential miRNA–disease associations based on high-order feature interactions. We first use a stacked autoencoder to extract meaningful high-order features from the original similarity matrix, then perform feature interactive learning, and finally utilize an integrated model composed of multiple random forests and logistic regression to make comprehensive predictions. The experimental results illustrate that PMDFI achieves excellent performance in predicting potential miRNA–disease associations, with average area under the ROC curve (AUC) scores of 0.9404 and 0.9415 in 5-fold and 10-fold cross-validation, respectively.
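The average area under the ROC curve reported here can be illustrated with a small sketch. It uses the standard rank interpretation of AUC (the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half) and averages it over folds; the function names and the toy fold data are hypothetical and are not results from the paper.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auc_over_folds(
    fold_results: Sequence[Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """Average the per-fold AUC values of a cross-validation run."""
    aucs: List[float] = [roc_auc(labels, scores) for labels, scores in fold_results]
    return sum(aucs) / len(aucs)


# Two toy folds of (true labels, predicted scores); values are made up.
toy_folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.4, 0.4, 0.1]),
]
average_auc = mean_auc_over_folds(toy_folds)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based rank computation would be preferable for large test sets.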
Affiliation(s)
- Mingyan Tang
- School of Computer Science and Engineering, Central South University, Changsha, China
- Chenzhe Liu
- School of Computer Science and Engineering, Central South University, Changsha, China
- Dayun Liu
- School of Computer Science and Engineering, Central South University, Changsha, China
- Junyi Liu
- School of Computer Science and Engineering, Central South University, Changsha, China
- Jiaqi Liu
- School of Computer Science and Engineering, Central South University, Changsha, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, China
|