1
Chu S, Duan G, Yan C. PGCNMDA: Learning node representations along paths with graph convolutional network for predicting miRNA-disease associations. Methods 2024; 229:71-81. [PMID: 38909974 DOI: 10.1016/j.ymeth.2024.06.007] [Received: 04/30/2024] [Revised: 05/26/2024] [Accepted: 06/16/2024] [Indexed: 06/25/2024]
Abstract
Identifying miRNA-disease associations (MDAs) is crucial for improving the diagnosis and treatment of various diseases. However, biological experiments can be time-consuming and expensive. To overcome these challenges, computational approaches have been developed, with Graph Convolutional Networks (GCNs) showing promising results in MDA prediction. The success of GCN-based methods relies on learning a meaningful spatial operator to extract effective node feature representations. To enhance the inference of MDAs, we propose a novel method called PGCNMDA, which employs graph convolutional networks with a graph spatial operator learned from paths. This approach enables the generation of meaningful spatial convolutions from paths in GCN, leading to improved prediction performance. On HMDD v2.0, PGCNMDA obtains a mean AUC of 0.9229 and an AUPRC of 0.9206 under 5-fold cross-validation (5-CV), and a mean AUC of 0.9235 and an AUPRC of 0.9212 under 10-fold cross-validation (10-CV). Additionally, the AUC of PGCNMDA reaches 0.9238 under global leave-one-out cross-validation (GLOOCV). On HMDD v3.2, PGCNMDA obtains a mean AUC of 0.9413 and an AUPRC of 0.9417 under 5-CV, and a mean AUC of 0.9419 and an AUPRC of 0.9425 under 10-CV. Furthermore, the AUC of PGCNMDA reaches 0.9415 under GLOOCV. The results show that PGCNMDA outperforms the compared methods. In addition, case studies on pancreatic neoplasms, thyroid neoplasms and leukemia show that 50, 50 and 48 of the top 50 predicted miRNAs linked to these diseases are confirmed, respectively. This further validates the effectiveness and feasibility of PGCNMDA in practical applications.
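For orientation, the generic GCN propagation step that GCN-based predictors build on can be sketched as below. This is the standard symmetric-normalized rule, ReLU(D^{-1/2}(A + I)D^{-1/2} X W), and not PGCNMDA's path-learned operator, which the abstract does not specify. The toy graph, the feature and weight values, and the helper names `normalized_adjacency`, `matmul` and `gcn_layer` are all assumptions made for illustration.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One generic GCN layer: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Hypothetical toy graph: a path 0 - 1 - 2 with 2-dimensional node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0], [1.0]]  # projects each node to one hidden unit
hidden = gcn_layer(adj, features, weights)
```

In a method that learns its operator, the fixed `normalized_adjacency` matrix is the part that would be replaced by learned quantities.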
Affiliation(s)
- Shuang Chu
  - School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Cheng Yan
  - School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
2
Lu Y, Li Q, Li T. A novel hierarchical network-based approach to unveil the complexity of functional microbial genome. BMC Genomics 2024; 25:786. [PMID: 39138557 PMCID: PMC11323692 DOI: 10.1186/s12864-024-10692-6] [Received: 03/13/2024] [Accepted: 08/07/2024] [Indexed: 08/15/2024]
Abstract
Biological networks play a crucial role in elucidating intricate biological processes. While interspecies environmental interactions have been extensively studied, the exploration of gene interactions within species, particularly among individual microorganisms, is less developed. The increasing amount of microbiome genomic data necessitates a more nuanced analysis of microbial genome structures and functions. In this context, we introduce a complex structure based on higher-order network theory, "Solid Motif Structures (SMS)", via a hierarchical biological network analysis of genomes within the same genus, effectively linking microbial genome structure with its function. Leveraging 162 high-quality genomes of Microcystis, a key freshwater cyanobacterium within microbial ecosystems, we established a genome structure network. Employing deep learning techniques, such as an adaptive graph encoder, we uncovered 27 critical functional subnetworks and their associated SMSs. Incorporating metagenomic data from seven geographically distinct lakes, we investigated the functional stability of Microcystis under varying environmental conditions, unveiling unique functional interaction models for each lake. Our work compiles these insights into an extensive resource repository, providing novel perspectives on the functional dynamics within Microcystis. This research offers a hierarchical network analysis framework for understanding interactions between microbial genome structures and functions within the same genus.
Affiliation(s)
- Yuntao Lu
  - University of Michigan, Ann Arbor, USA
- Qi Li
  - The State Key Laboratory of Freshwater Ecology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Tao Li
  - The State Key Laboratory of Freshwater Ecology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
3
Qu J, Liu S, Li H, Zhou J, Bian Z, Song Z, Jiang Z. Three-layer heterogeneous network based on the integration of CircRNA information for MiRNA-disease association prediction. PeerJ Comput Sci 2024; 10:e2070. [PMID: 38983241 PMCID: PMC11232581 DOI: 10.7717/peerj-cs.2070] [Received: 10/09/2023] [Accepted: 04/29/2024] [Indexed: 07/11/2024]
Abstract
A growing body of research has shown that the abnormal expression of microRNA (miRNA) is associated with many complex diseases. However, biological experiments have many limitations in identifying potential disease-miRNA associations. Therefore, we developed a computational model of Three-Layer Heterogeneous Network based on the Integration of CircRNA information for MiRNA-Disease Association prediction (TLHNICMDA). In the model, a disease-miRNA-circRNA heterogeneous network is built from known disease-miRNA associations, known miRNA-circRNA interactions, disease similarity, miRNA similarity, and circRNA similarity. Then, the potential disease-miRNA associations are identified by an update algorithm based on the global network. Under global and local leave-one-out cross-validation (LOOCV), TLHNICMDA achieves AUCs of 0.8795 and 0.7774, respectively. Moreover, the mean and standard deviation of the AUC under 5-fold cross-validation are 0.8777 ± 0.0010. In addition, two types of case studies illustrate the usefulness of TLHNICMDA in predicting disease-miRNA interactions.
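As a reading aid for figures of the form "mean ± standard deviation of AUC over folds", the sketch below computes a rank-based AUC per fold and then summarizes the folds. The fold data are invented, the helper name `auc` is hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out (labels, scores) for each of five folds.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    ([1, 0, 0, 1], [0.8, 0.6, 0.1, 0.5]),
    ([0, 1, 1, 0], [0.3, 0.3, 0.9, 0.1]),
    ([1, 1, 0, 0], [0.6, 0.7, 0.2, 0.5]),
    ([0, 1, 0, 1], [0.4, 0.9, 0.8, 0.6]),
]

fold_aucs = [auc(labels, scores) for labels, scores in folds]
mean_auc = mean(fold_aucs)
std_auc = stdev(fold_aucs)  # sample standard deviation (assumed estimator)
print(f"AUC = {mean_auc:.4f} ± {std_auc:.4f}")
```

A small standard deviation across folds, as in the reported 0.0010, indicates that performance is stable with respect to how the known associations are partitioned.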
Affiliation(s)
- Jia Qu
  - Changzhou University, School of Computer Science and Artificial Intelligence, Changzhou, Jiangsu, China
- Shuting Liu
  - Changzhou University, School of Computer Science and Artificial Intelligence, Changzhou, Jiangsu, China
- Han Li
  - Changzhou University, School of Computer Science and Artificial Intelligence, Changzhou, Jiangsu, China
- Jie Zhou
  - Shaoxing University, School of Computer Science and Engineering, Shaoxing, Zhejiang, China
- Zekang Bian
  - Jiangnan University, School of AI & Computer Science, Wuxi, Jiangsu, China
- Zihao Song
  - Changzhou University, School of Computer Science and Artificial Intelligence, Changzhou, Jiangsu, China
- Zhibin Jiang
  - Shaoxing University, School of Computer Science and Engineering, Shaoxing, Zhejiang, China
4
Zhang C, Gao Q, Li M, Yu T. Implementing link prediction in protein networks via feature fusion models based on graph neural networks. Comput Biol Chem 2024; 108:107980. [PMID: 38000328 DOI: 10.1016/j.compbiolchem.2023.107980] [Received: 04/27/2023] [Revised: 10/07/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023]
Abstract
MOTIVATION: Protein-protein interactions serve as the cornerstone for various biochemical processes within biological organisms. Existing research methodologies predominantly employ link prediction techniques to analyze these interaction networks. However, traditional approaches often fall short in delivering satisfactory predictive performance when applied to multi-species datasets. Current computational methods largely focus on analyzing the network topology, resulting in a somewhat monolithic feature set. The integration of diverse features in the model could potentially yield superior performance and broader applicability. To this end, we propose an autoencoder model built on graph neural networks, designed to enhance both predictive performance and generalizability by leveraging the integration of gene ontology.
RESULTS: In this research, we developed AGraphSAGE, a model specifically designed for analyzing protein-protein interaction network data. By seamlessly integrating gene ontology into the graph structure, we employed a dual-channel graph sampling and aggregation network that capitalizes on topological information to process high-dimensional features. Feature fusion is achieved through the implementation of graph attention mechanisms, and we adopted a link prediction framework as the experimental training model. Performance was evaluated on real-world datasets using key metrics, such as Area Under the Curve (AUC). A hyperparameter search space was established, and a Bayesian optimization strategy was applied to iteratively fine-tune the model, assessing the impact of various parameters on predictive efficacy. The experimental results validate that our proposed model is capable of effectively predicting protein-protein interactions across diverse biological species.
Affiliation(s)
- Chi Zhang
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar 161006, China
- Qian Gao
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar 161006, China
- Ming Li
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar 161006, China
- Tianfei Yu
  - College of Life Science and Agriculture Forestry, Qiqihar University, Qiqihar 161006, China
5
Li DX, Zhou P, Zhao BW, Su XR, Li GD, Zhang J, Hu PW, Hu L. Biocaiv: an integrative webserver for motif-based clustering analysis and interactive visualization of biological networks. BMC Bioinformatics 2023; 24:451. [PMID: 38030973 PMCID: PMC10685597 DOI: 10.1186/s12859-023-05574-9] [Received: 07/05/2023] [Accepted: 11/20/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND: As an important task in bioinformatics, clustering analysis plays a critical role in understanding the functional mechanisms of many complex biological systems, which can be modeled as biological networks. The purpose of clustering analysis in biological networks is to identify functional modules of interest, but there is a lack of online clustering tools that visualize biological networks and provide in-depth biological analysis for discovered clusters.
RESULTS: Here we present BioCAIV, a novel webserver designed to make clustering analysis of biological networks as accessible and widely applicable as possible. This, together with its user-friendly interface, helps biological researchers perform accurate clustering analysis of biological networks and identify functionally significant modules for further assessment.
CONCLUSIONS: BioCAIV is an efficient clustering analysis webserver designed for a variety of biological networks. BioCAIV is freely available without registration requirements at http://bioinformatics.tianshanzw.cn:8888/BioCAIV/ .
Affiliation(s)
- Dong-Xu Li
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Peng Zhou
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
- Bo-Wei Zhao
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Guo-Dong Li
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Jun Zhang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Peng-Wei Hu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Lun Hu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
6
Peng W, He Z, Dai W, Lan W. MHCLMDA: multihypergraph contrastive learning for miRNA-disease association prediction. Brief Bioinform 2023; 25:bbad524. [PMID: 38243694 PMCID: PMC10796254 DOI: 10.1093/bib/bbad524] [Received: 08/03/2023] [Revised: 12/13/2023] [Accepted: 12/18/2023] [Indexed: 01/21/2024]
Abstract
The correct prediction of disease-associated miRNAs plays an essential role in disease prevention and treatment. Current computational methods to predict disease-associated miRNAs construct different miRNA views and disease views based on various miRNA properties and disease properties and then integrate the multiviews to predict the relationship between miRNAs and diseases. However, most existing methods ignore the information interaction among the views and the consistency of miRNA features (disease features) across multiple views. This study proposes a computational method based on multiple hypergraph contrastive learning (MHCLMDA) to predict miRNA-disease associations. MHCLMDA first constructs multiple miRNA hypergraphs and disease hypergraphs based on various miRNA similarities and disease similarities and performs hypergraph convolution on each hypergraph to capture higher order interactions between nodes, followed by hypergraph contrastive learning to learn the consistent miRNA feature representation and disease feature representation under different views. Then, a variational auto-encoder is employed to extract the miRNA and disease features in known miRNA-disease association relationships. Finally, MHCLMDA fuses the miRNA and disease features from different views to predict miRNA-disease associations. The parameters of the model are optimized in an end-to-end way. We applied MHCLMDA to the prediction of human miRNA-disease association. The experimental results show that our method performs better than several other state-of-the-art methods in terms of the area under the receiver operating characteristic curve and the area under the precision-recall curve.
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
- Zhichen He
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
- Wei Lan
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
7
Wu J, Chen H, Cheng M, Xiong H. CurvAGN: Curvature-based Adaptive Graph Neural Networks for Predicting Protein-Ligand Binding Affinity. BMC Bioinformatics 2023; 24:378. [PMID: 37798653 PMCID: PMC10557336 DOI: 10.1186/s12859-023-05503-w] [Received: 07/05/2023] [Accepted: 09/28/2023] [Indexed: 10/07/2023]
Abstract
Accurately predicting the binding affinity between proteins and ligands is crucial for drug discovery. Recent advances in graph neural networks (GNNs) have made significant progress in learning representations of protein-ligand complexes to estimate binding affinities. To improve the performance of GNNs, it is often necessary to examine protein-ligand complexes from a geometric perspective. While "off-the-shelf" GNNs can incorporate some basic geometric structures of molecules, such as distances and angles, by modeling the complexes as homophilic graphs, these solutions seldom take into account higher-level geometric attributes such as curvature and homology, or heterophilic interactions. To address these limitations, we introduce the Curvature-based Adaptive Graph Neural Network (CurvAGN). This GNN comprises two components: a curvature block and an adaptive attention guided neural block (AGN). The curvature block encodes multiscale curvature information; the AGN, based on an adaptive graph attention mechanism, then incorporates geometric structure (angle, distance, and multiscale curvature), long-range molecular interactions, and graph heterophily into the protein-ligand complex representation. We demonstrate the superiority of our proposed model through experiments conducted on the PDBbind-V2016 core dataset.
Affiliation(s)
- Jianqiu Wu
  - Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
- Hongyang Chen
  - Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
- Minhao Cheng
  - Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Jiulongwan, Hongkong, 999077, China
- Haoyi Xiong
  - Big Data Lab, Baidu Inc., Haidian, Beijing, 100080, China
8
Zhang J, Liu B, Wu J, Wang Z, Li J. DeepCAC: a deep learning approach on DNA transcription factors classification based on multi-head self-attention and concatenate convolutional neural network. BMC Bioinformatics 2023; 24:345. [PMID: 37723425 PMCID: PMC10506269 DOI: 10.1186/s12859-023-05469-9] [Received: 07/20/2023] [Accepted: 09/06/2023] [Indexed: 09/20/2023]
Abstract
Understanding gene expression processes necessitates the accurate classification and identification of transcription factors, which is supported by high-throughput sequencing technologies. However, these techniques suffer from inherent limitations such as time consumption and high costs. To address these challenges, the field of bioinformatics has increasingly turned to deep learning technologies for analyzing gene sequences. Nevertheless, the pursuit of improved experimental results has led to the inclusion of numerous complex analysis function modules, resulting in models with a growing number of parameters. To overcome these limitations, we propose a novel approach for analyzing DNA transcription factor sequences, named DeepCAC. This method leverages deep convolutional neural networks with a multi-head self-attention mechanism. By employing convolutional neural networks, it can effectively capture local hidden features in the sequences. Simultaneously, the multi-head self-attention mechanism enhances the identification of hidden features with long-distance dependencies. This approach reduces the overall number of parameters in the model while harnessing the computational power of sequence data from multi-head self-attention. Through training with labeled data, experiments demonstrate that this approach significantly improves performance while requiring fewer parameters compared to existing methods. Additionally, the effectiveness of our approach is validated in accurately predicting DNA transcription factor sequences.
Affiliation(s)
- Jidong Zhang
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Bo Liu
  - School of Mathematical and Computational Sciences, Massey University, Auckland, 0745, New Zealand
- Jiahui Wu
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Zhihan Wang
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Jianqiang Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China