1
Qiao L, Li C, Lin W, He X, Mi J, Tong Y, Gao J. ViroISDC: a method for calling integration sites of hepatitis B virus based on feature encoding. BMC Bioinformatics 2024; 25:177. [PMID: 38704528 PMCID: PMC11070082 DOI: 10.1186/s12859-024-05763-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 03/26/2024] [Indexed: 05/06/2024]
Abstract
BACKGROUND: Hepatitis B virus (HBV) integrates into human chromosomes and can lead to genomic instability and hepatocarcinogenesis. Current tools for HBV integration site detection lack accuracy and stability. RESULTS: This study proposes a deep learning-based method, named ViroISDC, for detecting integration sites. ViroISDC generates corresponding grammar rules and encodes the characteristics of the language data to predict integration sites accurately. Compared with Lumpy, Pindel, Seeksv, and SurVirus, ViroISDC exhibits better overall performance and is less sensitive to sequencing depth and integration sequence length, displaying good reliability, stability, and generality. Further downstream analysis of the integration sites detected by ViroISDC reveals the integration patterns and features of HBV. HBV integration exhibits specific chromosomal preferences and tends to occur in cancerous tissue. Moreover, HBV integration frequency was higher in males than in females, and high-frequency integration sites were more likely to be located in hepatocarcinogenesis- and anti-cancer-related genes, supporting the reliability of ViroISDC. CONCLUSIONS: The ViroISDC pipeline exhibits superior precision, stability, and reliability across various datasets when compared to similar software. It is valuable for exploring HBV infection in the human body and has significant implications for the diagnosis, treatment, and prognosis assessment of HCC.
Affiliation(s)
- Lei Qiao: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Chang Li: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Wei Lin: College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Xiaoqi He: College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Jia Mi: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Yigang Tong: College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Jingyang Gao: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
2
Ma W, Bi X, Jiang H, Zhang S, Wei Z. CollaPPI: A Collaborative Learning Framework for Predicting Protein-Protein Interactions. IEEE J Biomed Health Inform 2024; 28:3167-3177. [PMID: 38466584 DOI: 10.1109/jbhi.2024.3375621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Exploring protein-protein interactions (PPIs) is of paramount importance for elucidating the intrinsic mechanisms of various biological processes. Experimental determination of PPIs can be both time-consuming and expensive, motivating the exploration of data-driven deep learning technologies as a viable, efficient, and accurate alternative. However, most current deep learning-based methods treat the two proteins in a candidate pair as separate entities when extracting PPI features, thus neglecting knowledge sharing between the collaborative protein and the target protein. To address this issue, this study proposes a collaborative learning framework, CollaPPI, which incorporates two kinds of collaboration, protein-level collaboration and task-level collaboration, to achieve not only knowledge sharing between a pair of proteins, but also the complementation of such shared knowledge between biological domains closely related to PPI (i.e., protein function and subcellular location). Evaluation results demonstrated that CollaPPI obtained superior performance compared to state-of-the-art methods on two PPI benchmarks. In addition, evaluation of CollaPPI on an additional PPI type prediction task further demonstrated its generalization ability.
3
Kang WY, Gao YL, Wang Y, Li F, Liu JX. KFDAE: CircRNA-Disease Associations Prediction Based on Kernel Fusion and Deep Auto-Encoder. IEEE J Biomed Health Inform 2024; 28:3178-3185. [PMID: 38408006 DOI: 10.1109/jbhi.2024.3369650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
CircRNAs have been shown to play an important role in disease diagnosis and treatment. Because wet-lab experiments are time-consuming and expensive, computational methods have become a viable alternative in recent years. However, the number of circRNA-disease associations (CDAs) that can be verified is relatively small, and some methods do not take full advantage of dependencies between attributes. To solve these problems, this paper proposes a novel method based on Kernel Fusion and Deep Auto-encoder (KFDAE) to predict potential associations between circRNAs and diseases. First, KFDAE uses a non-linear method to fuse the circRNA similarity kernels and disease similarity kernels. Then the vectors are concatenated to form the positive and negative sample sets, and these data are sent to a deep auto-encoder to reduce dimensionality and extract features. Finally, a three-layer deep feedforward neural network is used to learn features and obtain the prediction score. The experimental results show that, compared with existing methods, KFDAE achieves the best performance. In addition, the results of case studies demonstrate the effectiveness and practical significance of KFDAE, which means KFDAE is able to capture more comprehensive information and generate credible candidates for subsequent wet-lab experiments.
4
Daniel Thomas S, Vijayakumar K, John L, Krishnan D, Rehman N, Revikumar A, Kandel Codi JA, Prasad TSK, S S V, Raju R. Machine Learning Strategies in MicroRNA Research: Bridging Genome to Phenome. OMICS: A Journal of Integrative Biology 2024; 28:213-233. [PMID: 38752932 DOI: 10.1089/omi.2024.0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2024]
Abstract
MicroRNAs (miRNAs) have emerged as a prominent layer of regulation of gene expression. This article presents the salient and current aspects of machine learning (ML) tools and approaches from genome to phenome in miRNA research. First, we underline that the complexity in the analysis of miRNA function ranges from their modes of biogenesis to the target diversity in diverse biological conditions. Therefore, it is imperative to first ascertain the miRNA coding potential of genomes and understand the regulatory mechanisms of their expression. This knowledge enables the efficient classification of miRNA precursors and the identification of their mature forms and respective target genes. Second, because one miRNA can target multiple mRNAs and vice versa, another challenge is the assessment of the miRNA-mRNA target interaction network. Furthermore, long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) also contribute to this complexity. ML has been used to tackle these challenges at the high-dimensional data level. The present expert review covers more than 100 tools adopting various ML approaches pertaining to, for example, (1) miRNA promoter prediction, (2) precursor classification, (3) mature miRNA prediction, (4) miRNA target prediction, (5) miRNA-lncRNA and miRNA-circRNA interactions, (6) miRNA-mRNA expression profiling, (7) miRNA regulatory module detection, (8) miRNA-disease association, and (9) miRNA essentiality prediction. Taken together, we unpack, critically examine, and highlight the cutting-edge synergy of ML approaches and miRNA research so as to develop a dynamic and microlevel understanding of human health and diseases.
Affiliation(s)
- Sonet Daniel Thomas: Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India; Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Krithika Vijayakumar: Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Levin John: Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Deepak Krishnan: Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Niyas Rehman: Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Amjesh Revikumar: Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India; Kerala Genome Data Centre, Kerala Development and Innovation Strategic Council, Thiruvananthapuram, Kerala, India
- Jalaluddin Akbar Kandel Codi: Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Vinodchandra S S: Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
- Rajesh Raju: Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India; Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
5
Ji B, Zou H, Xu L, Xie X, Peng S. MUSCLE: multi-view and multi-scale attentional feature fusion for microRNA-disease associations prediction. Brief Bioinform 2024; 25:bbae167. [PMID: 38605642 PMCID: PMC11009512 DOI: 10.1093/bib/bbae167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 03/02/2024] [Accepted: 03/31/2024] [Indexed: 04/13/2024]
Abstract
MicroRNAs (miRNAs) synergize with various biomolecules in human cells, resulting in diverse functions in regulating a wide range of biological processes. Predicting potential disease-associated miRNAs as valuable biomarkers contributes to the treatment of human diseases. However, few previous methods take a holistic perspective; most concentrate only on isolated miRNA and disease objects, thereby ignoring that human cells are responsible for multiple relationships. In this work, we first constructed a multi-view graph based on the relationships between miRNAs and various biomolecules, and then utilized a graph attention neural network to learn the graph topology features of miRNAs and diseases for each view. Next, we added a further attention mechanism and developed a multi-scale feature fusion module, aiming to determine the optimal fusion results for the multi-view topology features of miRNAs and diseases. In addition, the prior attribute knowledge of miRNAs and diseases was added to achieve better prediction results and address the cold start problem. Finally, the learned miRNA and disease representations were concatenated and fed into a multi-layer perceptron for end-to-end training and prediction of potential miRNA-disease associations. To assess the efficacy of our model (called MUSCLE), we performed 5- and 10-fold cross-validation (CV), which yielded average areas under the ROC curve of 0.966 ± 0.0102 and 0.973 ± 0.0135, respectively, outperforming most current state-of-the-art models. We then examined the impact of crucial parameters on prediction performance and performed ablation experiments on the feature combination and model architecture. Furthermore, case studies on colon cancer, lung cancer and breast cancer also demonstrate the good inductive capability of MUSCLE. Our data and code are freely available at a public GitHub repository: https://github.com/zht-code/MUSCLE.git.
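The cross-validation figures quoted in this abstract are per-fold AUC values summarized as an average with a spread. The sketch below shows one way such a summary could be computed. It is not the authors' code: the function names, the toy fold data, and the choice of the sample standard deviation for the "±" term are all assumptions made for illustration.

```python
from statistics import mean, stdev

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize_folds(fold_results):
    """Return (mean AUC, sample standard deviation) over the test folds."""
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), stdev(aucs)

# Hypothetical held-out predictions for three folds: (labels, scores).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.4]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.3, 0.1]),
    ([1, 1, 0, 0], [0.6, 0.5, 0.5, 0.2]),
]
mean_auc, sd_auc = summarize_folds(folds)
print(f"AUC = {mean_auc:.3f} ± {sd_auc:.3f}")
```

The pairwise formulation is quadratic in the number of samples, which is fine for a toy check; a rank-based implementation would be preferable for real association matrices.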
Affiliation(s)
- Boya Ji: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Haitao Zou: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; College of Information Science and Engineering, Guilin University of Technology, Guilin 541006, China
- Liwen Xu: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Xiaolan Xie: College of Information Science and Engineering, Guilin University of Technology, Guilin 541006, China
- Shaoliang Peng: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
6
Zou H, Ji B, Zhang M, Liu F, Xie X, Peng S. MHGTMDA: Molecular heterogeneous graph transformer based on biological entity graph for miRNA-disease associations prediction. Molecular Therapy. Nucleic Acids 2024; 35:102139. [PMID: 38384447 PMCID: PMC10879798 DOI: 10.1016/j.omtn.2024.102139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 01/31/2024] [Indexed: 02/23/2024]
Abstract
MicroRNAs (miRNAs) play a crucial role in the prevention, prognosis, diagnosis, and treatment of complex diseases. Existing computational methods primarily focus on biologically relevant molecules directly associated with miRNA or disease, overlooking the fact that the human body is a highly complex system where miRNA or disease may indirectly correlate with various types of biomolecules. To address this, we propose a novel prediction model named MHGTMDA (miRNA and disease association prediction using heterogeneous graph transformer based on molecular heterogeneous graph). MHGTMDA integrates biological entity relationships of eight biomolecules, constructing a relatively comprehensive heterogeneous biological entity graph. MHGTMDA serves as a powerful molecular heterogeneity map transformer, capturing structural elements and properties of miRNAs and diseases, revealing potential associations. In a 5-fold cross-validation study, MHGTMDA achieved an area under the receiver operating characteristic curve of 0.9569, surpassing state-of-the-art methods by at least 3%. Feature ablation experiments suggest that considering features among multiple biomolecules is more effective in uncovering miRNA-disease correlations. Furthermore, we conducted differential expression analyses on breast cancer and lung cancer, using MHGTMDA to further validate differentially expressed miRNAs. The results demonstrate MHGTMDA's capability to identify novel MDAs.
Affiliation(s)
- Haitao Zou: Guilin University of Technology, College of Information Science and Engineering, Guilin 541006, China; Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
- Boya Ji: Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
- Meng Zhang: Xiangya Hospital, The Department of Thoracic Surgery, Changsha 410082, China
- Fen Liu: Hunan Provincial People’s Hospital, Institute of Cardiovascular Epidemiology, Changsha 410082, China
- Xiaolan Xie: Guilin University of Technology, College of Information Science and Engineering, Guilin 541006, China
- Shaoliang Peng: Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
7
Zhang P, Zhang W, Sun W, Xu J, Hu H, Wang L, Wong L. Identification of gene biomarkers for brain diseases via multi-network topological semantics extraction and graph convolutional network. BMC Genomics 2024; 25:175. [PMID: 38350848 PMCID: PMC10865627 DOI: 10.1186/s12864-024-09967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/03/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND: Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks is necessary yet challenging for identifying pathogenic genes for brain diseases. RESULTS: In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback-Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts whether a gene is associated with brain diseases while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our bioinformatics-supported analysis revealed CAMP as a gene significantly associated with Alzheimer's disease identified by M-GBBD. CONCLUSION: Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
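The abstract names Kullback-Leibler divergence as the quantity optimized when fusing networks, without giving the objective itself. As a reminder of what that quantity measures, here is a minimal sketch for two discrete distributions. The function name, the `eps` guard, and the example distributions are illustrative assumptions and say nothing about how M-GBBD actually applies the divergence.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps so a zero
    in q does not raise a division error (an illustrative safeguard).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical feature distributions taken from two networks.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # positive: the distributions differ
print(kl_divergence(p, p))  # 0.0: identical distributions
```

The divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; which direction a given method minimizes is a design decision the abstract does not state.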
Affiliation(s)
- Ping Zhang: College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China; College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang: CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Hubei Hongshan Laboratory, Wuhan, 430074, China
- Weicheng Sun: College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu: College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hua Hu: College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
- Lei Wang: College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China; Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
- Leon Wong: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China
8
Li Y, Lou Y, Liu M, Chen S, Tan P, Li X, Sun H, Kong W, Zhang S, Shao X. Machine learning based biomarker discovery for chronic kidney disease-mineral and bone disorder (CKD-MBD). BMC Med Inform Decis Mak 2024; 24:36. [PMID: 38317140 PMCID: PMC10840173 DOI: 10.1186/s12911-024-02421-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/10/2024] [Indexed: 02/07/2024]
Abstract
INTRODUCTION: Chronic kidney disease-mineral and bone disorder (CKD-MBD) is characterized by bone abnormalities, vascular calcification, and some other complications. Although there are diagnostic criteria for CKD-MBD, in situations where the target examinations are unavailable, there is a need to investigate and discover alternative biochemical criteria that are easy to obtain. Moreover, studying the correlations between the newly discovered biomarkers and the existing ones may provide insights into the underlying molecular mechanisms of CKD-MBD. METHODS: We collected a cohort of 116 individuals, consisting of three subtypes of CKD-MBD: calcium abnormality, phosphorus abnormality, and PTH abnormality. To identify the best biomarker panel for discrimination, we applied six machine learning prediction methods and employed a sequential forward feature selection approach for each subtype. Additionally, we collected a separate prospective cohort of 114 samples to validate the discriminative power of the trained prediction models. RESULTS: Using machine learning under a cross-validation setting, the feature selection method selected a concise biomarker panel for each CKD-MBD subtype as well as for the general one. Using the consensus of these features, the best area under the ROC curve reached 0.95 for the training dataset and 0.74 for the prospective dataset. DISCUSSION/CONCLUSION: For the first time, we utilized machine learning methods to analyze biochemical criteria associated with CKD-MBD. Our aim was to identify alternative biomarkers that could serve not only as early detection indicators for CKD-MBD, but also as potential candidates for studying the underlying molecular mechanisms of the condition.
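Sequential forward feature selection, as named in the METHODS above, greedily grows a feature set by adding whichever remaining feature improves a score the most and stops when no candidate helps. The sketch below illustrates that general procedure only. The feature names, the toy scoring function, and the stopping threshold are hypothetical; in the study the score would presumably come from a cross-validated classifier, which is not reproduced here.

```python
def forward_select(features, score_fn, max_features=None, min_gain=1e-9):
    """Greedy sequential forward selection.

    score_fn takes a list of feature names and returns a score to maximize
    (for example, a cross-validated AUC).
    """
    selected, remaining = [], list(features)
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current panel.
        candidate, cand_score = max(
            ((f, score_fn(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score - best_score <= min_gain:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected, best_score

# Toy stand-in for a cross-validated score: each marker adds a fixed gain
# and every extra feature costs a small complexity penalty (all hypothetical).
GAINS = {"marker_a": 0.20, "marker_b": 0.15, "marker_c": 0.10, "noise": 0.0}

def toy_score(subset):
    return 0.5 + sum(GAINS[f] for f in subset) - 0.01 * len(subset)

selected, score = forward_select(list(GAINS), toy_score)
print(selected, round(score, 2))
```

Because the search is greedy, it evaluates on the order of n² subsets rather than all 2ⁿ, at the cost of possibly missing feature combinations that are only useful together.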
Affiliation(s)
- Yuting Li: Geriatrics Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Suzhou, China; Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Yukuan Lou: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Man Liu: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Siyi Chen: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Peng Tan: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Xiang Li: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Huaixin Sun: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Weixin Kong: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Suhua Zhang: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Xiang Shao: Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
9
Guo LX, Wang L, You ZH, Yu CQ, Hu ML, Zhao BW, Li Y. Biolinguistic graph fusion model for circRNA-miRNA association prediction. Brief Bioinform 2024; 25:bbae058. [PMID: 38426324 PMCID: PMC10939421 DOI: 10.1093/bib/bbae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 01/19/2024] [Accepted: 01/27/2024] [Indexed: 03/02/2024]
Abstract
Emerging clinical evidence suggests that sophisticated associations between circular ribonucleic acids (circRNAs) and microRNAs (miRNAs) are a critical regulatory factor in various pathological processes and play a critical role in most intricate human diseases. Nonetheless, establishing these associations through wet-lab experiments is error-prone and labor-intensive, and the underlying novel circRNA-miRNA association (CMA) has been validated by numerous existing computational methods that rely only on single correlation data. Considering the inadequacy of existing machine learning models, we propose a new model named BGF-CMAP, which combines the gradient boosting decision tree with natural language processing and graph embedding methods to infer associations between circRNAs and miRNAs. Specifically, BGF-CMAP extracts sequence attribute features with Word2vec and interaction behavior features with two homogeneous graph embedding algorithms, large-scale information network embedding and graph factorization. Comprehensive experimental analysis showed that BGF-CMAP predicted the complex relationship between circRNAs and miRNAs with an accuracy of 82.90% and an area under the receiver operating characteristic curve of 0.9075. Furthermore, 23 of the top 30 predicted miRNA-associated circRNAs in the case studies were confirmed by relevant experiments, showing that the BGF-CMAP model is superior to others. BGF-CMAP can serve as a helpful model to provide a scientific theoretical basis for the study of CMA prediction.
Affiliation(s)
- Lu-Xiang Guo: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Lei Wang: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China; Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Zhu-Hong You: School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129, China
- Chang-Qing Yu: College of Information Engineering, Xijing University, Xi’an 710123, China
- Meng-Lei Hu: School of Medicine, Peking University, Beijing, 100091, China
- Bo-Wei Zhao: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yang Li: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
10
Guo LX, Wang L, You ZH, Yu CQ, Hu ML, Zhao BW, Li Y. Likelihood-based feature representation learning combined with neighborhood information for predicting circRNA-miRNA associations. Brief Bioinform 2024; 25:bbae020. [PMID: 38324624 PMCID: PMC10849193 DOI: 10.1093/bib/bbae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 01/01/2024] [Accepted: 01/11/2024] [Indexed: 02/09/2024]
Abstract
Connections between circular RNAs (circRNAs) and microRNAs (miRNAs) assume a pivotal position in the onset, evolution, diagnosis and treatment of diseases and tumors. Selecting the most potential circRNA-related miRNAs and taking advantage of them as the biological markers or drug targets could be conducive to dealing with complex human diseases through preventive strategies, diagnostic procedures and therapeutic approaches. Compared to traditional biological experiments, leveraging computational models to integrate diverse biological data in order to infer potential associations proves to be a more efficient and cost-effective approach. This paper developed a model of Convolutional Autoencoder for CircRNA-MiRNA Associations (CA-CMA) prediction. Initially, this model merged the natural language characteristics of the circRNA and miRNA sequence with the features of circRNA-miRNA interactions. Subsequently, it utilized all circRNA-miRNA pairs to construct a molecular association network, which was then fine-tuned by labeled samples to optimize the network parameters. Finally, the prediction outcome is obtained by utilizing the deep neural networks classifier. This model innovatively combines the likelihood objective that preserves the neighborhood through optimization, to learn the continuous feature representation of words and preserve the spatial information of two-dimensional signals. During the process of 5-fold cross-validation, CA-CMA exhibited exceptional performance compared to numerous prior computational approaches, as evidenced by its mean area under the receiver operating characteristic curve of 0.9138 and a minimal SD of 0.0024. Furthermore, recent literature has confirmed the accuracy of 25 out of the top 30 circRNA-miRNA pairs identified with the highest CA-CMA scores during case studies. The results of these experiments highlight the robustness and versatility of our model.
Affiliation(s)
- Lu-Xiang Guo: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Lei Wang: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China; Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Zhu-Hong You: School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129, China
- Chang-Qing Yu: College of Information Engineering, Xijing University, Xi’an 710123, China
- Meng-Lei Hu: School of Medicine, Peking University, Beijing, 100091, China
- Bo-Wei Zhao: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yang Li: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
11
Zhang Y, Cai G, Li X, Chen M. GCN-Based Heterogeneous Complex Feature Learning to Enhance Predictability for LncRNA-Disease Associations. ACS Omega 2024; 9:1472-1484. [PMID: 38222651 PMCID: PMC10785310 DOI: 10.1021/acsomega.3c07923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 01/16/2024]
Abstract
Using computational models to predict potential lncRNA-disease associations (LDAs) has emerged as an effective supplement to bioexperiments for exploring the pathogenesis of diseases. However, current computational models still face limitations in their ability to learn the complex features of bionetworks. In this study, HGCNLDA, a model which combines graph convolutional network (GCN)-based aggregation, heterogeneous information fusion, and a bilinear-decoder to infer LDAs was proposed. Recognizing the need to extract essential features during data processing, our HGCNLDA explored four key steps for uncovering interaction patterns within the bionetwork: (1) a novel type of tripartite heterogeneous network, known as the lncRNA-disease-miRNA network (LDMN), was constructed using computed similarities and known associations. (2) Homogeneous and heterogeneous features of nodes were extracted from domains within the LDMN by a GCN-based encoder. (3) Feature fusions, including bipolymerization operations and attention mechanism, were employed to capture a more accurate and comprehensive representation of nodes. (4) Bilinear-decoder was used to rebuild the edge type (or rating type) for a specific node pair, resulting in the predicted association score. Through a 5-fold cross-validation on two data sets, namely, data set1 and data set2, our HGCNLDA consistently demonstrated superior performance compared to five related models. It almost achieved the highest AUROC and AUPR values on both data sets, especially on data set2 where the results obtained were more challenging and objective. Case studies involving three real cancer scenarios further validated the practicality of HGCNLDA in identifying potential LDAs in real-world contexts. The source code and data for this study are available at https://github.com/zywait/HGCNLDA.
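Step (4) of this abstract scores a node pair with a bilinear decoder. A common form of such a decoder maps an lncRNA embedding z_l and a disease embedding z_d to sigmoid(z_lᵀ W z_d), and the sketch below shows that form only. It is an assumption-laden illustration: the single-edge-type simplification, the function name, and the toy weights are not taken from the HGCNLDA code, which per the abstract reconstructs an edge (or rating) type and may use a different output layer.

```python
import math

def bilinear_score(z_l, W, z_d):
    """Association score sigmoid(z_l^T W z_d) for one (lncRNA, disease) pair.

    z_l, z_d are embedding vectors (lists of floats); W is a weight matrix
    with len(z_l) rows and len(z_d) columns, learned in a real model.
    """
    if len(W) != len(z_l) or any(len(row) != len(z_d) for row in W):
        raise ValueError("W must have shape (len(z_l), len(z_d))")
    raw = sum(z_l[i] * W[i][j] * z_d[j]
              for i in range(len(z_l)) for j in range(len(z_d)))
    return 1.0 / (1.0 + math.exp(-raw))

# Hypothetical 2-dimensional embeddings and an identity weight matrix.
W = [[1.0, 0.0],
     [0.0, 1.0]]
aligned = bilinear_score([1.0, 0.0], W, [1.0, 0.0])     # embeddings agree
orthogonal = bilinear_score([1.0, 0.0], W, [0.0, 1.0])  # no shared signal
print(aligned, orthogonal)
```

With an identity W the decoder reduces to a dot product; a learned W lets the model weight interactions between different embedding dimensions.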
Affiliation(s)
- Yi Zhang
  - Guilin University of Technology, Guilin 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
- Gangsheng Cai
  - Guilin University of Technology, Guilin 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
- Xin Li
  - Guilin University of Technology, Guilin 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang 421010, China
|
12
|
Bayatra A, Nasserat R, Ilan Y. Overcoming Low Adherence to Chronic Medications by Improving their Effectiveness using a Personalized Second-generation Digital System. Curr Pharm Biotechnol 2024; 25:2078-2088. [PMID: 38288794 DOI: 10.2174/0113892010269461240110060035] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/21/2023] [Revised: 11/26/2023] [Accepted: 12/11/2023] [Indexed: 09/10/2024]
Abstract
INTRODUCTION Low adherence to chronic treatment regimens is a significant barrier to improving clinical outcomes in patients with chronic diseases. Low adherence is a result of multiple factors. METHODS We review the relevant studies on the prevalence of low adherence and present some potential solutions. RESULTS This review presents studies on the current measures taken to overcome low adherence, indicating a need for better methods to deal with this problem. The use of first-generation digital systems to improve adherence is mainly based on reminding patients to take their medications, which is one of the reasons they fail to provide a solution for many patients. The establishment of a second-generation artificial intelligence system, which aims to improve the effectiveness of chronic drugs, is described. CONCLUSION Improving clinically meaningful outcome measures and disease parameters may increase adherence and improve patients' response to therapy.
Affiliation(s)
- Areej Bayatra
  - Department of Medicine, the Hebrew University-Hadassah Medical Center, Jerusalem, Israel
- Rima Nasserat
  - Department of Medicine, the Hebrew University-Hadassah Medical Center, Jerusalem, Israel
- Yaron Ilan
  - Department of Medicine, the Hebrew University-Hadassah Medical Center, Jerusalem, Israel
|
13
|
Rocca R, Grillone K, Citriniti EL, Gualtieri G, Artese A, Tagliaferri P, Tassone P, Alcaro S. Targeting non-coding RNAs: Perspectives and challenges of in-silico approaches. Eur J Med Chem 2023; 261:115850. [PMID: 37839343 DOI: 10.1016/j.ejmech.2023.115850] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/25/2023] [Revised: 09/08/2023] [Accepted: 09/29/2023] [Indexed: 10/17/2023]
Abstract
The growing information currently available on the central role of non-coding RNAs (ncRNAs), including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), in chronic and degenerative human diseases makes them attractive therapeutic targets. RNAs carry out different functional roles in human biology and are deeply deregulated in several diseases. So far, different attempts to therapeutically target 3D RNA structures with small molecules have been reported. In this scenario, the development of computational tools suitable for describing RNA structures and their potential interactions with small molecules is attracting growing interest. Here, we describe the most suitable strategies to study ncRNAs through computational tools. We focus on methods capable of predicting 2D and 3D ncRNA structures. Furthermore, we describe computational tools to identify, design and optimize small molecule ncRNA binders. This review aims to outline the state of the art and perspectives of computational methods for ncRNAs over the past decade.
Affiliation(s)
- Roberta Rocca
  - Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy
- Katia Grillone
  - Department of Experimental and Clinical Medicine, Magna Græcia University, Catanzaro, Italy
- Anna Artese
  - Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy
- Pierfrancesco Tassone
  - Department of Experimental and Clinical Medicine, Magna Græcia University, Catanzaro, Italy
- Stefano Alcaro
  - Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy
|
14
|
Wang S, Hui C, Zhang T, Wu P, Nakaguchi T, Xuan P. Graph Reasoning Method Based on Affinity Identification and Representation Decoupling for Predicting lncRNA-Disease Associations. J Chem Inf Model 2023; 63:6947-6958. [PMID: 37906529 DOI: 10.1021/acs.jcim.3c01214] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/02/2023]
Abstract
An increasing number of studies have shown that dysregulation of lncRNAs is related to the occurrence of various diseases. Most previous methods, however, rely on the homogeneity assumption that the representation of a target lncRNA (or disease) node should be updated by aggregating the attributes of its neighbor nodes. This assumption ignores affinity nodes that are far from the target node. We present a novel prediction method, GAIRD, to fully leverage the heterogeneous information in the network and the decoupled node features. The first major innovation is a random walk strategy based on breadth-first searching and depth-first searching. Unlike previous methods that focus only on homogeneous information, our new strategy learns both the homogeneous information within local neighborhoods and the heterogeneous information within higher-order neighborhoods. The second innovation is a representation decoupling module to extract the purer attributes and the purer topologies. Third, a module based on group convolution and deep separable convolution is developed to promote pairwise intrachannel and interchannel feature learning. The experimental results show that GAIRD outperforms the compared state-of-the-art methods, and the ablation studies confirm the contributions of the major innovations. We also performed case studies on 3 diseases to further demonstrate the effectiveness of the GAIRD model in applications.
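The abstract does not spell out how the walk mixes breadth-first (width-first) and depth-first exploration. As a hedged illustration only, the sketch below uses a node2vec-style second-order walk, which is a swapped-in technique and not necessarily GAIRD's own strategy. The parameters `p` and `q`, the function name `biased_walk`, and the toy graph are assumptions.

```python
import random

def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """node2vec-style walk: small q favors depth-first moves, large q breadth-first ones."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so pick uniformly
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay near the previous node (breadth-first-like)
            else:
                weights.append(1.0 / q)  # move farther away (depth-first-like)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Hypothetical undirected toy network with lncRNA, disease, and miRNA nodes.
graph = {
    "lncA": {"disX", "mirM"},
    "lncB": {"disX", "disY"},
    "disX": {"lncA", "lncB", "mirM"},
    "disY": {"lncB"},
    "mirM": {"lncA", "disX"},
}

walk = biased_walk(graph, "lncA", 8, p=4.0, q=0.5)
print(walk)
```

With q below 1 the walk is pushed toward higher-order neighborhoods, while q above 1 keeps it within the local neighborhood of the previous node.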
Affiliation(s)
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Cui Hui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Peiliang Wu
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
  - Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Ping Xuan
  - Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
|
15
|
Shan W, Chen L, Xu H, Zhong Q, Xu Y, Yao H, Lin K, Li X. GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47. Front Chem 2023; 11:1292869. [PMID: 37927570 PMCID: PMC10623438 DOI: 10.3389/fchem.2023.1292869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023] Open
Abstract
Identifying compound-protein interactions (CPIs) plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in CPI prediction. However, ML relies on learning from large amounts of sample data, and only a small amount of CPI data is often available for a specific target. To overcome this dilemma, we propose a virtual screening model in which word2vec is used as an embedding tool to generate low-dimensional vectors of the SMILES of compounds and the amino acid sequences of proteins, and a modified multi-grained cascade forest (gcForest) is used as the classifier. The proposed method can construct a model from raw data, adjusts model complexity according to the scale of the dataset, especially for small-scale datasets, and is robust, with few hyper-parameters and without over-fitting. We found that the proposed model is superior to other CPI prediction models and performs well on the constructed challenging dataset. We finally predicted 2 new inhibitors for cluster of differentiation 47 (CD47), which has few known inhibitors. The IC50s of enzyme activities of these 2 new small-molecule inhibitors targeting the CD47-SIRPα interaction are 3.57 and 4.79 μM, respectively. These results demonstrate the competence of this concise but efficient tool for CPI prediction.
Affiliation(s)
- Wenying Shan
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
  - Faculty of Health Sciences, University of Macau, Macau, China
- Lvqi Chen
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Hao Xu
  - Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry, Nanjing, China
  - National Engineering Laboratory for Biomass Chemical Utilization, Nanjing, China
- Qinghao Zhong
  - School of Humanities and Social Sciences, The Chinese University of Hong Kong, Shenzhen, China
- Yinqiu Xu
  - Department of Pharmacy, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, China
- Hequan Yao
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Kejiang Lin
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Xuanyi Li
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
|
16
|
Wei MM, Yu CQ, Li LP, You ZH, Wang L. BCMCMI: A Fusion Model for Predicting circRNA-miRNA Interactions Combining Semantic and Meta-path. J Chem Inf Model 2023; 63:5384-5394. [PMID: 37535872 DOI: 10.1021/acs.jcim.3c00852] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 08/05/2023]
Abstract
Growing evidence suggests that circRNA plays a vital role in the development and treatment of diseases by interacting with miRNA. Accurate prediction of potential circRNA-miRNA interactions (CMIs) has therefore become urgent. However, traditional wet-lab experiments are time-consuming and costly, and their results can be affected by objective factors. In this paper, we propose a computational model, BCMCMI, which combines three features to predict CMIs. Specifically, BCMCMI uses the bidirectional encoding capability of the BERT algorithm to extract sequence features from the semantic information of circRNA and miRNA. Then, a heterogeneous network is constructed based on cosine similarity and known CMI information. Metapath2vec is employed to conduct random walks following meta-paths in the network to capture topological features, including similarity features. Finally, potential CMIs are predicted using the XGBoost classifier. BCMCMI achieves superior results compared to other state-of-the-art models on two benchmark datasets for CMI prediction. We also use t-SNE to visualize the distribution of the extracted features on a randomly selected dataset. The prediction results show that BCMCMI can serve as a valuable complement to the wet-lab experiment process.
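The network construction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the BCMCMI implementation: the similarity threshold of 0.8, the `type:id` node naming, the function names `cosine` and `build_network`, and the toy feature vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_network(features, known_pairs, threshold=0.8):
    """Return typed edges: similarity edges within a node type plus known interactions."""
    edges = set()
    names = sorted(features)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            same_type = a.split(":")[0] == b.split(":")[0]
            # Similarity edges only link nodes of the same type (circRNA-circRNA, miRNA-miRNA)
            if same_type and cosine(features[a], features[b]) >= threshold:
                edges.add((a, b, "similar"))
    # Known circRNA-miRNA interactions connect the two node types
    for circ, mir in known_pairs:
        edges.add((circ, mir, "interacts"))
    return edges

# Made-up feature vectors; node names use a hypothetical "type:id" scheme.
features = {
    "circ:1": [1.0, 0.0, 1.0],
    "circ:2": [1.0, 0.0, 0.9],
    "circ:3": [0.0, 1.0, 0.0],
    "mir:1": [1.0, 1.0],
    "mir:2": [1.0, 0.9],
}
known_pairs = [("circ:1", "mir:1"), ("circ:3", "mir:2")]

network = build_network(features, known_pairs)
for edge in sorted(network):
    print(edge)
```

A meta-path-guided random walk would then traverse these typed edges, for example alternating between circRNA and miRNA nodes.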
Affiliation(s)
- Meng-Meng Wei
  - School of Information Engineering, Xijing University, Xi'an, Shaanxi 710123, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, Shaanxi 710123, China
- Li-Ping Li
  - College of Agriculture and Forestry, Longdong University, Qingyang, Gansu 745000, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Lei Wang
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, Guangxi 530007, China
|