1
Guo C, Wang X, Ren H. Databases and computational methods for the identification of piRNA-related molecules: A survey. Comput Struct Biotechnol J 2024; 23:813-833. [PMID: 38328006 PMCID: PMC10847878 DOI: 10.1016/j.csbj.2024.01.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/31/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024]
Abstract
Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs (ncRNAs) that play important roles in many biological processes and in the diagnosis and treatment of major cancers, and have thus become a hot research topic. This study aims to provide an in-depth review of computational piRNA-related research, including databases and computational models. Herein, we perform literature analysis and use comparative evaluation methods to summarize and analyze three aspects of computational piRNA-related research: (i) computational models for piRNA-related molecular identification tasks, (ii) computational models for piRNA-disease association prediction tasks, and (iii) computational resources and evaluation metrics for these tasks. This study shows that computational piRNA-related research has progressed significantly and exhibited promising performance in recent years, although it still suffers from the emerging challenges of inconsistent naming systems and a lack of data. Unlike other reviews on piRNA-related identification tasks, which focus on the organization of datasets and computational methods, we pay more attention to the analysis of computational models, algorithms, and performance, aiming to provide valuable references for computational piRNA-related identification tasks. This study will benefit the theoretical development and practical application of piRNAs by improving the understanding of the computational models and resources used to investigate the biological functions and clinical implications of piRNAs.
Affiliation(s)
- Chang Guo
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
- Xiaoli Wang
  - Institute of Reproductive Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Han Ren
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
  - Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou 510420, China
2
Chu S, Duan G, Yan C. PGCNMDA: learning node representations along paths with graph convolutional network for predicting miRNA-disease associations. Methods 2024:S1046-2023(24)00157-9. [PMID: 38909974 DOI: 10.1016/j.ymeth.2024.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 05/26/2024] [Accepted: 06/16/2024] [Indexed: 06/25/2024]
Abstract
Identifying miRNA-disease associations (MDAs) is crucial for improving the diagnosis and treatment of various diseases. However, biological experiments can be time-consuming and expensive. To overcome these challenges, computational approaches have been developed, with Graph Convolutional Network (GCN) showing promising results in MDA prediction. The success of GCN-based methods relies on learning a meaningful spatial operator to extract effective node feature representations. To enhance the inference of MDAs, we propose a novel method called PGCNMDA, which employs graph convolutional networks with a learning graph spatial operator from paths. This approach enables the generation of meaningful spatial convolutions from paths in GCN, leading to improved prediction performance. On HMDD v2.0, PGCNMDA obtains a mean AUC of 0.9229 and an AUPRC of 0.9206 under 5-fold cross-validation (5-CV), and a mean AUC of 0.9235 and an AUPRC of 0.9212 under 10-fold cross-validation (10-CV), respectively. Additionally, the AUC of PGCNMDA also reaches 0.9238 under global leave-one-out cross-validation (GLOOCV). On HMDD v3.2, PGCNMDA obtains a mean AUC of 0.9413 and an AUPRC of 0.9417 under 5-CV, and a mean AUC of 0.9419 and an AUPRC of 0.9425 under 10-CV, respectively. Furthermore, the AUC of PGCNMDA also reaches 0.9415 under GLOOCV. The results show that PGCNMDA is superior to other compared methods. In addition, the case studies on pancreatic neoplasms, thyroid neoplasms and leukemia show that 50, 50 and 48 of the top 50 predicted miRNAs linked to these diseases are confirmed, respectively. It further validates the effectiveness and feasibility of PGCNMDA in practical applications.
Affiliation(s)
- Shuang Chu
  - School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Cheng Yan
  - School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
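The AUC and AUPRC values quoted in the PGCNMDA abstract are rank-based metrics computed from predicted association scores. The sketch below shows one common way to compute them from scratch, assuming binary labels (1 = known association) and real-valued scores; the function names and toy data are illustrative assumptions, not taken from the paper, and average precision is used here as a step-wise estimate of the area under the precision-recall curve.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


if __name__ == "__main__":
    # Hypothetical scores for five candidate miRNA-disease pairs.
    toy_labels = [1, 0, 1, 0, 1]
    toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
    print(roc_auc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

Under k-fold cross-validation, these metrics would typically be computed on each held-out fold and then averaged, which is what a "mean AUC" refers to.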
3
Li YC, You ZH, Yu CQ, Wang L, Hu L, Hu PW, Qiao Y, Wang XF, Huang YA. DeepCMI: a graph-based model for accurate prediction of circRNA-miRNA interactions with multiple information. Brief Funct Genomics 2024; 23:276-285. [PMID: 37539561 DOI: 10.1093/bfgp/elad030] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/18/2023] [Revised: 05/25/2023] [Accepted: 07/13/2023] [Indexed: 08/05/2023]
Abstract
Recently, the role of competing endogenous RNAs in regulating gene expression through the interaction of microRNAs has been closely associated with the expression of circular RNAs (circRNAs) in various biological processes such as reproduction and apoptosis. While the number of confirmed circRNA-miRNA interactions (CMIs) continues to increase, the conventional in vitro approaches for discovery are expensive, labor intensive, and time consuming. Therefore, there is an urgent need for effective prediction of potential CMIs through appropriate data modeling and prediction based on known information. In this study, we proposed a novel model, called DeepCMI, that utilizes multi-source information on circRNA/miRNA to predict potential CMIs. Comprehensive evaluations on the CMI-9905 and CMI-9589 datasets demonstrated that DeepCMI successfully infers potential CMIs. Specifically, DeepCMI achieved AUC values of 90.54% and 94.8% on the CMI-9905 and CMI-9589 datasets, respectively. These results suggest that DeepCMI is an effective model for predicting potential CMIs and has the potential to significantly reduce the need for downstream in vitro studies. To facilitate the use of our trained model and data, we have constructed a computational platform, which is available at http://120.77.11.78/DeepCMI/. The source code and datasets used in this work are available at https://github.com/LiYuechao1998/DeepCMI.
Affiliation(s)
- Yue-Chao Li
  - School of Information Engineering, Xijing University, Xi'an, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, China
- Lei Wang
  - Guangxi Academy of Sciences, Nanning, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Peng-Wei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Yan Qiao
  - College of Agriculture and Forestry, Longdong University, Qingyang 745000, China
- Xin-Fei Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Yu-An Huang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
4
Li M, Zhou H, Yang H, Zhang R. RT: a Retrieving and Chain-of-Thought framework for few-shot medical named entity recognition. J Am Med Inform Assoc 2024:ocae095. [PMID: 38708849 DOI: 10.1093/jamia/ocae095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 04/10/2024] [Accepted: 04/15/2024] [Indexed: 05/07/2024]
Abstract
OBJECTIVES This article aims to enhance the performance of large language models (LLMs) on the few-shot biomedical named entity recognition (NER) task by developing a simple and effective method called the Retrieving and Chain-of-Thought (RT) framework, and to evaluate the improvement after applying the RT framework. MATERIALS AND METHODS Given the remarkable advancements in retrieval-based language models and Chain-of-Thought across various natural language processing tasks, we propose a pioneering RT framework designed to amalgamate both approaches. The RT approach encompasses dedicated modules for information retrieval and Chain-of-Thought processes. In the retrieval module, RT discerns pertinent examples from demonstrations during instructional tuning for each input sentence. Subsequently, the Chain-of-Thought module employs a systematic reasoning process to identify entities. We conducted a comprehensive comparative analysis of our RT framework against 16 other models for few-shot NER tasks on the BC5CDR and NCBI corpora. Additionally, we explored the impacts of negative samples, output formats, and missing data on performance. RESULTS Our proposed RT framework outperforms other LMs for few-shot NER tasks, with micro-F1 scores of 93.50 and 91.76 on the BC5CDR and NCBI corpora, respectively. We found that using both positive and negative samples, Chain-of-Thought (vs Tree-of-Thought) performed better. Additionally, utilization of a partially annotated dataset has a marginal effect on the model performance. DISCUSSION This is the first investigation to combine a retrieval-based LLM and Chain-of-Thought methodology to enhance the performance in biomedical few-shot NER. The retrieval-based LLM aids in retrieving the most relevant examples of the input sentence, offering crucial knowledge to predict the entity in the sentence. We also conducted a meticulous examination of our methodology, incorporating an ablation study.
CONCLUSION The RT framework with LLM has demonstrated state-of-the-art performance on few-shot NER tasks.
Affiliation(s)
- Mingchen Li
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Huixue Zhou
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Han Yang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
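The RT abstract reports micro-F1 scores for entity recognition. As a rough illustration of how micro-averaged F1 is usually defined for NER, the sketch below pools true positives, false positives, and false negatives across all sentences before computing precision and recall. The entity representation as `(start, end, type)` tuples and the exact-match criterion are assumptions made for this example, not details taken from the paper.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-sentence collections of entity tuples.

    gold and predicted are parallel lists; each item holds the
    (start, end, type) tuples for one sentence. Only exact matches count.
    """
    tp = fp = fn = 0
    for gold_entities, pred_entities in zip(gold, predicted):
        g, p = set(gold_entities), set(pred_entities)
        tp += len(g & p)   # predicted and correct
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical annotations for two sentences.
    gold = [{(0, 1, "Chemical")}, {(2, 3, "Disease"), (5, 6, "Chemical")}]
    pred = [{(0, 1, "Chemical")}, {(2, 3, "Disease"), (7, 8, "Disease")}]
    print(micro_f1(gold, pred))
```

The function returns a value between 0 and 1; scores such as 93.50 correspond to the same quantity expressed as a percentage.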
5
Kang WY, Gao YL, Wang Y, Li F, Liu JX. KFDAE: CircRNA-Disease Associations Prediction Based on Kernel Fusion and Deep Auto-Encoder. IEEE J Biomed Health Inform 2024; 28:3178-3185. [PMID: 38408006 DOI: 10.1109/jbhi.2024.3369650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
CircRNAs have been shown to play an important role in disease diagnosis and treatment. Because wet-lab experiments are time-consuming and expensive, computational methods have become a viable alternative in recent years. However, the number of circRNA-disease associations (CDAs) that can be verified is relatively small, and some methods do not take full advantage of dependencies between attributes. To solve these problems, this paper proposes a novel method based on Kernel Fusion and Deep Auto-encoder (KFDAE) to predict the potential associations between circRNAs and diseases. First, KFDAE uses a non-linear method to fuse the circRNA similarity kernels and disease similarity kernels. Then the vectors are concatenated to build the positive and negative sample sets, and these data are sent to a deep auto-encoder to reduce dimensionality and extract features. Finally, a three-layer deep feedforward neural network is used to learn features and obtain the prediction score. The experimental results show that, compared with existing methods, KFDAE achieves the best performance. In addition, the results of case studies demonstrate the effectiveness and practical significance of KFDAE, which means KFDAE is able to capture more comprehensive information and generate credible candidates for subsequent wet-lab validation.
6
Chen Q, Zhang L, Liu Y, Qin Z, Zhao T. PUTransGCN: identification of piRNA-disease associations based on attention encoding graph convolutional network and positive unlabelled learning. Brief Bioinform 2024; 25:bbae144. [PMID: 38581419 PMCID: PMC10998538 DOI: 10.1093/bib/bbae144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 02/25/2024] [Accepted: 03/15/2024] [Indexed: 04/08/2024]
Abstract
Piwi-interacting RNAs (piRNAs) play a crucial role in various biological processes and are implicated in disease. Consequently, there is an escalating demand for computational tools to predict piRNA-disease interactions. Although computational methods have been proposed for the detection of piRNA-disease associations, imbalanced and sparse datasets make it challenging to capture the complex relationships between piRNAs and diseases. In response to this necessity, we have developed a novel computational architecture, denoted as PUTransGCN, which uses heterogeneous graph convolutional networks to uncover potential piRNA-disease associations. Additionally, an attention mechanism was used to automatically adjust the weights applied when aggregating heterogeneous node features. To tackle the imbalanced dataset problem, a combined positive unlabelled learning (PUL) method comprising PU bagging, two-step and spy techniques was applied to select reliable negative associations. The features of piRNAs and diseases were derived by PUTransGCN from three distinct biological sources: piRNA sequences, semantic terms related to diseases and the existing network of piRNA-disease associations. In the experiments, PUTransGCN achieves AUCs of 0.93 and 0.95 in 5-fold cross-validation on two datasets, respectively, outperforming six other state-of-the-art models. We compared three different PUL methods, and the results of the ablation experiment indicate that the combined PUL method yields the best results. PUTransGCN could serve as a valuable piRNA-disease prediction tool for upcoming studies in the biomedical field. The code for PUTransGCN is available at https://github.com/chenqiuhao/PUTransGCN.
Affiliation(s)
- Qiuhao Chen
  - Institute of Bioinformatics, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Liyuan Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Yaojia Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Zhonghao Qin
  - State Key Laboratory of Robotics and System, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Tianyi Zhao
  - School of Computer Science and Technology, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
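The PUTransGCN abstract mentions the spy technique as one way of selecting reliable negatives from unlabelled pairs. A minimal sketch of the general idea follows: a few known positives ("spies") are hidden among the unlabelled examples, a scorer is fitted, and unlabelled examples that score below every spy are treated as reliable negatives. The centroid-based scorer, the function names, and the minimum-spy-score threshold are all assumptions made for illustration; they are not the classifier or settings used in the paper, and practical variants often use a low percentile of the spy scores instead of the minimum.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def spy_reliable_negatives(positives, unlabeled, spy_fraction=0.2, seed=0):
    """Return unlabelled examples that score below every spy."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(positives) * spy_fraction))
    if n_spies >= len(positives):
        raise ValueError("need more positives than spies")
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    kept_pos = [p for i, p in enumerate(positives) if i not in spy_idx]

    # Stand-in scorer (an assumption): higher when a point is closer to the
    # remaining positives than to the "unlabelled plus spies" mixture.
    c_pos = centroid(kept_pos)
    c_mix = centroid(list(unlabeled) + spies)

    def score(x):
        return sq_dist(x, c_mix) - sq_dist(x, c_pos)

    threshold = min(score(s) for s in spies)
    return [u for u in unlabeled if score(u) < threshold]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for piRNA-disease pairs.
    known_pos = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0)]
    unlabelled = [(1.3, 1.3), (-1.0, -1.0), (-1.1, -0.9), (-0.9, -1.2)]
    print(spy_reliable_negatives(known_pos, unlabelled))
```

In this toy setting the unlabelled point near the positive cluster is kept out of the negative set, which is the behaviour the spy heuristic is meant to provide.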
7
Hu X, Zhang P, Liu D, Zhang J, Zhang Y, Dong Y, Fan Y, Deng L. IGCNSDA: unraveling disease-associated snoRNAs with an interpretable graph convolutional network. Brief Bioinform 2024; 25:bbae179. [PMID: 38647155 PMCID: PMC11033953 DOI: 10.1093/bib/bbae179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 12/15/2023] [Accepted: 03/27/2024] [Indexed: 04/25/2024]
Abstract
Accurately delineating the connection between small nucleolar RNA (snoRNA) and disease is crucial for advancing disease detection and treatment. While traditional biological experimental methods are effective, they are labor-intensive, costly and lack scalability. With the ongoing progress in computer technology, an increasing number of deep learning techniques are being employed to predict snoRNA-disease associations. Nevertheless, the majority of these methods are black-box models, lacking interpretability and the capability to elucidate the snoRNA-disease association mechanism. In this study, we introduce IGCNSDA, an innovative and interpretable graph convolutional network (GCN) approach tailored for the efficient inference of snoRNA-disease associations. IGCNSDA leverages the GCN framework to extract node feature representations of snoRNAs and diseases from the bipartite snoRNA-disease graph. SnoRNAs with high similarity are more likely to be linked to analogous diseases, and vice versa. To facilitate this process, we introduce a subgraph generation algorithm that effectively groups similar snoRNAs and their associated diseases into cohesive subgraphs. Subsequently, we aggregate information from neighboring nodes within these subgraphs, iteratively updating the embeddings of snoRNAs and diseases. The experimental results demonstrate that IGCNSDA outperforms the most recent, highly relevant methods. Additionally, our interpretability analysis provides compelling evidence that IGCNSDA adeptly captures the underlying similarity between snoRNAs and diseases, thus affording researchers enhanced insights into the snoRNA-disease association mechanism. Furthermore, we present illustrative case studies that demonstrate the utility of IGCNSDA as a valuable tool for efficiently predicting potential snoRNA-disease associations. The dataset and source code for IGCNSDA are openly accessible at: https://github.com/altriavin/IGCNSDA.
Affiliation(s)
- Xiaowen Hu
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Pan Zhang
  - Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, 410078, Changsha, China
- Dayun Liu
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Jiaxuan Zhang
  - Department of Electrical and Computer Engineering, University of California, San Diego, 92093, CA, United States
- Yuanpeng Zhang
  - School of Software, Xinjiang University, 830046, Urumqi, China
- Yihan Dong
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Yanhao Fan
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
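The IGCNSDA abstract describes aggregating information from neighboring nodes on a bipartite snoRNA-disease graph and iteratively updating embeddings. The sketch below shows one round of plain mean-neighbor aggregation on such a graph. It deliberately omits the subgraph generation step, learned weights, and any normalization the authors may use; the function names, node identifiers, and toy embeddings are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def propagate(edges, sno_emb, dis_emb):
    """One round of mean aggregation across a bipartite graph.

    edges is a list of (snoRNA_id, disease_id) pairs. Each node's new
    embedding is the mean of its neighbors' current embeddings; isolated
    nodes keep their current embedding.
    """
    sno_nbrs = {s: [] for s in sno_emb}
    dis_nbrs = {d: [] for d in dis_emb}
    for s, d in edges:
        sno_nbrs[s].append(dis_emb[d])
        dis_nbrs[d].append(sno_emb[s])
    new_sno = {s: mean_vector(n) if n else list(sno_emb[s]) for s, n in sno_nbrs.items()}
    new_dis = {d: mean_vector(n) if n else list(dis_emb[d]) for d, n in dis_nbrs.items()}
    return new_sno, new_dis


if __name__ == "__main__":
    # Hypothetical graph: two snoRNAs, two diseases, three known associations.
    edges = [("s1", "d1"), ("s2", "d1"), ("s2", "d2")]
    sno = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
    dis = {"d1": [2.0, 2.0], "d2": [4.0, 0.0]}
    print(propagate(edges, sno, dis))
```

Calling `propagate` repeatedly on its own output corresponds to stacking several aggregation layers, so that each node gradually incorporates information from more distant neighbors.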
8
Zou H, Ji B, Zhang M, Liu F, Xie X, Peng S. MHGTMDA: Molecular heterogeneous graph transformer based on biological entity graph for miRNA-disease associations prediction. Mol Ther Nucleic Acids 2024; 35:102139. [PMID: 38384447 PMCID: PMC10879798 DOI: 10.1016/j.omtn.2024.102139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 01/31/2024] [Indexed: 02/23/2024]
Abstract
MicroRNAs (miRNAs) play a crucial role in the prevention, prognosis, diagnosis, and treatment of complex diseases. Existing computational methods primarily focus on biologically relevant molecules directly associated with miRNA or disease, overlooking the fact that the human body is a highly complex system where miRNA or disease may indirectly correlate with various types of biomolecules. To address this, we propose a novel prediction model named MHGTMDA (miRNA and disease association prediction using heterogeneous graph transformer based on molecular heterogeneous graph). MHGTMDA integrates biological entity relationships of eight biomolecules, constructing a relatively comprehensive heterogeneous biological entity graph. MHGTMDA serves as a powerful molecular heterogeneous graph transformer, capturing structural elements and properties of miRNAs and diseases and revealing potential associations. In a 5-fold cross-validation study, MHGTMDA achieved an area under the receiver operating characteristic curve of 0.9569, surpassing state-of-the-art methods by at least 3%. Feature ablation experiments suggest that considering features among multiple biomolecules is more effective in uncovering miRNA-disease correlations. Furthermore, we conducted differential expression analyses on breast cancer and lung cancer, using MHGTMDA to further validate differentially expressed miRNAs. The results demonstrate MHGTMDA's capability to identify novel MDAs.
Affiliation(s)
- Haitao Zou
  - Guilin University of Technology, College of Information Science and Engineering, Guilin 541006, China
  - Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
- Boya Ji
  - Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
- Meng Zhang
  - Xiangya Hospital, The Department of Thoracic Surgery, Changsha 410082, China
- Fen Liu
  - Hunan Provincial People’s Hospital, Institute of Cardiovascular Epidemiology, Changsha 410082, China
- Xiaolan Xie
  - Guilin University of Technology, College of Information Science and Engineering, Guilin 541006, China
- Shaoliang Peng
  - Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, China
9
Zhang P, Zhang W, Sun W, Xu J, Hu H, Wang L, Wong L. Identification of gene biomarkers for brain diseases via multi-network topological semantics extraction and graph convolutional network. BMC Genomics 2024; 25:175. [PMID: 38350848 PMCID: PMC10865627 DOI: 10.1186/s12864-024-09967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/03/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks is necessary yet challenging when identifying pathogenic genes for brain diseases. RESULTS In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback-Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts with which brain diseases a gene is associated, while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our analysis supported by bioinformatics revealed CAMP as a gene significantly associated with Alzheimer's disease identified by M-GBBD. CONCLUSION Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
Affiliation(s)
- Ping Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Hubei Hongshan Laboratory, Wuhan, 430074, China
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hua Hu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
- Leon Wong
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China
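The M-GBBD abstract relies on Kullback-Leibler divergence to align feature distributions from different networks. For reference, the sketch below computes the discrete KL divergence, KL(P || Q) = sum_i p_i * log(p_i / q_i), between two probability vectors. How M-GBBD builds those distributions and feeds the divergence into its optimization is not described in the abstract, so this only illustrates the quantity itself; the function name and toy distributions are assumptions.

```python
import math


def kl_divergence(p, q):
    """Discrete KL(P || Q) in nats for two probability vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue           # 0 * log(0 / q) is taken as 0 by convention
        if qi == 0:
            return math.inf    # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    # Hypothetical feature distributions from two networks.
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    print(kl_divergence(p, q), kl_divergence(q, p))
```

The two printed values differ because KL divergence is not symmetric, which is why methods that minimize it must choose which distribution plays the role of the reference.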
10
Li Y, Lou Y, Liu M, Chen S, Tan P, Li X, Sun H, Kong W, Zhang S, Shao X. Machine learning based biomarker discovery for chronic kidney disease-mineral and bone disorder (CKD-MBD). BMC Med Inform Decis Mak 2024; 24:36. [PMID: 38317140 PMCID: PMC10840173 DOI: 10.1186/s12911-024-02421-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/10/2024] [Indexed: 02/07/2024]
Abstract
INTRODUCTION Chronic kidney disease-mineral and bone disorder (CKD-MBD) is characterized by bone abnormalities, vascular calcification, and some other complications. Although there are diagnostic criteria for CKD-MBD, in situations where examination of the target features is unavailable, there is a need to investigate and discover alternative biochemical criteria that are easy to obtain. Moreover, studying the correlations between the newly discovered biomarkers and the existing ones may provide insights into the underlying molecular mechanisms of CKD-MBD. METHODS We collected a cohort of 116 individuals, consisting of three subtypes of CKD-MBD: calcium abnormality, phosphorus abnormality, and PTH abnormality. To identify the best biomarker panel for discrimination, we applied six machine learning prediction methods and employed a sequential forward feature selection approach for each subtype. Additionally, we collected a separate prospective cohort of 114 samples to validate the discriminative power of the trained prediction models. RESULTS Using machine learning under a cross-validation setting, the feature selection method selected a concise biomarker panel for each CKD-MBD subtype as well as for the general one. Using the consensus of these features, the best area under the ROC curve reached 0.95 for the training dataset and 0.74 for the prospective dataset, respectively. DISCUSSION/CONCLUSION For the first time, we utilized machine learning methods to analyze biochemical criteria associated with CKD-MBD. Our aim was to identify alternative biomarkers that could serve not only as early detection indicators for CKD-MBD, but also as potential candidates for studying the underlying molecular mechanisms of the condition.
Affiliation(s)
- Yuting Li
  - Geriatrics Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Suzhou, China
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Yukuan Lou
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Man Liu
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Siyi Chen
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Peng Tan
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Xiang Li
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Huaixin Sun
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Weixin Kong
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Suhua Zhang
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
- Xiang Shao
  - Hemodialysis Department, Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Wan Shen St. 118, Suzhou, Jiangsu, 215028, China
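The CKD-MBD abstract mentions sequential forward feature selection for choosing a biomarker panel. A minimal sketch of the generic greedy procedure follows: start from an empty panel, repeatedly add the single feature that most improves a scoring function, and stop when no addition helps. The scoring function here is a toy stand-in; in a study like this it would plausibly be a cross-validated AUC, but that, along with the feature names and gain values, is an assumption made for illustration.

```python
def forward_select(features, score_fn):
    """Greedy sequential forward selection.

    score_fn takes a list of feature names and returns a number to maximize.
    Returns the selected features in the order they were added, plus the
    final score.
    """
    selected = []
    best = score_fn(selected)
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current panel.
        candidate, cand_score = max(
            ((f, score_fn(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected, best


# Toy scoring function: a baseline of 0.5 plus a fixed, hypothetical gain per marker.
TOY_GAINS = {"marker_a": 0.2, "marker_b": 0.1, "marker_c": 0.15, "marker_d": -0.02}


def toy_score(subset):
    return 0.5 + sum(TOY_GAINS[f] for f in subset)


if __name__ == "__main__":
    print(forward_select(list(TOY_GAINS), toy_score))
```

Because the search is greedy, it evaluates far fewer panels than an exhaustive search, at the cost of possibly missing combinations whose features are only useful together.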
11
Guo LX, Wang L, You ZH, Yu CQ, Hu ML, Zhao BW, Li Y. Biolinguistic graph fusion model for circRNA-miRNA association prediction. Brief Bioinform 2024; 25:bbae058. [PMID: 38426324 PMCID: PMC10939421 DOI: 10.1093/bib/bbae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 01/19/2024] [Accepted: 01/27/2024] [Indexed: 03/02/2024]
Abstract
Emerging clinical evidence suggests that sophisticated associations between circular ribonucleic acids (circRNAs) and microRNAs (miRNAs) are a critical regulatory factor in various pathological processes and play a critical role in most intricate human diseases. Nonetheless, validating these correlations via wet-lab experiments is error-prone and labor-intensive, and numerous existing computational methods for inferring novel circRNA-miRNA associations (CMAs) rely only on single correlation data. Considering the inadequacy of existing machine learning models, we propose a new model named BGF-CMAP, which combines the gradient boosting decision tree with natural language processing and graph embedding methods to infer associations between circRNAs and miRNAs. Specifically, BGF-CMAP extracts sequence attribute features with Word2vec and interaction behavior features with two homogeneous graph embedding algorithms, large-scale information network embedding and graph factorization. Comprehensive experimental analysis revealed that BGF-CMAP successfully predicted the complex relationship between circRNAs and miRNAs with an accuracy of 82.90% and an area under the receiver operating characteristic curve of 0.9075. Furthermore, 23 of the top 30 miRNA-associated circRNAs predicted on the studied data were confirmed by relevant experiments, showing that the BGF-CMAP model is superior to others. BGF-CMAP can serve as a helpful model to provide a scientific theoretical basis for the study of CMA prediction.
Affiliation(s)
- Lu-Xiang Guo
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129, China
- Chang-Qing Yu
  - College of Information Engineering, Xijing University, Xi’an 710123, China
- Meng-Lei Hu
  - School of Medicine, Peking University, Beijing, 100091, China
- Bo-Wei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
|
12
|
Guo LX, Wang L, You ZH, Yu CQ, Hu ML, Zhao BW, Li Y. Likelihood-based feature representation learning combined with neighborhood information for predicting circRNA-miRNA associations. Brief Bioinform 2024; 25:bbae020. [PMID: 38324624 PMCID: PMC10849193 DOI: 10.1093/bib/bbae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 01/01/2024] [Accepted: 01/11/2024] [Indexed: 02/09/2024] Open
Abstract
Connections between circular RNAs (circRNAs) and microRNAs (miRNAs) play a pivotal role in the onset, evolution, diagnosis and treatment of diseases and tumors. Selecting the most promising circRNA-related miRNAs and using them as biological markers or drug targets could help in dealing with complex human diseases through preventive strategies, diagnostic procedures and therapeutic approaches. Compared to traditional biological experiments, leveraging computational models to integrate diverse biological data in order to infer potential associations proves to be a more efficient and cost-effective approach. This paper presents a Convolutional Autoencoder model for CircRNA-MiRNA Association (CA-CMA) prediction. First, the model merges the natural language characteristics of the circRNA and miRNA sequences with the features of circRNA-miRNA interactions. It then uses all circRNA-miRNA pairs to construct a molecular association network, which is fine-tuned with labeled samples to optimize the network parameters. Finally, the prediction outcome is obtained with a deep neural network classifier. The model optimizes a neighborhood-preserving likelihood objective to learn continuous feature representations of words while preserving the spatial information of two-dimensional signals. In 5-fold cross-validation, CA-CMA outperformed numerous prior computational approaches, with a mean area under the receiver operating characteristic curve of 0.9138 and a small SD of 0.0024. Furthermore, recent literature has confirmed 25 of the top 30 circRNA-miRNA pairs with the highest CA-CMA scores in the case studies. The results of these experiments highlight the robustness and versatility of our model.
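The evaluation summary in this abstract (mean AUROC and SD over cross-validation folds) can be illustrated with a minimal sketch. The helper `auroc` is a hypothetical name and uses the rank-based (Mann-Whitney) formulation of the area under the ROC curve; the fold data are made-up toy values, not CA-CMA results, and only three folds are shown for brevity.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy (labels, scores) pairs, one per fold; values are purely illustrative.
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),
    ([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]),
]
fold_aucs = [auroc(y, s) for y, s in folds]
mean_auc = mean(fold_aucs)
sd_auc = stdev(fold_aucs)  # sample standard deviation across folds
```

A small SD across folds, as reported in the abstract, indicates that the score is stable with respect to how the data are split.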
Affiliation(s)
- Lu-Xiang Guo
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129, China
- Chang-Qing Yu
  - College of Information Engineering, Xijing University, Xi’an 710123, China
- Meng-Lei Hu
  - School of Medicine, Peking University, Beijing, 100091, China
- Bo-Wei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
|
13
|
Zhang Y, Cai G, Li X, Chen M. GCN-Based Heterogeneous Complex Feature Learning to Enhance Predictability for LncRNA-Disease Associations. ACS Omega 2024; 9:1472-1484. [PMID: 38222651 PMCID: PMC10785310 DOI: 10.1021/acsomega.3c07923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 01/16/2024]
Abstract
Using computational models to predict potential lncRNA-disease associations (LDAs) has emerged as an effective supplement to bioexperiments for exploring the pathogenesis of diseases. However, current computational models still face limitations in their ability to learn the complex features of bionetworks. In this study, we propose HGCNLDA, a model that combines graph convolutional network (GCN)-based aggregation, heterogeneous information fusion, and a bilinear decoder to infer LDAs. Recognizing the need to extract essential features during data processing, HGCNLDA uses four key steps to uncover interaction patterns within the bionetwork: (1) A novel type of tripartite heterogeneous network, the lncRNA-disease-miRNA network (LDMN), was constructed using computed similarities and known associations. (2) Homogeneous and heterogeneous features of nodes were extracted from domains within the LDMN by a GCN-based encoder. (3) Feature fusions, including bipolymerization operations and an attention mechanism, were employed to capture a more accurate and comprehensive representation of nodes. (4) A bilinear decoder was used to rebuild the edge type (or rating type) for a specific node pair, resulting in the predicted association score. In 5-fold cross-validation on two data sets, data set1 and data set2, HGCNLDA consistently demonstrated superior performance compared to five related models. It almost achieved the highest AUROC and AUPR values on both data sets, especially on data set2 where the results obtained were more challenging and objective. Case studies involving three real cancer scenarios further validated the practicality of HGCNLDA in identifying potential LDAs in real-world contexts. The source code and data for this study are available at https://github.com/zywait/HGCNLDA.
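Step (4) can be sketched in a few lines. A bilinear decoder in its generic form scores a node pair as sigmoid(z_l^T W z_d), where z_l and z_d are the two node embeddings and W is a learned weight matrix. The code below is a minimal sketch of that generic form, not the HGCNLDA implementation (which is in the linked repository); the names `bilinear_score`, `z_lnc`, `z_dis`, and the fixed matrix `W` are assumptions for illustration.

```python
import math


def bilinear_score(z_l, W, z_d):
    """Return sigmoid(z_l^T W z_d) for two embeddings and a weight matrix.

    W is given as a list of rows with shape len(z_l) x len(z_d).
    """
    if len(W) != len(z_l) or any(len(row) != len(z_d) for row in W):
        raise ValueError("W must have shape len(z_l) x len(z_d)")
    raw = sum(
        z_l[i] * W[i][j] * z_d[j]
        for i in range(len(z_l))
        for j in range(len(z_d))
    )
    return 1.0 / (1.0 + math.exp(-raw))


# Toy embeddings and an identity weight matrix (illustrative values only;
# in a trained model W would be learned, possibly one matrix per edge type).
z_lnc = [0.5, -1.0]
z_dis = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0]]
score = bilinear_score(z_lnc, W, z_dis)
```

With the identity matrix the raw score reduces to the dot product of the two embeddings; a learned W lets the decoder weight interactions between different embedding dimensions.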
Affiliation(s)
- Yi Zhang
  - Guilin University of Technology, Guilin 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
- Gangsheng Cai
  - Guilin University of Technology, Guilin 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
- Xin Li
  - Guilin University of Technology, Guilin 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang 421010, China
|
14
|
Yu S, Wang H, Li J, Zhao J, Liang C, Sun Y. A Multi-Relational Graph Encoder Network for Fine-Grained Prediction of MiRNA-Disease Associations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:45-56. [PMID: 38015672 DOI: 10.1109/tcbb.2023.3335007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
MicroRNAs (miRNAs) are critical in diagnosing and treating various diseases. Automatically uncovering the interdependent relationships between miRNAs and diseases has recently made remarkable progress, but their fine-grained interactive relationships still need to be explored. We propose a multi-relational graph encoder network for fine-grained prediction of miRNA-disease associations (MRFGMDA), which uses practical and current datasets to construct a multi-relational graph encoder network that predicts disease-related miRNAs and their specific relationship types (upregulation, downregulation, or dysregulation). We evaluated MRFGMDA and found that it accurately predicted miRNA-disease associations, which could have far-reaching implications for clinical medical analysis, early diagnosis, prevention, and treatment. Case analyses, Kaplan-Meier survival analysis, expression difference analysis, and immune infiltration analysis further demonstrated the effectiveness and feasibility of MRFGMDA in uncovering potential disease-related miRNAs. Overall, our work represents a significant step toward improving the prediction of miRNA-disease associations using a fine-grained approach, which could lead to more accurate diagnosis and treatment of diseases.
|
15
|
Wang S, Hui C, Zhang T, Wu P, Nakaguchi T, Xuan P. Graph Reasoning Method Based on Affinity Identification and Representation Decoupling for Predicting lncRNA-Disease Associations. J Chem Inf Model 2023; 63:6947-6958. [PMID: 37906529 DOI: 10.1021/acs.jcim.3c01214] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/02/2023]
Abstract
An increasing number of studies have shown that dysregulation of lncRNAs is related to the occurrence of various diseases. Most previous methods are designed around the homogeneity assumption that the representation of a target lncRNA (or disease) node should be updated by aggregating the attributes of its neighbor nodes. However, this assumption ignores affinity nodes that are far from the target node. We present a novel prediction method, GAIRD, to fully leverage the heterogeneous information in the network and the decoupled node features. The first major innovation is a random walk strategy based on breadth-first search and depth-first search. Unlike previous methods that focus only on homogeneous information, our strategy learns both the homogeneous information within local neighborhoods and the heterogeneous information within higher-order neighborhoods. The second innovation is a representation decoupling module that extracts purer attributes and purer topologies. Third, a module based on group convolution and deep separable convolution is developed to promote pairwise intrachannel and interchannel feature learning. The experimental results show that GAIRD outperforms the compared state-of-the-art methods, and the ablation studies confirm the contributions of the major innovations. We also performed case studies on 3 diseases to further demonstrate the effectiveness of the GAIRD model in applications.
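One well-known way to blend breadth-first and depth-first exploration in a random walk is the node2vec-style second-order bias, sketched below. This is a stand-in technique chosen for illustration; the abstract does not specify how GAIRD combines the two searches. The function `biased_walk`, the parameters `p` and `q`, and the toy graph are all assumptions.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """node2vec-style walk: small q favours depth-first moves, large q
    keeps the walk near the previous node (breadth-first-like), and
    small p makes returning to the previous node more likely."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducibility with a seed
        if not nbrs:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stays close to prev (breadth-first-like)
            else:
                weights.append(1.0 / q)  # moves away from prev (depth-first-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy undirected graph with hypothetical node names.
adj = {
    "lnc1": {"dis1", "mir1"},
    "dis1": {"lnc1", "lnc2"},
    "mir1": {"lnc1", "lnc2"},
    "lnc2": {"dis1", "mir1"},
}
walk = biased_walk(adj, "lnc1", length=6, p=0.5, q=2.0, rng=random.Random(0))
```

Walks generated this way can be fed to a sequence-embedding model so that nodes co-occurring in walks, including distant ones, receive similar representations.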
Affiliation(s)
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Cui Hui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Peiliang Wu
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
  - Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Ping Xuan
  - Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
|
16
|
Guo Q, Liao Y, Li Z, Lin H, Liang S. Convolutional Models with Multi-Feature Fusion for Effective Link Prediction in Knowledge Graph Embedding. Entropy (Basel, Switzerland) 2023; 25:1472. [PMID: 37895593 PMCID: PMC10606879 DOI: 10.3390/e25101472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/19/2023] [Accepted: 10/20/2023] [Indexed: 10/29/2023]
Abstract
Link prediction remains paramount in knowledge graph embedding (KGE), aiming to discern obscured or non-manifest relationships within a given knowledge graph (KG). Despite the critical nature of this endeavor, contemporary methodologies grapple with notable constraints, predominantly in terms of computational overhead and the intricacy of encapsulating multifaceted relationships. This paper introduces a sophisticated approach that amalgamates convolutional operators with pertinent graph structural information. By meticulously integrating information pertinent to entities and their immediate relational neighbors, we enhance the performance of the convolutional model, culminating in an averaged embedding ensuing from the convolution across entities and their proximal nodes. Significantly, our methodology presents a distinctive avenue, facilitating the inclusion of edge-specific data into the convolutional model's input, thus endowing users with the latitude to calibrate the model's architecture and parameters congruent with their specific dataset. Empirical evaluations underscore the ascendancy of our proposition over extant convolution-based link prediction benchmarks, particularly evident across the FB15k, WN18, and YAGO3-10 datasets. The primary objective of this research lies in forging KGE link prediction methodologies imbued with heightened efficiency and adeptness, thereby addressing salient challenges inherent to real-world applications.
Affiliation(s)
- Qinglang Guo
  - School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, China
  - National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), CETC Academy of Electronics and Information Technology Group Co., Ltd., China Academy of Electronics and Information Technology, Beijing 100041, China
- Yong Liao
  - School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Zhe Li
  - Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Hui Lin
  - National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), CETC Academy of Electronics and Information Technology Group Co., Ltd., China Academy of Electronics and Information Technology, Beijing 100041, China
- Shenglin Liang
  - School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
|
17
|
Shan W, Chen L, Xu H, Zhong Q, Xu Y, Yao H, Lin K, Li X. GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47. Front Chem 2023; 11:1292869. [PMID: 37927570 PMCID: PMC10623438 DOI: 10.3389/fchem.2023.1292869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023] Open
Abstract
Identifying compound-protein interactions plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in compound-protein interaction (CPI) prediction. However, ML relies on learning from large datasets, whereas the CPI data available for a specific target are often scarce. To overcome this dilemma, we propose a virtual screening model in which word2vec is used as an embedding tool to generate low-dimensional vectors of the SMILES of compounds and the amino acid sequences of proteins, and a modified multi-grained cascade forest, gcForest, is used as the classifier. The proposed method can construct a model from raw data and adjust model complexity according to the scale of the dataset, which is especially useful for small datasets, and it is robust, with few hyper-parameters and without over-fitting. We found that the proposed model is superior to other CPI prediction models and performs well on the constructed challenging dataset. Finally, we predicted 2 new inhibitors for cluster of differentiation 47 (CD47), which has few known inhibitors. The IC50s of enzyme activities of these 2 new small-molecule inhibitors targeting the CD47-SIRPα interaction are 3.57 and 4.79 μM, respectively. These results demonstrate the competence of this concise but efficient tool for CPI prediction.
Affiliation(s)
- Wenying Shan
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
  - Faculty of Health Sciences, University of Macau, Macau, China
- Lvqi Chen
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Hao Xu
  - Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry, Nanjing, China
  - National Engineering Laboratory for Biomass Chemical Utilization, Nanjing, China
- Qinghao Zhong
  - School of Humanities and Social Sciences, The Chinese University of Hong Kong, Shenzhen, China
- Yinqiu Xu
  - Department of Pharmacy, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, China
- Hequan Yao
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Kejiang Lin
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Xuanyi Li
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
|
18
|
Wei MM, Yu CQ, Li LP, You ZH, Wang L. BCMCMI: A Fusion Model for Predicting circRNA-miRNA Interactions Combining Semantic and Meta-path. J Chem Inf Model 2023; 63:5384-5394. [PMID: 37535872 DOI: 10.1021/acs.jcim.3c00852] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 08/05/2023]
Abstract
Growing evidence suggests that circRNA plays a vital role in the development and treatment of diseases by interacting with miRNA. Therefore, accurate prediction of potential circRNA-miRNA interactions (CMIs) has become urgent. However, traditional wet experiments are time-consuming and costly, and their results can be affected by objective factors. In this paper, we propose a computational model, BCMCMI, which combines three features to predict CMIs. Specifically, BCMCMI uses the bidirectional encoding capability of the BERT algorithm to extract sequence features from the semantic information of circRNA and miRNA. Then, a heterogeneous network is constructed based on cosine similarity and known CMI information. Metapath2vec is employed to conduct random walks following meta-paths in the network to capture topological features, including similarity features. Finally, potential CMIs are predicted using the XGBoost classifier. BCMCMI achieves superior results compared to other state-of-the-art models on two benchmark datasets for CMI prediction. We also use t-SNE to visualize the distribution of the extracted features on a randomly selected dataset. The prediction results show that BCMCMI can serve as a valuable complement to the wet experiment process.
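The meta-path-guided random walk that Metapath2vec relies on can be sketched as follows: at every step the walker may only move to a neighbor whose node type matches the next type in the meta-path. This is a minimal illustration under stated assumptions, not the BCMCMI code; the function `metapath_walk`, the meta-path circRNA-miRNA-circRNA, and the toy graph are hypothetical.

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng=None):
    """Random walk whose node types follow a repeating meta-path.

    The meta-path must start and end with the same type so that it can
    be repeated, e.g. ["circRNA", "miRNA", "circRNA"].
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the meta-path")
    rng = rng or random.Random()
    cycle = metapath[:-1]  # types repeat with this period
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = sorted(
            n for n in adj[walk[-1]] if node_type[n] == wanted
        )
        if not candidates:
            break  # no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy heterogeneous graph; the circ1-circ2 edge stands for a similarity
# link and is skipped by this particular meta-path.
node_type = {"circ1": "circRNA", "circ2": "circRNA",
             "mir1": "miRNA", "mir2": "miRNA"}
adj = {
    "circ1": {"mir1", "mir2", "circ2"},
    "circ2": {"mir1", "circ1"},
    "mir1": {"circ1", "circ2"},
    "mir2": {"circ1"},
}
walk = metapath_walk(adj, node_type, "circ1",
                     ["circRNA", "miRNA", "circRNA"], length=5,
                     rng=random.Random(1))
```

The walks act as "sentences" for a skip-gram model, so nodes that co-occur along the chosen meta-path end up with similar topological embeddings.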
Affiliation(s)
- Meng-Meng Wei
  - School of Information Engineering, Xijing University, Xi'an, Shaanxi 710123, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, Shaanxi 710123, China
- Li-Ping Li
  - College of Agriculture and Forestry, Longdong University, Qingyang, Gansu 745000, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Lei Wang
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, Guangxi 530007, China
|
19
|
Watanabe N, Kuriya Y, Murata M, Yamamoto M, Shimizu M, Araki M. Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD. Biology 2023; 12:795. [PMID: 37372080 DOI: 10.3390/biology12060795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 05/17/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023]
Abstract
The number of unannotated protein sequences is increasing explosively due to genome sequencing technology. A more comprehensive understanding of protein functions for protein annotation requires the discovery of new features that cannot be captured by conventional methods. Deep learning can extract important features from input data and predict protein functions based on those features. Here, protein feature vectors generated by 3 deep learning models are analyzed using Integrated Gradients to explore important features of amino acid sites. As a case study, prediction and feature extraction models for UbiD enzymes were built using these models. The important amino acid residues extracted from the models differed from the secondary structures, conserved regions and active sites known for UbiD. Interestingly, different amino acid residues within UbiD sequences were regarded as important depending on the type of model and sequence. The Transformer models focused on more specific regions than the other models. These results suggest that each deep learning model captures aspects of protein features that differ from existing knowledge and has the potential to discover new laws of protein function. This study will help to extract new protein features for other protein annotations.
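Integrated Gradients attributes a prediction F(x) to each input feature i as (x_i - x'_i) times the average of the partial derivative of F along the straight path from a baseline x' to the input x. The sketch below approximates that integral with a midpoint Riemann sum for a tiny logistic model with an analytic gradient. The study applied the method to deep models over amino acid sites; the toy model, weights, baseline, and function names here are assumptions for illustration.

```python
import math


def predict(x, w, b):
    """Toy logistic model standing in for a deep network."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, w, b):
    """Analytic gradient of the logistic output with respect to x."""
    p = predict(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = gradient(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Illustrative values only; the third feature has zero weight.
w, b = [2.0, -1.0, 0.0], 0.0
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum to approximately F(x) - F(baseline), and a feature the model ignores should receive zero attribution.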
Affiliation(s)
- Naoki Watanabe
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Yuki Kuriya
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Masahiro Murata
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai, Nada-Ku, Kobe 657-8501, Japan
- Masaki Yamamoto
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Masayuki Shimizu
  - Bacchus Bio Innovation Co., Ltd., 6-3-7 Minatojima minami-machi, Kobe 650-0047, Japan
- Michihiro Araki
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai, Nada-Ku, Kobe 657-8501, Japan
  - Graduate School of Medicine, Kyoto University, 54 Shogoin-Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
  - National Cerebral and Cardiovascular Center, 6-1 Kishibe-Shinmachi, Suita 564-8565, Japan
|
20
|
Meng X, Shang J, Ge D, Yang Y, Zhang T, Liu JX. ETGPDA: identification of piRNA-disease associations based on embedding transformation graph convolutional network. BMC Genomics 2023; 24:279. [PMID: 37226081 DOI: 10.1186/s12864-023-09380-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/24/2023] [Accepted: 05/15/2023] [Indexed: 05/26/2023] Open
Abstract
BACKGROUND Piwi-interacting RNAs (piRNAs) have been proven to be closely associated with human diseases. Identifying potential associations between piRNAs and diseases is of great significance for understanding complex diseases. Because traditional "wet experiments" are time-consuming and expensive, predicting piRNA-disease associations by computational methods is of great value. METHODS In this paper, a method based on an embedding transformation graph convolutional network, named ETGPDA, is proposed to predict piRNA-disease associations. Specifically, a heterogeneous network is constructed from the similarity information of piRNAs and diseases together with the known piRNA-disease associations, and a graph convolutional network with an attention mechanism is applied to it to extract low-dimensional embeddings of piRNAs and diseases. Furthermore, an embedding transformation module is developed to address the problem of embedding space inconsistency; it is more lightweight and offers stronger learning ability and higher accuracy. Finally, the piRNA-disease association score is calculated from the similarity of the piRNA and disease embeddings. RESULTS Evaluated by fivefold cross-validation, ETGPDA achieves an AUC of 0.9603, which is better than the other five selected computational models. Case studies on head and neck squamous cell carcinoma and Alzheimer's disease further support the performance of ETGPDA. CONCLUSIONS Hence, ETGPDA is an effective method for predicting hidden piRNA-disease associations.
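The idea of mapping one embedding space onto another before comparing embeddings can be sketched with a fixed linear map and cosine similarity. The abstract does not give the form of the ETGPDA transformation module or its similarity measure, so the linear map `T` (which would be learned in practice), the use of cosine similarity, and all names and values below are assumptions.

```python
import math


def transform(vec, T):
    """Apply a linear map T (a list of rows) to an embedding vector."""
    return [sum(t * v for t, v in zip(row, vec)) for row in T]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy embeddings living in inconsistent coordinate systems.
pirna_emb = [1.0, 0.0]
disease_emb = [0.0, 2.0]
# Hypothetical transformation that swaps the two axes.
T = [[0.0, 1.0], [1.0, 0.0]]

raw_score = cosine(pirna_emb, disease_emb)
aligned_score = cosine(transform(pirna_emb, T), disease_emb)
```

In this toy case the raw similarity is zero, while the similarity after alignment is maximal, which is the kind of mismatch an embedding transformation is meant to correct.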
Affiliation(s)
- Xianghan Meng
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Daohui Ge
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Yi Yang
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Tongdui Zhang
  - Science and Technology Innovation Service Institution of Rizhao, Rizhao, 276826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
|