1
Karampuri A, Jakkula BK, Perugu S. ResisenseNet hybrid neural network model for predicting drug sensitivity and repurposing in breast Cancer. Sci Rep 2024; 14:23949. [PMID: 39397003 PMCID: PMC11471817 DOI: 10.1038/s41598-024-71076-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 08/23/2024] [Indexed: 10/15/2024]
Abstract
Breast cancer remains a leading cause of mortality among women worldwide, and drug resistance driven by transcription factors and mutations poses significant challenges. To address this, we present ResisenseNet, a predictive model for drug sensitivity and resistance. ResisenseNet integrates transcription factor expression, genomic markers, drugs, and molecular descriptors, employing a hybrid architecture of 1D-CNN + LSTM and DNN to learn long-range and temporal patterns from amino acid sequences and transcription factor data. The model demonstrated high predictive accuracy, achieving a validation accuracy of 0.9794 and a loss value of 0.042. Comprehensive validation, including comparisons with state-of-the-art models and ablation studies, confirmed the robustness of the developed architecture. ResisenseNet has been applied to repurpose existing anticancer drugs across 14 different cancers, with a focus on breast cancer. Among the malignancies studied, drugs targeting low-grade glioma (LGG) and lung adenocarcinoma (LUAD) showed increased sensitivity in breast cancer according to ResisenseNet's assessment. Further evaluation of the predicted sensitive drugs revealed that 14 had no prior history of anticancer activity against breast cancer. These drugs target key signaling pathways involved in breast cancer, presenting novel therapeutic opportunities. ResisenseNet addresses drug resistance by filtering out ineffective compounds and enhancing chemotherapy for breast cancer. In vitro studies on the sensitive drugs provide valuable insights into breast cancer prognosis, contributing to improved treatment strategies.
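The 1D-CNN branch of a hybrid like this treats an amino-acid sequence as a one-dimensional signal. The sketch below is not the authors' code: it assumes one-hot encoding over the 20 standard residues and shows a single convolution filter with valid padding and ReLU, omitting the LSTM and DNN branches. The names `one_hot`, `conv1d_relu` and `AMINO_ACIDS` are hypothetical.

```python
from typing import List

# Hypothetical encoding: one channel per standard amino acid.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a protein sequence as an L x 20 matrix (unknown residues become all-zero rows)."""
    return [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in seq]


def conv1d_relu(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Slide a K x C kernel along an L x C input (valid padding, stride 1), then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = bias
        for offset in range(width):
            # Dot product between one input position and one kernel row.
            total += sum(a * b for a, b in zip(x[start + offset], kernel[offset]))
        out.append(max(0.0, total))
    return out


# A filter that responds to the dipeptide "KR".
kr_filter = one_hot("KR")
print(conv1d_relu(one_hot("AKRA"), kr_filter))  # [0.0, 2.0, 0.0]
```

In a trained model the kernel weights are learned rather than hand-set, and many filters run in parallel; their outputs would then feed the recurrent (LSTM) layer.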
Affiliation(s)
- Anush Karampuri
- Department of Biotechnology, National Institute of Technology, Warangal, 500604, India
- Bharath Kumar Jakkula
- Department of Biotechnology, National Institute of Technology, Warangal, 500604, India
- Shyam Perugu
- Department of Biotechnology, National Institute of Technology, Warangal, 500604, India
2
Huang J, Sun C, Li M, Tang R, Xie B, Wang S, Wei JM. Structure-inclusive similarity based directed GNN: a method that can control information flow to predict drug-target binding affinity. Bioinformatics 2024; 40:btae563. [PMID: 39292540 DOI: 10.1093/bioinformatics/btae563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/21/2024] [Accepted: 09/17/2024] [Indexed: 09/20/2024]
Abstract
MOTIVATION Exploring the association between drugs and targets is essential for drug discovery and repurposing. Compared with traditional methods that treat this exploration as a binary classification task, predicting drug-target binding affinity provides more specific information. Many studies rely on the assumption that similar drugs may interact with the same target. These methods construct a symmetric graph from undirected drug similarity or target similarity. Although these similarities can measure the difference between two molecules, they cannot capture the inclusion relationship between their substructures. For example, if drug A contains all the substructures of drug B, then in the message-passing mechanism of a graph neural network, drug A should acquire all the properties of drug B, while drug B should obtain only some of the properties of A. RESULTS To this end, we propose a structure-inclusive similarity (SIS) that measures the similarity of two drugs by considering the inclusion relationship between their substructures. Based on SIS, we constructed a drug graph and a target graph, respectively, and predicted drug-target binding affinities with a graph convolutional network-based model. Experimental results show that considering the inclusion relationship between the substructures of two molecules effectively improves the accuracy of the prediction model. Our SIS-based prediction method outperforms several state-of-the-art methods for drug-target binding affinity prediction. The case studies demonstrate that our model is a practical tool for predicting drug-target binding affinity. AVAILABILITY AND IMPLEMENTATION Source codes and data are available at https://github.com/HuangStomach/SISDTA.
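The abstract describes an asymmetric similarity in which a drug that contains all of another drug's substructures receives all of that drug's properties, but not vice versa. It does not give the SIS formula, so the sketch below assumes each drug is a set of substructure identifiers and uses a hypothetical inclusion score, |receiver ∩ sender| / |sender|, as the weight on the directed edge sender → receiver. The names `received_weight`, `directed_weights` and `propagate` are illustrative, not from the SISDTA code.

```python
from typing import Dict, List, Set


def received_weight(receiver: Set[str], sender: Set[str]) -> float:
    """Hypothetical inclusion score: share of the sender's substructures found in the receiver.

    If the receiver contains every substructure of the sender, it receives the
    sender's message with weight 1.0; the reverse direction gets a smaller weight.
    """
    if not sender:
        return 0.0
    return len(receiver & sender) / len(sender)


def directed_weights(subs: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """w[receiver][sender] for every ordered pair of distinct drugs (an asymmetric graph)."""
    return {r: {s: received_weight(subs[r], subs[s]) for s in subs if s != r} for r in subs}


def propagate(features: Dict[str, List[float]], w: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """One message-passing step: own features plus the weighted sum of incoming messages."""
    out = {}
    for r, own in features.items():
        agg = list(own)
        for s, weight in w[r].items():
            for k, value in enumerate(features[s]):
                agg[k] += weight * value
        out[r] = agg
    return out


# Drug A contains every substructure of drug B.
subs = {"A": {"ring", "amide", "halogen"}, "B": {"ring", "amide"}}
w = directed_weights(subs)
print(w["A"]["B"], w["B"]["A"])  # 1.0 0.6666666666666666
print(propagate({"A": [1.0, 0.0], "B": [0.0, 1.0]}, w))
```

Because the weights differ by direction, A absorbs B's features in full while B absorbs only two thirds of A's, which is the information-flow control the abstract motivates.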
Affiliation(s)
- Jipeng Huang
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Chang Sun
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Minglei Li
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Rong Tang
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Bin Xie
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
- Shuqin Wang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin, Xi Qing District 300387, China
- Jin-Mao Wei
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
3
Todhunter ME, Jubair S, Verma R, Saqe R, Shen K, Duffy B. Artificial intelligence and machine learning applications for cultured meat. Front Artif Intell 2024; 7:1424012. [PMID: 39381621 PMCID: PMC11460582 DOI: 10.3389/frai.2024.1424012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 08/21/2024] [Indexed: 10/10/2024]
Abstract
Cultured meat has the potential to provide a complementary meat industry with reduced environmental, ethical, and health impacts. However, major technological challenges remain, which require time- and resource-intensive research and development efforts. Machine learning has the potential to accelerate cultured meat technology by streamlining experiments, predicting optimal results, and reducing experimentation time and resources. However, the use of machine learning in cultured meat is in its infancy. This review covers the work available to date on the use of machine learning in cultured meat and explores future possibilities. We address four major areas of cultured meat research and development: establishing cell lines, cell culture media design, microscopy and image analysis, and bioprocessing and food processing optimization. In addition, we have included a survey of datasets relevant to cultured meat (CM) research. This review aims to provide the foundation necessary for both cultured meat and machine learning scientists to identify research opportunities at the intersection of the two fields.
Affiliation(s)
- Sheikh Jubair
- Alberta Machine Intelligence Institute, Edmonton, AB, Canada
- Ruchika Verma
- Alberta Machine Intelligence Institute, Edmonton, AB, Canada
- Rikard Saqe
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Kevin Shen
- Department of Mathematics, University of Waterloo, Waterloo, ON, Canada
4
Shi W, Zhang Y, Sun Y, Lin Z. Function-Genes and Disease-Genes Prediction Based on Network Embedding and One-Class Classification. Interdiscip Sci 2024:10.1007/s12539-024-00638-7. [PMID: 39230798 DOI: 10.1007/s12539-024-00638-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 05/14/2024] [Accepted: 05/21/2024] [Indexed: 09/05/2024]
Abstract
Genes that have been experimentally validated for diseases (functions) can be used to develop machine learning methods that predict new disease/function-genes. However, predicting both function-genes and disease-genes faces the same problem: there are only confirmed positive examples and no negative examples. To solve this problem, we propose VGAEMCD, a function/disease-gene prediction algorithm based on network embedding (Variational Graph Auto-Encoders, VGAE) and one-class classification (Fast Minimum Covariance Determinant, Fast-MCD). First, we constructed a protein-protein interaction (PPI) network centered on experimentally validated genes; then VGAE was used to obtain embeddings of the nodes (genes) in the network; finally, the embeddings were input into an improved deep learning one-class classifier based on Fast-MCD to predict function/disease-genes. VGAEMCD predicts function-genes and disease-genes in a unified way and requires only experimentally verified genes as input (no expression profiles are needed). VGAEMCD outperforms classical one-class classification algorithms in Recall, Precision, F-measure, Specificity, and Accuracy. Further experiments show that VGAEMCD scores higher than state-of-the-art function/disease-gene prediction algorithms on seven metrics. These results indicate that VGAEMCD learns the distribution characteristics of positive examples well and accurately identifies function/disease-genes.
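Fast-MCD fits a robust mean and covariance by repeating "concentration steps": fit the h points currently judged most typical, then keep the h points with the smallest Mahalanobis distance to that fit, which never increases the covariance determinant. The sketch below is a simplified illustration, not VGAEMCD: it is restricted to 2D embeddings and a single deterministic start (real Fast-MCD uses many random starts and any dimension), the points stand in for VGAE gene embeddings, and the chi-square acceptance threshold is a hypothetical rule rather than the paper's improved classifier.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Cov = Tuple[float, float, float]  # (sxx, sxy, syy)


def mean_cov(points: Sequence[Point]) -> Tuple[Point, Cov]:
    """Sample mean and covariance of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det(cov: Cov) -> float:
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def mahalanobis_sq(p: Point, mean: Point, cov: Cov) -> float:
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def mcd_c_steps(points: List[Point], h: int, max_iter: int = 50) -> Tuple[Point, Cov]:
    """Concentration steps from one start subset (simplified Fast-MCD)."""
    subset = points[:h]
    best = float("inf")
    for _ in range(max_iter):
        mean, cov = mean_cov(subset)
        d = det(cov)
        if d <= 0:
            raise ValueError("singular covariance; choose a different start subset")
        if d >= best:  # determinant stopped decreasing: converged
            break
        best = d
        # Keep the h points closest to the current fit.
        subset = sorted(points, key=lambda p: mahalanobis_sq(p, mean, cov))[:h]
    return mean_cov(subset)


# Hypothetical acceptance rule: chi-square(2) 97.5% quantile, equal to -2 ln(0.025).
THRESHOLD = -2 * math.log(0.025)


def is_positive_like(p: Point, mean: Point, cov: Cov) -> bool:
    return mahalanobis_sq(p, mean, cov) <= THRESHOLD


# Stand-ins for embeddings of known positive genes, plus one far-away point.
emb = [(0.0, 0.0), (1.0, 0.2), (0.2, 1.0), (1.1, 1.0), (0.5, 0.5),
       (0.9, 0.4), (0.3, 0.7), (10.0, 10.0)]
mean, cov = mcd_c_steps(emb, h=6)
print(is_positive_like((0.5, 0.5), mean, cov), is_positive_like((10.0, 10.0), mean, cov))  # True False
```

Only positive examples are needed to fit the robust ellipse, which is why a one-class method suits the "positives only" setting the abstract describes.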
Affiliation(s)
- Weiyu Shi
- College of Maritime Economics and Management, Dalian Maritime University, Dalian, 116026, China
- Yan Zhang
- Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Yeqing Sun
- Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Zhengkui Lin
- College of Maritime Economics and Management, Dalian Maritime University, Dalian, 116026, China
5
Liu M, Yang Y, Liu Q, Liu L, Wang G. A Knowledge-Driven Self-Supervised Approach for Molecular Generation. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1579-1590. [PMID: 38805329 DOI: 10.1109/tcbb.2024.3406600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]
Abstract
Owing to the great success of Graph Neural Networks (GNN) in numerous fields, growing research interest has been devoted to applying GNN to molecular learning tasks. A molecule's structure can be naturally represented as a graph in which atoms and bonds correspond to nodes and edges, respectively. However, atoms are not haphazardly stacked together but are combined into various spatial geometries. Meanwhile, since chemical reactions mainly occur in substructures such as functional groups, substructures play a decisive role in a molecule's properties. Therefore, directly applying GNN to molecular representation learning could ignore the molecular spatial structure and the substructure properties, which in turn degrades the performance of downstream tasks. In this paper, we propose a Knowledge-Driven Self-Supervised Model for Molecular Representation Learning (KSMRL) to address these problems. KSMRL consists of two major pathways: (1) the Spatial Information (SI) based pathway, which preserves the spatial information of the molecular structure, and (2) the Subgraph Constraint (SC) based pathway, which retains the properties of substructures in the molecular representation. In this manner, both atomic-level and substructure-level information can be included in modeling. According to the experimental results on multiple datasets, the proposed KSMRL generates discriminative molecular representations. In molecular generation tasks, KSMRL combined with Autoregressive Flow (AF) models or Discrete Flow (DF) models outperforms the state-of-the-art baselines on all datasets. In addition, we demonstrate the effectiveness of KSMRL with property optimization experiments. To show its ability to predict specified potential Drug-Target Interactions (DTIs), a case study discriminating the interactions between molecules generated by KSMRL and targets is also given.
6
Liu Z, Chen Q, Lan W, Lu H, Zhang S. SSLDTI: A novel method for drug-target interaction prediction based on self-supervised learning. Artif Intell Med 2024; 149:102778. [PMID: 38462280 DOI: 10.1016/j.artmed.2024.102778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 12/01/2023] [Accepted: 01/14/2024] [Indexed: 03/12/2024]
Abstract
Many computational methods have been proposed to identify potential drug-target interactions (DTIs) to expedite drug development. Graph neural network (GNN) methods are considered to be one of the most effective approaches. However, shallow GNN methods can only aggregate local information from nodes. Also, deep GNN methods may result in over-smoothing while obtaining long-distance neighbourhood information. As a result, existing GNN methods struggle to extract the complete features of the graph. Additionally, the number of known DTIs is insufficient, and there are far more unknown drug-target pairs than known DTIs, leading to class imbalance. This article proposes a model that combines graph autoencoder and self-supervised learning to accurately encode multilevel features of graphs using only a small number of labelled samples. We introduce a positive sample compensation coefficient to the objective function to mitigate the impact of class imbalance. Experiments on two datasets demonstrated that our model outperforms the four baseline methods, and the new DTIs predicted by the SSLDTI model were verified by the DrugBank database.
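The abstract mentions a "positive sample compensation coefficient" in the objective to counter the shortage of known DTIs. One common way to realize this, assumed here because the paper's exact loss is not given, is to scale the positive term of binary cross-entropy by a coefficient such as the negative-to-positive ratio. The names `weighted_bce` and `ratio_pos_weight` are hypothetical.

```python
import math
from typing import Sequence


def ratio_pos_weight(labels: Sequence[int]) -> float:
    """Hypothetical compensation coefficient: number of negatives per positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos if n_pos else 1.0


def weighted_bce(probs: Sequence[float], labels: Sequence[int], pos_weight: float,
                 eps: float = 1e-12) -> float:
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


labels = [1, 0, 0, 0]               # one known DTI, three unlabelled pairs
probs = [0.5, 0.5, 0.5, 0.5]
w = ratio_pos_weight(labels)        # 3.0
print(weighted_bce(probs, labels, w), weighted_bce(probs, labels, 1.0))
```

With the coefficient, a missed positive costs as much as three missed negatives, so the rare known interactions are not drowned out by the many unlabelled pairs.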
Affiliation(s)
- Zhixian Liu
- School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, Guangxi, China
- Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi, China
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi, China
- Huihui Lu
- School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, Guangxi, China
- Shichao Zhang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
7
Zhang Y, Chu Y, Lin S, Xiong Y, Wei DQ. ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler. Brief Bioinform 2024; 25:bbae103. [PMID: 38517693 PMCID: PMC10959163 DOI: 10.1093/bib/bbae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/04/2024] [Accepted: 02/23/2024] [Indexed: 03/24/2024]
Abstract
Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, uncovering associations between miRNAs and diseases can contribute to precise diagnosis and effective treatment of medical conditions. Detecting miRNA-disease associations with computational techniques that use biological information has emerged as a cost-effective and highly efficient approach. Here, we introduce a computational framework named ReHoGCNES for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs a homogenous graph convolutional network with a regular graph structure (ReHoGCN) encompassing a disease similarity network, a miRNA similarity network and the known MDA network, and it was then tested on four experimental tasks. A random edge sampler strategy was used to speed up processing and reduce training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both the homogenous graph convolutional network and the heterogeneous graph convolutional network with non-regular graph structure in all four tasks, which implicitly shows that a steady degree distribution of a graph plays an important role in enhancing model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods for MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. As a result, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 predicted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.
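Two ideas in this abstract are easy to make concrete: a regular graph (every node has the same degree) and a random edge sampler that trains on a fresh subset of edges each step. The sketch below assumes uniform sampling without replacement at a fixed ratio; the paper's sampler may differ, and `is_regular` and `sample_edges` are hypothetical names.

```python
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]


def is_regular(edges: List[Edge], num_nodes: int) -> bool:
    """True if every node has the same degree (edges treated as undirected)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return len({degree[n] for n in range(num_nodes)}) == 1


def sample_edges(edges: List[Edge], ratio: float, rng: random.Random) -> List[Edge]:
    """Uniformly sample a fraction of the edges without replacement (hypothetical sampler)."""
    k = max(1, int(len(edges) * ratio))
    return rng.sample(edges, k)


# A 6-cycle is 2-regular.
cycle = [(i, (i + 1) % 6) for i in range(6)]
rng = random.Random(0)
print(is_regular(cycle, 6))  # True
for epoch in range(3):
    batch = sample_edges(cycle, 0.5, rng)  # a fresh random half of the edges each epoch
    print(epoch, batch)
```

Each epoch then propagates messages over only the sampled edges, which lowers the per-step cost roughly in proportion to the sampling ratio.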
Affiliation(s)
- Yufang Zhang
- School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai 200240, China
- Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
- Yanyi Chu
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
- Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
8
Wei J, Lu L, Shen T. Predicting drug-protein interactions by preserving the graph information of multi source data. BMC Bioinformatics 2024; 25:10. [PMID: 38177981 PMCID: PMC10768380 DOI: 10.1186/s12859-023-05620-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/13/2023] [Accepted: 12/15/2023] [Indexed: 01/06/2024]
Abstract
Examining potential drug-target interactions (DTIs) is a pivotal component of drug discovery and repurposing. Recently, there has been a significant rise in the use of computational techniques to predict DTIs. Nevertheless, previous investigations have predominantly concentrated on assessing either the connections between nodes or the consistency of the network's topological structure in isolation. Such one-sided approaches could severely hinder the accuracy of DTI predictions. In this study, we propose a novel method called TTGCN, which combines heterogeneous graph convolutional neural networks (GCN) and graph attention networks (GAT) to address the task of DTI prediction. TTGCN employs a two-tiered feature learning strategy, utilizing GAT and residual GCN (R-GCN) to extract drug and target embeddings from the diverse network, respectively. These drug and target embeddings are then fused through a mean-pooling layer. Finally, we employ an inductive matrix completion technique to forecast DTIs while preserving the network's node connectivity and topological structure. Our approach demonstrates superior performance in terms of area under the curve and area under the precision-recall curve in experimental comparisons, highlighting its significant advantages in predicting DTIs. Furthermore, case studies provide additional evidence of its ability to identify potential DTIs.
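The last two steps in this abstract, fusing embeddings by mean pooling and scoring pairs with inductive matrix completion, can be sketched in a few lines. This assumes the pooled inputs are several embeddings of the same node (for example from different layers) and uses the bilinear form sigmoid(dᵀ W t) that IMC-style scorers share; in TTGCN the matrix W is learned (often in low-rank form), whereas here it is a fixed placeholder. `mean_pool` and `imc_score` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(views: List[Vector]) -> Vector:
    """Element-wise mean of several embeddings of the same node."""
    return [sum(col) / len(views) for col in zip(*views)]


def imc_score(drug: Vector, target: Vector, w: List[Vector]) -> float:
    """Bilinear IMC-style score sigmoid(drug^T W target); W is learned in practice."""
    z = sum(drug[i] * w[i][j] * target[j]
            for i in range(len(drug)) for j in range(len(target)))
    return 1.0 / (1.0 + math.exp(-z))


drug = mean_pool([[1.0, 0.0], [0.0, 1.0]])    # e.g. two layer outputs for one drug
target = mean_pool([[2.0, 0.0], [0.0, 0.0]])  # e.g. two layer outputs for one target
identity = [[1.0, 0.0], [0.0, 1.0]]           # placeholder for the learned W
print(drug, target, imc_score(drug, target, identity))
```

Because the score depends on node features through W rather than on a per-pair lookup, the same W can score drugs or targets that were not seen during training, which is what makes matrix completion "inductive".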
Affiliation(s)
- Jiahao Wei
- School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, China
- Linzhang Lu
- School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, China
- School of Mathematical Sciences, Xiamen University, Xiamen, 361005, China
- Tie Shen
- Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guizhou, 550001, China
9
Tang R, Sun C, Huang J, Li M, Wei J, Liu J. Predicting Drug-Protein Interactions by Self-Adaptively Adjusting the Topological Structure of the Heterogeneous Network. IEEE J Biomed Health Inform 2023; 27:5675-5684. [PMID: 37672364 DOI: 10.1109/jbhi.2023.3312374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Many powerful computational methods based on graph neural networks (GNNs) have been proposed to predict drug-protein interactions (DPIs). Such methods can effectively reduce laboratory workload and the cost of drug discovery and drug repurposing. However, many clinical functions of drugs and proteins are unknown because their indications have not been observed. Therefore, it is difficult to establish, from the available information, a reliable drug-protein heterogeneous network that describes the relationships between drugs and proteins. To solve this problem, we propose a DPI prediction method, named SATS, that can self-adaptively adjust the topological structure of the heterogeneous network. SATS establishes a representation learning module based on a graph attention network to process the drug-protein heterogeneous network. It self-adaptively learns the relationships among nodes based on their attributes and adjusts the topological structure of the network according to the model's training loss. Finally, SATS predicts the interaction propensity between drugs and proteins based on their embeddings. The experimental results show that SATS can effectively improve the topological structure of the network. SATS outperforms several state-of-the-art DPI prediction methods under various evaluation metrics. These results show that SATS is useful for dealing with incomplete data and unreliable networks. Case studies on the top-ranked prediction results further demonstrate that SATS is powerful for discovering novel DPIs.
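SATS builds on a graph attention network, in which each node weights its neighbours by learned attention coefficients: e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), normalized with a softmax over the neighbourhood. The sketch below shows only this standard GAT step, assuming the features are already multiplied by W and the attention vector `a` is given; SATS's loss-driven topology adjustment is not reproduced. `gat_attention` and `aggregate` are hypothetical names.

```python
import math
from typing import Dict, List

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_attention(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
                  a: Vector) -> Dict[int, Dict[int, float]]:
    """alpha[i][j] = softmax over j of LeakyReLU(a . [h_i || h_j]); h holds W h_i.

    Each neighbors[i] list must be non-empty (GAT usually includes i itself).
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = {j: leaky_relu(sum(ak * v for ak, v in zip(a, h[i] + h[j]))) for j in nbrs}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        norm = sum(exps.values())
        alpha[i] = {j: e / norm for j, e in exps.items()}
    return alpha


def aggregate(h: Dict[int, Vector], alpha: Dict[int, Dict[int, float]]) -> Dict[int, Vector]:
    """Attention-weighted sum of neighbour features."""
    dim = len(next(iter(h.values())))
    return {i: [sum(w * h[j][k] for j, w in row.items()) for k in range(dim)]
            for i, row in alpha.items()}


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
alpha = gat_attention(h, {0: [0, 1, 2]}, a=[0.0, 0.0, 1.0, 1.0])
print(alpha[0])
print(aggregate(h, alpha))
```

Because the coefficients depend on node attributes rather than on fixed edge weights, a low attention weight effectively down-weights an unreliable edge, which is the kind of attribute-driven relationship learning the abstract relies on.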
10
Ye Q, Zhang X, Lin X. Drug-Target Interaction Prediction via Graph Auto-Encoder and Multi-Subspace Deep Neural Networks. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2647-2658. [PMID: 36107905 DOI: 10.1109/tcbb.2022.3206907] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Computational prediction of drug-target interactions (DTIs) is important for new drug discovery. Deep neural networks (DNNs) are now widely used in DTI prediction. However, DNN parameters may be insufficiently trained and data features insufficiently utilized, because DTI data are limited and very high-dimensional. To address these problems, this paper designs a graph auto-encoder and multi-subspace deep neural network (GAEMSDNN). GAEMSDNN enhances its learning ability with a graph auto-encoder, a subspace layer and an ensemble layer. The graph auto-encoder preserves the reconstruction information. The subspace layer obtains different strong feature subsets. The ensemble layer comprehensively utilizes these strong feature subsets in a unified optimization framework. As a result, more features can be extracted from the network input and the DNN can be better trained. In experiments, GAEMSDNN significantly improves on previous methods, which validates the effectiveness of these strategies.
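The subspace and ensemble layers can be pictured with the classic random subspace method: several models each see a different subset of features, and their outputs are combined. This is only an analogy under stated assumptions: GAEMSDNN learns its subspaces and ensemble weights jointly, whereas the sketch draws random feature subsets and takes a plain average. `random_subspaces` and `ensemble_predict` are hypothetical names, and the per-subspace "models" are placeholders.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def random_subspaces(n_features: int, n_subspaces: int, size: int,
                     rng: random.Random) -> List[List[int]]:
    """Draw feature-index subsets (random subspace method, not GAEMSDNN's learned layer)."""
    return [sorted(rng.sample(range(n_features), size)) for _ in range(n_subspaces)]


def ensemble_predict(x: Sequence[float], subspaces: List[List[int]], models: List[Model]) -> float:
    """Average the predictions of one model per feature subspace."""
    preds = [model([x[i] for i in idx]) for idx, model in zip(subspaces, models)]
    return sum(preds) / len(preds)


rng = random.Random(1)
subspaces = random_subspaces(n_features=10, n_subspaces=3, size=4, rng=rng)
# Placeholder per-subspace models: the mean of the selected features.
models = [lambda z: sum(z) / len(z)] * 3
x = [float(i) for i in range(10)]
print(subspaces, ensemble_predict(x, subspaces, models))
```

Training each model on a smaller feature subset means fewer parameters per model, which is one way to cope with the limited, high-dimensional DTI data the abstract describes.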
11
Zhang Y, Feng Y, Wu M, Deng Z, Wang S. VGAEDTI: drug-target interaction prediction based on variational inference and graph autoencoder. BMC Bioinformatics 2023; 24:278. [PMID: 37415176 DOI: 10.1186/s12859-023-05387-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/10/2023] [Accepted: 06/16/2023] [Indexed: 07/08/2023]
Abstract
MOTIVATION Accurate identification of Drug-Target Interactions (DTIs) plays a crucial role in many stages of drug development and drug repurposing. (i) Traditional methods do not use multi-source data and do not consider the complex relationships between data sources. (ii) It remains unclear how to better mine the hidden features of the drug and target spaces from high-dimensional data and how to improve the accuracy and robustness of the model. RESULTS To solve these problems, this paper proposes a novel prediction model named VGAEDTI. We constructed a heterogeneous network with multiple sources of information using multiple types of drug and target data. To obtain deeper features of drugs and targets, we use two different autoencoders. The first is a variational graph autoencoder (VGAE), which infers feature representations from the drug and target spaces. The second is a graph autoencoder (GAE), which propagates labels between known DTIs. Experimental results on two public datasets show that the prediction accuracy of VGAEDTI is better than that of six DTI prediction methods. These results indicate that the model can predict new DTIs and provide an effective tool for accelerating drug development and repurposing.
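A variational graph autoencoder encodes each node as a Gaussian, samples a latent vector with the reparameterization trick z = μ + σ·ε, decodes link probabilities with sigmoid(z_i · z_j), and regularizes with a KL term towards N(0, 1). The sketch shows these standard VGAE pieces only, assuming the GCN encoder's μ and log σ outputs are already available; it is not the VGAEDTI implementation, and the label-propagating GAE branch is omitted. All function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def reparameterize(mu: Vector, log_sigma: Vector, rng: random.Random) -> Vector:
    """z = mu + sigma * eps with eps ~ N(0, 1) (in an autodiff framework this keeps mu, sigma differentiable)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def edge_prob(z_i: Vector, z_j: Vector) -> float:
    """Inner-product decoder: sigmoid(z_i . z_j)."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z_i, z_j))))


def kl_to_standard_normal(mu: Vector, log_sigma: Vector) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)) summed over latent dimensions."""
    return sum(0.5 * (m * m + math.exp(2 * ls) - 1.0) - ls for m, ls in zip(mu, log_sigma))


rng = random.Random(0)
# Stand-ins for GCN-encoder outputs of one drug node and one target node.
z_drug = reparameterize([0.8, -0.2], [-1.0, -1.0], rng)
z_target = reparameterize([0.9, 0.1], [-1.0, -1.0], rng)
print(edge_prob(z_drug, z_target), kl_to_standard_normal([0.8, -0.2], [-1.0, -1.0]))
```

The training loss would combine a reconstruction term on known drug-target links (using `edge_prob`) with the KL term, which keeps the latent space smooth enough to score unseen pairs.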
Affiliation(s)
- Yuanyuan Zhang
- Qingdao University of Technology, Qingdao, China
- Yinfei Feng
- Qingdao University of Technology, Qingdao, China
- Mengjie Wu
- Qingdao University of Technology, Qingdao, China
- Zengqian Deng
- Qingdao University of Technology, Qingdao, China
- Shudong Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao, China
12
Chen P, Zheng H. Drug-target interaction prediction based on spatial consistency constraint and graph convolutional autoencoder. BMC Bioinformatics 2023; 24:151. [PMID: 37069493 PMCID: PMC10109239 DOI: 10.1186/s12859-023-05275-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 04/05/2023] [Indexed: 04/19/2023]
Abstract
BACKGROUND Drug-target interaction (DTI) prediction plays an important role in drug discovery and repositioning. However, most of the computational methods used for identifying relevant DTIs do not consider the invariance of the nearest-neighbour relationships between drugs or targets. In other words, they do not take into account the invariance of the topological relationships between nodes during representation learning. This may limit the performance of DTI prediction methods. RESULTS Here, we propose a novel graph convolutional autoencoder-based model, named SDGAE, to predict DTIs. As the graph convolutional network cannot handle isolated nodes in a network, a pre-processing step was applied to reduce the number of isolated nodes in the heterogeneous network and facilitate effective exploitation of the graph convolutional network. By maintaining the graph structure during representation learning, the nearest-neighbour relationships between nodes in the embedding space remain as close as possible to those in the original space. CONCLUSIONS Overall, we demonstrated that SDGAE can automatically learn more informative and robust feature vectors of drugs and targets, thus exhibiting significantly improved predictive accuracy for DTIs.
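Why isolated nodes are a problem for a GCN is visible in the standard propagation rule H' = D^(-1/2) (A + I) D^(-1/2) H: a node with no edges only ever sees its own features. The sketch below implements that rule (without the weight matrix or activation) and adds a pre-processing step that links each isolated node to its most cosine-similar node. That linking rule is a hypothetical stand-in, since the abstract does not say how SDGAE reduces isolated nodes; `gcn_propagate`, `cosine` and `link_isolated` are illustrative names.

```python
import math
from typing import List, Set


def gcn_propagate(adj: List[Set[int]], h: List[List[float]]) -> List[List[float]]:
    """One GCN step H' = D^-1/2 (A + I) D^-1/2 H (no weight matrix or activation)."""
    n = len(adj)
    deg = [len(adj[i]) + 1 for i in range(n)]  # +1 for the self-loop
    out = []
    for i in range(n):
        row = [0.0] * len(h[i])
        for j in adj[i] | {i}:
            coeff = 1.0 / math.sqrt(deg[i] * deg[j])
            for k, value in enumerate(h[j]):
                row[k] += coeff * value
        out.append(row)
    return out


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def link_isolated(adj: List[Set[int]], features: List[List[float]]) -> List[Set[int]]:
    """Hypothetical pre-processing: connect each isolated node to its most similar node."""
    new_adj = [set(nbrs) for nbrs in adj]
    for i, nbrs in enumerate(adj):
        if nbrs:
            continue
        best = max((j for j in range(len(adj)) if j != i),
                   key=lambda j: cosine(features[i], features[j]))
        new_adj[i].add(best)
        new_adj[best].add(i)
    return new_adj


adj = [{1}, {0}, set()]  # node 2 is isolated
feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(gcn_propagate(adj, feats)[2])  # [0.9, 0.1]: node 2 only sees itself
linked = link_isolated(adj, feats)
print(linked)  # node 2 is now linked to node 0
print(gcn_propagate(linked, feats)[2])
```

After linking, node 2's representation mixes in its nearest neighbour's features, so stacked GCN layers can propagate structural information to it.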
Affiliation(s)
- Peng Chen
- School of Computer Science and Technology, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
- Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
- Haoran Zheng
- School of Computer Science and Technology, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
- Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
13
Zhao S, Meng J, Wekesa JS, Luan Y. Identification of small open reading frames in plant lncRNA using class-imbalance learning. Comput Biol Med 2023; 157:106773. [PMID: 36924731 DOI: 10.1016/j.compbiomed.2023.106773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 02/21/2023] [Accepted: 03/09/2023] [Indexed: 03/12/2023]
Abstract
Recently, small open reading frames (sORFs) in long noncoding RNA (lncRNA) have been shown to encode small peptides that can help in studying the mechanisms of growth and development in organisms. Since machine learning-based computational methods are less costly than biological experiments, they can be used to identify sORFs and provide a basis for biological experiments. However, few computational methods and data resources have been developed for identifying sORFs in plant lncRNA. Moreover, machine learning models yield underperforming classifiers when faced with class imbalance. In this study, an alternative SMOTE method based on weighted cosine distance (WCDSMOTE), which can interact with feature selection, is put forward to synthesize minority-class samples, and weighted edited nearest neighbor (WENN) is applied to clean up majority-class samples; together they form the hybrid sampling method WCDSMOTE-ENN for handling imbalanced datasets with multi-angle features. A heterogeneous classifier ensemble is introduced to complete the classification task. On this basis, a novel computational method based on class-imbalance learning for identifying sORFs with coding potential in plant lncRNA (sORFplnc) is presented. Experimental results show that sORFplnc outperforms existing computational methods in identifying sORFs with coding potential. We anticipate that the proposed work can serve as a reference for relevant research and contribute to agriculture and biomedicine.
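SMOTE creates a synthetic minority sample by interpolating between a minority sample and one of its k nearest minority neighbours, x_new = x + λ(x_nn − x) with λ drawn from [0, 1). WCDSMOTE ranks neighbours by a weighted cosine distance instead of Euclidean distance. The sketch assumes non-negative per-feature weights are supplied (for instance from feature selection) and uses 1 − weighted cosine similarity as the distance; both the weighting and that definition are assumptions, and the WENN cleaning step is not shown.

```python
import math
import random
from typing import List

Vector = List[float]


def weighted_cosine_distance(u: Vector, v: Vector, w: Vector) -> float:
    """1 - weighted cosine similarity (assumed form; per-feature weights w >= 0)."""
    dot = sum(wi * a * b for wi, a, b in zip(w, u, v))
    nu = math.sqrt(sum(wi * a * a for wi, a in zip(w, u)))
    nv = math.sqrt(sum(wi * b * b for wi, b in zip(w, v)))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def smote_wcd(minority: List[Vector], weights: Vector, n_new: int, k: int,
              rng: random.Random) -> List[Vector]:
    """SMOTE-style oversampling with neighbours ranked by weighted cosine distance."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: weighted_cosine_distance(x, m, weights))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic


minority = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.5], [1.2, 2.2]]
new = smote_wcd(minority, weights=[1.0, 0.5], n_new=5, k=2, rng=random.Random(0))
print(new)
```

Each synthetic point lies on the segment between two real minority samples, so oversampling densifies the minority region without inventing values outside it.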
Affiliation(s)
- Siyuan Zhao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116024, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116024, China
- Jael Sanyanda Wekesa
- Department of Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, 62000-00200, Kenya
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, Liaoning, 116024, China
14
Tang C, Zhong C, Wang M, Zhou F. FMGNN: A Method to Predict Compound-Protein Interaction With Pharmacophore Features and Physicochemical Properties of Amino Acids. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1030-1040. [PMID: 35503835 DOI: 10.1109/tcbb.2022.3172340] [Indexed: 05/04/2023]
Abstract
Identifying interactions between compounds and proteins is an essential task in drug discovery. For recommending compounds as new drug candidates, computational approaches cost less than wet-lab experiments. Machine learning-based methods, especially deep learning-based methods, have advantages in learning complex feature interactions between compounds and proteins. However, when the compound-protein feature interactions are high-dimensional and sparse, deep learning models tend to over-generalize and predict less relevant compound-protein pairs. This problem can be overcome by learning both low-order and high-order feature interactions. In this paper, we propose FMGNN, a novel hybrid model that combines Factorization Machines and a Graph Neural Network to extract the low-order and high-order features, respectively. We then design a compound-protein interaction (CPI) prediction method using the pharmacophore features of compounds and the physicochemical properties of amino acids. The pharmacophore features help the prediction results fit the expectations of biological experiments more closely. The physicochemical properties of amino acids are loaded into the embedding layer to improve the convergence speed and accuracy of protein feature learning. Experimental results on several datasets, especially an imbalanced large-scale dataset, showed that our method outperforms other existing methods for CPI prediction. Western blot experiments on wogonin and its candidate target proteins also showed that our method is effective and accurate for finding target proteins. The computer program implementing FMGNN is available at https://github.com/tcygxu2021/FMGNN.
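FMGNN assigns low-order feature interactions to its Factorization Machine component. As a rough illustration, the sketch below implements the standard second-order factorization machine scoring function with hand-set weights and checks it against a naive double loop. It assumes nothing about FMGNN's actual features, embedding sizes or training, and the toy feature vector is hypothetical.

```python
def fm_score(x, w0, w, V):
    """Second-order factorization machine score:
    w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j,
    with the pairwise term computed in O(k * n) via
    0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2]."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        pairwise += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same model with the explicit O(k * n^2) double loop, for checking."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


# Toy input: 3 hypothetical compound/protein features, k = 2 latent factors
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [1.0, 2.0, 3.0]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fm_score(x, w0, w, V))  # 9.5
```

The O(kn) form is what makes factorization machines practical for the high-dimensional sparse inputs the abstract mentions. Pairwise weights are never materialized; each feature has only one k-dimensional factor vector.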
15
Li Y, Sun C, Wei JM, Liu J. Drug-Protein interaction prediction by correcting the effect of incomplete information in heterogeneous information. Bioinformatics 2022; 38:5073-5080. [PMID: 36111859 DOI: 10.1093/bioinformatics/btac629] [Received: 04/12/2022] [Revised: 08/30/2022] [Accepted: 09/15/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Large-scale heterogeneous data provide diverse perspectives for predicting drug-protein interactions (DPIs). However, the available information on molecular interactions and clinical associations related to drugs or proteins is incomplete, because some interactions and associations may not yet have been proven. This incomplete information appears in the available data as non-interaction and non-correlation, which may mislead the prediction model. Existing methods fuse incomplete and complete information without considering their integrity, so the negative effects of incomplete information persist. RESULTS We develop a network-based DPI prediction method named BRWCP, which uses the complete information network to correct the prediction results obtained from the incomplete information network. By integrating relevant heterogeneous information that may be incomplete, the feature similarities of drugs and proteins are obtained. Combining the feature similarities and known DPIs, an incomplete information-based drug-protein heterogeneous network is constructed. Then, a bidirectional random walk with pruning algorithm is applied to this heterogeneous network to predict potential DPIs. Next, the predicted DPIs are combined with the chemical fingerprint similarity of drugs and the amino acid sequence similarity of proteins to construct the complete information network. The bidirectional random walk with pruning algorithm is applied to the new network until convergence to obtain the final prediction results. Experimental results show that BRWCP is superior to several state-of-the-art DPI prediction methods, and case studies further confirm its ability to uncover potential DPIs. AVAILABILITY AND IMPLEMENTATION The code and data used in BRWCP are available at https://github.com/lyfdomain/BRWCP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yanfei Li
- College of Computer Science, Nankai University, Tianjin 300071, China; Institute of Big Data, Nankai University, Tianjin 300071, China
- Chang Sun
- College of Computer Science, Nankai University, Tianjin 300071, China; Institute of Big Data, Nankai University, Tianjin 300071, China
- Jin-Mao Wei
- College of Computer Science, Nankai University, Tianjin 300071, China; Institute of Big Data, Nankai University, Tianjin 300071, China
- Jian Liu
- College of Computer Science, Nankai University, Tianjin 300071, China; Institute of Big Data, Nankai University, Tianjin 300071, China
16
Lian M, Wang X, Du W. Integrated multi-similarity fusion and heterogeneous graph inference for drug-target interaction prediction. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.104] [Indexed: 10/18/2022]
17
Xu X, Xuan P, Zhang T, Chen B, Sheng N. Inferring Drug-Target Interactions Based on Random Walk and Convolutional Neural Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2294-2304. [PMID: 33729947 DOI: 10.1109/tcbb.2021.3066813] [Indexed: 06/12/2023]
Abstract
Computational strategies for identifying new drug-target interactions (DTIs) can guide drug discovery and reduce the cost and time of drug development. Most recently proposed methods predict DTIs by integrating heterogeneous data related to drugs and proteins. However, previous methods have failed to deeply integrate these heterogeneous data and to learn deep feature representations of the multiple original similarities and interactions related to drugs and proteins. We therefore constructed a heterogeneous network by integrating a variety of connection relationships among drugs, proteins and drug side effects, including their similarities, interactions and associations. A DTI prediction method based on random walk and convolutional neural network, referred to as DTIPred, was proposed. DTIPred not only takes advantage of various original features related to drugs and proteins but also integrates the topological information of heterogeneous networks. The prediction model has two sides and learns the deep feature representation of a drug-protein pair. On the left side, random walk with restart is applied to learn the topological vectors of drug and protein nodes, and this topological representation is further learned by a deep learning framework based on a convolutional neural network. The right side of the model integrates multiple original similarities and interactions of drugs and proteins to learn the original representation of the drug-protein pair. Cross-validation experiments demonstrate that DTIPred achieves better prediction performance than several state-of-the-art methods. During validation, DTIPred retrieves more actual drug-protein interactions within the top part of the predicted results, which may be more helpful to biologists.
In addition, case studies on five drugs further demonstrate the ability of DTIPred to discover potential drug-protein interactions.
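Random walk with restart (RWR), used on DTIPred's left side, can be sketched as a simple power iteration. The version below computes one RWR score vector on a toy graph. The restart probability of 0.3, the column-normalization convention and the convergence tolerance are assumptions, and the downstream CNN is not reproduced.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed, where W is the
    column-normalised adjacency matrix, until the L1 change drops below tol."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    e = [1.0 if i == seed else 0.0 for i in range(n)]  # restart distribution
    p = e[:]
    for _ in range(max_iter):
        new_p = [
            (1 - restart)
            * sum(adj[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0)
            + restart * e[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy path graph 0 - 1 - 2 - 3 (e.g. alternating drug and protein nodes)
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seed=0)
```

Every column of the normalized matrix sums to 1, so the scores stay a probability distribution. Running the walk from each node in turn gives one score vector per node, which can act as that node's topological profile.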
18
Xuan P, Zhang X, Zhang Y, Hu K, Nakaguchi T, Zhang T. Multi-type neighbors enhanced global topology and pairwise attribute learning for drug-protein interaction prediction. Brief Bioinform 2022; 23:6581435. [PMID: 35514190 DOI: 10.1093/bib/bbac120] [Received: 01/03/2022] [Revised: 03/07/2022] [Accepted: 03/15/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Accurate identification of proteins that interact with drugs helps reduce the time and cost of drug development. Most previous methods focused on integrating multisource data about drugs and proteins to predict drug-target interactions (DTIs). Two drugs can be linked by both similarity connections and interaction connections, and these connections reflect their relationships from different perspectives. Similarly, two proteins have various connections from multiple perspectives. However, most previous methods failed to deeply integrate these connections. In addition, multiple drug-protein heterogeneous networks can be constructed from multiple kinds of connections, and the diverse topological structures of these networks have not yet been fully exploited. RESULTS We propose a novel model, GCDTI, to extract and integrate multi-type neighbor topology information and the diverse similarities and interactions related to drugs and proteins. First, multiple drug-protein heterogeneous networks are constructed according to the multiple kinds of connections among drugs and among proteins. The multi-type neighbor node sequences of a drug node (or a protein node) are formed by random walks on each network, and they reflect the hidden neighbor topological structure of the node. Second, a module based on a graph neural network (GNN) is proposed to learn the multi-type neighbor topologies of each node. We propose attention mechanisms at the neighbor-node level and the neighbor-type level to learn more informative neighbor nodes and neighbor types. A network-level attention is also designed to enhance the context dependency among the multiple neighbor topologies of a pair of drug and protein nodes. Finally, the attribute embedding of the drug-protein pair is formulated by a proposed embedding strategy, and the embedding covers the similarities and interactions of the pair.
A module based on three-dimensional convolutional neural networks (CNN) is constructed to deeply integrate pairwise attributes. Extensive experiments indicate that GCDTI outperforms several state-of-the-art prediction methods. The recall rates over the top-ranked candidates and case studies on 5 drugs further demonstrate GCDTI's ability to discover potential drug-protein interactions.
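GCDTI forms multi-type neighbor node sequences by random walks on each drug-protein network. The sketch below covers only that sequence-generation step. It assumes uniform transition probabilities and a single toy network stored as an adjacency dictionary; the node names, walk length and walks per node are hypothetical, and the GNN and attention modules are not shown.

```python
import random


def random_walks(neighbors, walks_per_node=2, walk_length=5, seed=0):
    """Uniform random walks started from every node of an adjacency-list graph.
    Each walk is a node sequence of at most walk_length nodes."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(neighbors):
        for _ in range(walks_per_node):
            walk = [start]
            # Stop early only if the current node has no neighbours
            while len(walk) < walk_length and neighbors[walk[-1]]:
                walk.append(rng.choice(neighbors[walk[-1]]))
            walks.append(walk)
    return walks


# Toy network with drug-drug, protein-protein and drug-protein edges
neighbors = {
    "d1": ["d2", "p1"],
    "d2": ["d1", "p2"],
    "p1": ["d1", "p2"],
    "p2": ["d2", "p1"],
}
walks = random_walks(neighbors)

# Count how many drug and protein neighbours each walk visits after its start node
type_counts = [
    {
        "drug": sum(n.startswith("d") for n in w[1:]),
        "protein": sum(n.startswith("p") for n in w[1:]),
    }
    for w in walks
]
```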
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; School of Computer Science, Shaanxi Normal University, Xi'an 710062, China
- Xiaowen Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Yu Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Kaimiao Hu
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
19
Graph neural network approaches for drug-target interactions. Curr Opin Struct Biol 2022; 73:102327. [DOI: 10.1016/j.sbi.2021.102327] [Received: 04/16/2021] [Revised: 11/22/2021] [Accepted: 12/13/2021] [Indexed: 01/06/2023]
20
Hu K, Cui H, Zhang T, Sun C, Xuan P. ALDPI: adaptively learning importance of multi-scale topologies and multi-modality similarities for drug-protein interaction prediction. Brief Bioinform 2022; 23:6519792. [PMID: 35108362 DOI: 10.1093/bib/bbab606] [Received: 10/25/2021] [Revised: 12/20/2021] [Accepted: 12/28/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Effective computational methods for predicting drug-protein interactions (DPIs) are vital for drug discovery, because they reduce the time and cost of drug development. Recent DPI prediction methods mainly exploit graph data composed of multiple kinds of connections among drugs and proteins. Each node in the graph usually has multi-scale topological structures formed by its first-order and multi-order neighbors. However, most previous methods do not consider the topological structures of multi-order neighbors. In addition, deep integration of the multi-modality similarities of drugs and proteins remains challenging. RESULTS We propose a model called ALDPI to adaptively learn the multi-scale topologies and multi-modality similarities together with their various levels of significance. We first construct a drug-protein heterogeneous graph composed of the interactions and the multi-modality similarities among drugs and proteins. An adaptive graph learning module is then designed to learn the important kinds of connections in the heterogeneous graph and generate new topology graphs. A module based on graph convolutional autoencoders is established to learn multiple representations, which capture the node attributes and the multi-scale topologies composed of first-order and multi-order neighbors, respectively. We also design an attention mechanism at the neighbor-topology level to distinguish the importance of these representations. Finally, since each similarity modality has its specific features, we construct a multi-layer convolutional neural network-based module to learn and fuse multi-modality features and obtain the attribute representation of each drug-protein node pair. Comprehensive experimental results show ALDPI's superior performance over six state-of-the-art methods.
The results of recall rates of top-ranked candidates and case studies on five drugs further demonstrate the ability of ALDPI to discover potential drug-related protein candidates. CONTACT zhang@hlju.edu.cn.
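ALDPI learns its representations with graph convolutional autoencoders. The sketch below shows the generic graph autoencoder pattern: one symmetric-normalized GCN layer as encoder and an inner-product decoder that scores every node pair. It runs on a toy three-node graph with untrained, hand-set weights. It is not ALDPI's adaptive graph learning or multi-order design, and all names and values are illustrative.

```python
import math


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}: symmetric normalisation with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_encode(adj, features, weight):
    """One graph-convolution layer: ReLU(A_norm X W)."""
    h = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in h]


def inner_product_decode(z):
    """Reconstructed edge probabilities sigmoid(z_i . z_j) for every node pair."""
    return [
        [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zi, zj)))) for zj in z]
        for zi in z
    ]


# Toy graph: nodes 0 and 1 are drugs, node 2 a protein; drug 0 interacts with it
adj = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical node attributes
weight = [[0.5, -0.2], [0.3, 0.8]]               # untrained, hand-set weights
z = gcn_encode(adj, features, weight)
recon = inner_product_decode(z)
```

In a trained autoencoder the weights would be fit so that `recon` reproduces known edges. High scores on unobserved pairs are then read as candidate interactions.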
Affiliation(s)
- Kaimiao Hu
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Chang Sun
- College of Computer Science, Nankai University, Tianjin 300071, China
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
21
Xuan P, Fan M, Cui H, Zhang T, Nakaguchi T. GVDTI: graph convolutional and variational autoencoders with attribute-level attention for drug-protein interaction prediction. Brief Bioinform 2021; 23:6412398. [PMID: 34718408 DOI: 10.1093/bib/bbab453] [Received: 08/01/2021] [Revised: 09/14/2021] [Accepted: 10/02/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Identifying proteins that interact with drugs plays an important role in the early stage of drug development and helps reduce development cost and time. Recent methods for predicting drug-protein interactions mainly focus on exploiting various data about drugs and proteins, but they fail to fully learn and integrate the attribute information of a pair of drug and protein nodes and their attribute distribution. RESULTS We present a new prediction method, GVDTI, that encodes multiple pairwise representations, including an attention-enhanced topological representation, an attribute representation and an attribute distribution. First, a framework based on a graph convolutional autoencoder is constructed to learn, for each drug and protein node, an attention-enhanced topological embedding that integrates the topology structure of a drug-protein network. The topological embeddings of each drug and each protein are then combined and fused by multi-layer convolutional neural networks to obtain the pairwise topological representation, which reveals the hidden topological relationships between drug and protein nodes. The proposed attribute-wise attention mechanism learns and adjusts the importance of each individual attribute in each topological embedding of drug and protein nodes. Second, a tri-layer heterogeneous network composed of drug, protein and disease nodes is created to associate the similarities, interactions and associations across the heterogeneous nodes. The attribute distribution of the drug-protein node pair is encoded by a variational autoencoder. The pairwise attribute representation is learned via a multi-layer convolutional neural network to deeply integrate the attributes of drug and protein nodes. Finally, the three pairwise representations are fused by convolutional and fully connected neural networks for drug-protein interaction prediction.
The experimental results show that GVDTI outperformed seven other state-of-the-art methods. The improved recall rates indicate that GVDTI retrieved more actual drug-protein interactions among the top-ranked candidates than conventional methods. Case studies on five drugs further confirm GVDTI's ability to discover potential candidate drug-related proteins. CONTACT zhang@hlju.edu.cn Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
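GVDTI encodes the attribute distribution of a drug-protein pair with a variational autoencoder. Any such encoder relies on two ingredients: the reparameterization trick and the closed-form KL divergence of a diagonal Gaussian from N(0, I). Both are sketched below in plain Python. The encoder that would output `mu` and `logvar` is omitted, and the input values are placeholders.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2).
    In an autodiff framework this form lets gradients flow to mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )
    = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


rng = random.Random(0)
mu, logvar = [0.2, -1.0], [0.0, -2.0]  # placeholders for an encoder's outputs
z = reparameterize(mu, logvar, rng)
print(kl_to_standard_normal([0.0], [0.0]))  # 0.0
print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
```

In a full model the KL term is added to the reconstruction loss, pulling the learned attribute distributions toward the standard normal prior.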
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Mengsi Fan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
22
Xuan P, Hu K, Cui H, Zhang T, Nakaguchi T. Learning multi-scale heterogeneous representations and global topology for drug-target interaction prediction. IEEE J Biomed Health Inform 2021; 26:1891-1902. [PMID: 34673498 DOI: 10.1109/jbhi.2021.3121798] [Indexed: 11/08/2022]
Abstract
Identification of drug-target interactions (DTIs) plays a critical role in drug discovery and repositioning. Deep integration of the inter-connections and intra-similarities between heterogeneous multi-source data related to drugs and targets, however, is challenging. We propose a DTI prediction model that learns from multi-scale drug- and protein-related attributes and from the global topology formed by heterogeneous connections. A drug-protein-disease heterogeneous network (RPD-Net) is first constructed to associate diverse similarities, interactions and associations across nodes. Second, we propose a multi-scale pairwise deep representation learning module consisting of a new embedding strategy to integrate diverse inter-relations and intra-relations, and dilated convolutions for multi-scale deep representation extraction. A global topology learning module is proposed, composed of a strategy based on non-negative matrix factorization (NMF) to extract topology from RPD-Net and a new relational-level attention mechanism for discriminative topology embedding. Experimental results on a public dataset demonstrate improved performance over state-of-the-art methods and the contributions of our major innovations. Top-k recall rates and case studies on five drugs further show the model's effectiveness in retrieving potential target candidates for drugs.
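The global topology module extracts topology from RPD-Net with an NMF-based strategy. The classic Lee-Seung multiplicative-update NMF sketched below illustrates that kind of step on a toy 0/1 drug-protein matrix. The rank, iteration count, `eps` guard and initialization are assumptions, and the paper's own NMF strategy and relational-level attention may differ.

```python
import random


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimising ||V - W H||_F^2
    with W, H >= 0 (eps guards against division by zero)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num_h, den_h = matmul(Wt, V), matmul(matmul(Wt, W), H)  # W^T V, W^T W H
        H = [[H[i][j] * num_h[i][j] / (den_h[i][j] + eps) for j in range(m)] for i in range(rank)]
        Ht = transpose(H)
        num_w, den_w = matmul(V, Ht), matmul(W, matmul(H, Ht))  # V H^T, W H H^T
        W = [[W[i][j] * num_w[i][j] / (den_w[i][j] + eps) for j in range(rank)] for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for row_v, row_r in zip(V, WH) for v, r in zip(row_v, row_r))


# Toy drug x protein interaction matrix (1 = known interaction)
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
W, H = nmf(V, rank=2)
```

Multiplicative updates keep W and H nonnegative automatically, because each entry is only ever multiplied by a nonnegative ratio. The rows of W can then be read as low-dimensional node embeddings.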
23
Logistic matrix factorisation and generative adversarial neural network-based method for predicting drug-target interactions. Mol Divers 2021; 25:1497-1516. [PMID: 34297278 DOI: 10.1007/s11030-021-10273-9] [Received: 03/31/2021] [Accepted: 07/04/2021] [Indexed: 12/21/2022]
Abstract
Identifying drug-target protein association pairs is a prerequisite and a crucial task in drug discovery and development. Numerous computational models, based on different assumptions and algorithms, have been proposed as alternatives to laborious, costly and time-consuming traditional wet-lab methods. Most proposed methods rely on separate drug and target descriptors, calculated from chemical structures and protein sequences, respectively, and fail to introduce and extract features in which the interaction information is embedded. In this paper, we propose a new three-step method based on matrix factorisation and a generative adversarial network (GAN) for drug-target interaction prediction. First, the matrix factorisation technique is used to capture and extract the joint interaction features of both drugs and targets from the drug-target interaction matrix. Then, a GAN is introduced for data augmentation. It generates fake positive samples similar to the real positive samples (known interactions) in order to balance the classes, allow the exploitation of the entire negative sample set, and increase the data size for accurate prediction. Finally, a fully connected four-layer neural network is built for classification. Experimental results show that the proposed method outperforms shallow classifiers and state-of-the-art methods, with an accuracy higher than 97%. Moreover, the effect of data generation is confirmed by evaluating the proposed method with and without the generation step. These results demonstrate the efficiency of the latent interaction features and of data generation for predicting new drugs or repurposing existing drugs. Overview of the WGANMF-DTI workflow for the Drug-Target Interaction Prediction task.
24
Sun C, Cao Y, Wei JM, Liu J. Autoencoder-based Drug-Target Interaction Prediction by Preserving the Consistency of Chemical Properties and Functions of Drugs. Bioinformatics 2021; 37:3618-3625. [PMID: 34019069 DOI: 10.1093/bioinformatics/btab384] [Received: 12/26/2020] [Revised: 05/06/2021] [Accepted: 05/18/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Exploring potential drug-target interactions (DTIs) is a key step in drug discovery and repurposing. In recent years, predicting probable DTIs through computational methods has gradually become a research hot spot. However, most previous studies failed to take into account the consistency between the chemical properties of drugs and their functions. Changes in these relationships may severely degrade DTI prediction. RESULTS We propose an autoencoder-based method, AEFS, that predicts DTIs under spatial consistency constraints. A heterogeneous network is established to integrate the information of drugs, proteins and diseases. The original drug features are projected into an embedding (protein) space by a multi-layer encoder, and further projected into a label (disease) space by a decoder. In this process, the clinical information of drugs is introduced to assist DTI prediction. By maintaining the distribution of drug correlations in the original feature, embedding and label spaces, AEFS keeps the chemical properties and functions of drugs consistent. Experimental comparisons indicate that AEFS is more robust to imbalanced data and performs significantly better in DTI prediction. Case studies further confirm its ability to mine latent drug-target interactions. AVAILABILITY The code of AEFS is available at https://github.com/JackieSun818/AEFS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chang Sun
- College of Computer Science, Nankai University, Tianjin, 300071, China; Institute of Big Data, Nankai University, Tianjin, 300071, China
- Yangkun Cao
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Jin-Mao Wei
- College of Computer Science, Nankai University, Tianjin, 300071, China; Institute of Big Data, Nankai University, Tianjin, 300071, China
- Jian Liu
- College of Computer Science, Nankai University, Tianjin, 300071, China; Institute of Big Data, Nankai University, Tianjin, 300071, China
25
Chu Y, Wang X, Dai Q, Wang Y, Wang Q, Peng S, Wei X, Qiu J, Salahub DR, Xiong Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021; 22:6261915. [PMID: 34009265 DOI: 10.1093/bib/bbab165] [Received: 02/10/2021] [Revised: 04/02/2021] [Accepted: 04/08/2021] [Indexed: 11/13/2022]
Abstract
Accurate identification of miRNA-disease associations (MDAs) helps explain the etiology and mechanisms of various diseases. However, experimental methods are costly and time-consuming, so computational methods for predicting MDAs are urgently needed. Based on graph theory, MDA prediction is treated as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve training efficiency and accuracy. This method models both the potential connections in feature space and the structural relationships of MDA data. The nodes of the graphs are represented by disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity. Moreover, for the first time, we considered six tasks on the MDA prediction problem simultaneously. This ensures that, under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that MDA-GCNFTG achieves satisfactory performance on all six tasks and is significantly superior to classic machine learning methods and state-of-the-art MDA prediction methods. The effectiveness of GCNs via the graph sampling strategy and of the feature and topology graph in MDA-GCNFTG has also been demonstrated. More importantly, case studies on two diseases and three miRNAs were conducted and achieved satisfactory performance.
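MDA-GCNFTG uses Gaussian interaction profile (GIP) kernel similarity as one of its node representations. GIP similarity is usually defined over the rows (or columns) of the binary association matrix. The sketch below implements that commonly used definition on a toy miRNA-disease matrix. The bandwidth parameter `gamma_prime = 1` is an assumption, and the paper's exact settings are not given here.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary
    association matrix: K[i][j] = exp(-gamma * ||IP_i - IP_j||^2),
    with gamma = gamma_prime / mean_i ||IP_i||^2."""
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy miRNA x disease association matrix (rows: miRNAs, columns: diseases)
assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
K_mirna = gip_kernel(assoc)
# Disease similarity uses the columns, i.e. the transposed matrix
K_disease = gip_kernel([list(col) for col in zip(*assoc)])
```

Rows with identical interaction profiles get similarity 1, and the data-driven bandwidth scales the kernel to the average profile density of the matrix.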
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Xuhong Wang
- School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, China
- Qiuying Dai
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Yanjing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, China
- Dennis Russell Salahub
- Department of Chemistry, University of Calgary, Fellow Royal Society of Canada and Fellow of the American Association for the Advancement of Science, Canada
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
26
Xuan P, Zhang Y, Cui H, Zhang T, Guo M, Nakaguchi T. Integrating multi-scale neighbouring topologies and cross-modal similarities for drug-protein interaction prediction. Brief Bioinform 2021; 22:6220173. [PMID: 33839743 DOI: 10.1093/bib/bbab119] [Received: 01/09/2021] [Revised: 02/15/2021] [Accepted: 03/12/2021] [Indexed: 01/02/2023]
Abstract
MOTIVATION Identifying the proteins that interact with drugs can reduce the cost and time of drug development. Existing computational methods focus on integrating drug-related and protein-related data from multiple sources to predict candidate drug-target interactions (DTIs). However, multi-scale neighbouring node sequences and the various kinds of drug and protein similarities are neither fully explored nor considered in decision making. RESULTS We propose a drug-target interaction prediction method, DTIP, to encode and integrate multi-scale neighbouring topologies and multiple kinds of similarities, associations and interactions related to drugs and proteins. We first construct a three-layer heterogeneous network to represent interactions and associations across drug, protein and disease nodes. A learning framework based on a fully connected autoencoder is then proposed to learn the nodes' low-dimensional feature representations within the heterogeneous network. Second, multi-scale neighbouring sequences of drug and protein nodes are formulated by random walks. A module based on a bidirectional gated recurrent unit is designed to learn the neighbouring sequential information and integrate the low-dimensional features of nodes. Finally, we propose attention mechanisms at the feature level, the neighbouring topological level and the similarity level to learn more informative features, topologies and similarities. The prediction results are obtained by integrating neighbouring topologies, similarities and feature attributes using a multi-layer CNN. Comprehensive experimental results on a public dataset demonstrate the effectiveness of our innovative features and modules. Comparison with other state-of-the-art methods and case studies of five drugs further validated DTIP's ability to discover potential candidate drug-related proteins.
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Yu Zhang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
27
Dou L, Yang F, Xu L, Zou Q. A comprehensive review of the imbalance classification of protein post-translational modifications. Brief Bioinform 2021; 22:6217722. [PMID: 33834199 DOI: 10.1093/bib/bbab089] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/10/2021] [Revised: 02/17/2021] [Accepted: 02/24/2021] [Indexed: 12/13/2022]
Abstract
Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity and function, and they are closely involved in various pathologies. Therefore, the identification of associated PTMs is the foundation of in-depth research on related biological mechanisms, disease treatments and drug design. Because high-throughput sequencing techniques are costly and time-consuming, developing machine learning-based predictors has been considered an effective approach to rapidly recognize potential modified sites. However, the imbalanced distribution of true and false PTM sites (the data imbalance problem) largely affects the reliability and application of prediction tools. In this article, we conduct a systematic survey of the research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful data imbalance solutions. Then, we summarize the recently proposed bioinformatics tools based on imbalanced PTM data. We also build a convenient website, ImClassi_PTMs (available at lab.malab.cn/~dlj/ImbClassi_PTMs/), so that researchers can easily browse them. Moreover, we analyze the challenges of current computational predictors and offer suggestions to improve the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to more advanced predictors in the future.
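The abstract mentions data imbalance solutions but does not name them. The sketch below illustrates two common, generic remedies. It is not code from the review or from ImClassi_PTMs.

- **Random undersampling:** all true sites are kept, and a random subset of the far more numerous false sites is kept alongside them.
- **Inverse-frequency class weighting:** each class receives the weight `n_samples / (n_classes * count)`. This is the formula scikit-learn documents for `class_weight="balanced"`.

The sample identifiers are hypothetical placeholders for the sequence windows around candidate PTM sites. The labels are 1 for a true site and 0 for a false site.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def random_undersample(samples: Sequence, labels: Sequence[int],
                       ratio: float = 1.0,
                       seed: int = 0) -> Tuple[List, List[int]]:
    """Keep every positive (label 1) and at most ratio * n_pos random negatives.

    The original sample order is preserved in the returned lists.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in kept], [labels[i] for i in kept]


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}
```

The two approaches work differently. With 10 true and 90 false sites, undersampling at `ratio=1.0` yields 10 of each class. Weighting keeps all 100 samples but gives the true class a weight of 5.0 and the false class about 0.56 in the training loss. Undersampling discards data, while weighting keeps everything and shifts emphasis in the loss.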
Affiliation(s)
- Lijun Dou
  - University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
- Fenglong Yang
  - University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China