1
Palhamkhani F, Alipour M, Dehnad A, Abbasi K, Razzaghi P, Ghasemi JB. DeepCompoundNet: enhancing compound-protein interaction prediction with multimodal convolutional neural networks. J Biomol Struct Dyn 2025; 43:1414-1423. [PMID: 38084744 DOI: 10.1080/07391102.2023.2291829] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/20/2023] [Accepted: 11/23/2023] [Indexed: 01/16/2025]
Abstract
Virtual screening has emerged as a valuable computational tool for predicting compound-protein interactions, offering a cost-effective and rapid approach to identifying potential candidate drug molecules. Current machine learning-based methods rely either on molecular structures or on their relationships in interaction networks. The former utilize information such as amino acid sequences and chemical structures, while the latter leverage interaction network data, such as protein-protein interactions, drug-disease interactions, and protein-disease interactions. However, there has been limited exploration of integrating molecular information with interaction networks. This study presents DeepCompoundNet, a deep learning-based model that integrates protein features, drug properties, and diverse interaction data to predict chemical-protein interactions. DeepCompoundNet outperforms state-of-the-art methods for compound-protein interaction prediction, as demonstrated through performance evaluations. Our findings highlight the complementary nature of multiple interaction data, extending beyond amino acid sequence homology and chemical structure similarity. Moreover, our analysis confirms that DeepCompoundNet achieves higher performance in predicting interactions between proteins and chemicals not observed in the training samples. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Farnaz Palhamkhani
  - Chemistry Department, Faculty of Chemistry, School of Sciences, University of Tehran, Tehran, Iran
- Milad Alipour
  - Department of Interdisciplinary Technologies, Network Science and Technology, College of Interdisciplinary Sciences and Technologies, University of Tehran, Tehran, Iran
- Abbas Dehnad
  - Faculty of Mathematics and Computer Science, Allameh Tabatabai University, Tehran, Iran
- Karim Abbasi
  - Laboratory of System Biology, Bioinformatics & Artificial Intelligence in Medicine (LBB&AI), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
- Parvin Razzaghi
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Jahan B Ghasemi
  - Chemistry Department, Faculty of Chemistry, School of Sciences, University of Tehran, Tehran, Iran
2
Deng M, Wang J, Zhao Y, Zhao Y, Cao H, Wang Z. Predicting drug and target interaction with dilated reparameterize convolution. Sci Rep 2025; 15:2579. [PMID: 39833385 PMCID: PMC11747116 DOI: 10.1038/s41598-025-86918-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 01/15/2025] [Indexed: 01/22/2025] [Open access]
Abstract
Predicting drug-target interaction (DTI) stands as a pivotal and formidable challenge in pharmaceutical research. Many existing deep learning methods only learn the high-dimensional representation of ligands and targets on a small scale. However, it is difficult for the model to obtain the potential law of combining pockets or multiple binding sites on a large scale. To address this lacuna, we designed a large-kernel convolutional block for extracting large-scale sequence information and proposed a novel DTI prediction framework, named Rep-ConvDTI. The reparameterization method is introduced to help large-kernel convolutions capture small-scale information. We have also developed a gated attention mechanism to more efficiently characterize the interaction of drugs and targets. Extensive experiments demonstrate that Rep-ConvDTI achieves the most competitive performance against state-of-the-art baselines on the three benchmark datasets. Furthermore, we validated the potential of Rep-ConvDTI as a drug screening tool through model interpretative studies and drug screening experiments with cystathionine-β-synthase.
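The reparameterization idea mentioned in this abstract can be illustrated with a small sketch. The abstract does not describe the actual Rep-ConvDTI block, so the code below only shows the general principle of structural reparameterization under simplifying assumptions: two parallel 1D convolution branches (one large kernel, one small, non-dilated kernel) are folded into a single large kernel by zero-padding the small one. The helper names `conv1d_same` and `merge_kernels` are hypothetical and the inputs are random.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Cross-correlate a 1D signal with an odd-length kernel, zero-padding to keep the length."""
    pad = len(kernel) // 2
    padded = np.pad(x, pad)
    return np.array([padded[i:i + len(kernel)] @ kernel for i in range(len(x))])

def merge_kernels(large, small):
    """Fold a small odd-length kernel into a large one by centring it (hypothetical helper)."""
    offset = (len(large) - len(small)) // 2
    merged = large.copy()
    merged[offset:offset + len(small)] += small
    return merged

rng = np.random.default_rng(0)
x = rng.normal(size=32)      # toy sequence features, not real protein or drug data
large = rng.normal(size=7)   # large-kernel branch
small = rng.normal(size=3)   # small-kernel branch used only during training

# Training-time view: two branches summed.
two_branch = conv1d_same(x, large) + conv1d_same(x, small)
# Inference-time view: one merged large kernel.
one_branch = conv1d_same(x, merge_kernels(large, small))

print(np.allclose(two_branch, one_branch))
```

Because convolution is linear in the kernel, the two views give the same output, which is why the extra branch adds no inference cost once merged.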
Affiliation(s)
- Moping Deng
  - Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jian Wang
  - Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiming Zhao
  - Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
- Yongjia Zhao
  - Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
- Hao Cao
  - School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang, 110016, Liaoning Province, China
- Zhuo Wang
  - Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
3
Lu X, Xie L, Xu L, Mao R, Xu X, Chang S. Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph. Comput Struct Biotechnol J 2024; 23:1666-1679. [PMID: 38680871 PMCID: PMC11046066 DOI: 10.1016/j.csbj.2024.04.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/01/2024] [Accepted: 04/10/2024] [Indexed: 05/01/2024] [Open access]
Abstract
Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, mono-modal learning is inherently limited as it relies solely on a single modality of molecular representation, which restricts a comprehensive understanding of drug molecules. To overcome the limitations, we propose a multimodal fused deep learning (MMFDL) model to leverage information from different molecular representations. Specifically, we construct a triple-modal learning model by employing Transformer-Encoder, Bidirectional Gated Recurrent Unit (BiGRU), and graph convolutional network (GCN) to process three modalities of information from chemical language and molecular graph: SMILES-encoded vectors, ECFP fingerprints, and molecular graphs, respectively. We evaluate the proposed triple-modal model using five fusion approaches on six molecule datasets, including Delaney, Llinas2020, Lipophilicity, SAMPL, BACE, and pKa from DataWarrior. The results show that the MMFDL model achieves the highest Pearson coefficients, and stable distribution of Pearson coefficients in the random splitting test, outperforming mono-modal models in accuracy and reliability. Furthermore, we validate the generalization ability of our model in the prediction of binding constants for protein-ligand complex molecules, and assess the resilience capability against noise. Through analysis of feature distributions in chemical space and the assigned contribution of each modal model, we demonstrate that the MMFDL model shows the ability to acquire complementary information by using proper models and suitable fusion approaches. By leveraging diverse sources of bioinformatics information, multimodal deep learning models hold the potential for successful drug discovery.
Affiliation(s)
- Xiaohua Lu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Liangxu Xie
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Rongzhi Mao
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
4
Wang M, Wang J, Ji J, Ma C, Wang H, He J, Song Y, Zhang X, Cao Y, Dai Y, Hua M, Qin R, Li K, Cao L. Improving compound-protein interaction prediction by focusing on intra-modality and inter-modality dynamics with a multimodal tensor fusion strategy. Comput Struct Biotechnol J 2024; 23:3714-3729. [PMID: 39525082 PMCID: PMC11544084 DOI: 10.1016/j.csbj.2024.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 10/01/2024] [Accepted: 10/01/2024] [Indexed: 11/16/2024] [Open access]
Abstract
Identifying novel compound-protein interactions (CPIs) plays a pivotal role in target identification and drug discovery. Although the recent multimodal methods have achieved outstanding advances in CPI prediction, they fail to effectively learn both intra-modality and inter-modality dynamics, which limits their prediction performance. To address the limitation, we propose a novel multimodal tensor fusion CPI prediction framework, named MMTF-CPI, which contains three unimodal learning modules for structure, heterogeneous network and transcriptional profiling modalities, a tensor fusion module and a prediction module. MMTF-CPI is capable of focusing on both intra-modality and inter-modality dynamics with the tensor fusion module. We demonstrated that MMTF-CPI is superior to multiple state-of-the-art multimodal methods across seven datasets. The prediction performance of MMTF-CPI is significantly improved with the tensor fusion module compared to other fusion methods. Moreover, our case studies confirmed the practical value of MMTF-CPI in target identification. Via MMTF-CPI, we also discovered several candidate compounds for the therapy of breast cancer and non-small cell lung cancer.
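To make "intra-modality and inter-modality dynamics" more concrete, here is a minimal sketch of outer-product tensor fusion as it is commonly formulated in multimodal learning. The abstract does not give the details of the MMTF-CPI fusion module, so this is an assumption about the general technique, not the authors' implementation. The function name `tensor_fusion`, the embedding sizes, and the values are all hypothetical.

```python
from functools import reduce

import numpy as np

def tensor_fusion(*embeddings):
    """Outer product of modality embeddings, each prefixed with a constant 1.

    The leading 1 keeps the unimodal terms (intra-modality) alongside the
    pairwise and higher-order products (inter-modality) in the fused tensor.
    """
    augmented = [np.concatenate(([1.0], np.asarray(z, dtype=float))) for z in embeddings]
    return reduce(np.multiply.outer, augmented)

# Toy embeddings standing in for the three modalities named in the abstract.
structure = np.array([0.5, -1.0])
network = np.array([2.0, 0.0, 1.0])
transcription = np.array([3.0, 4.0])

fused = tensor_fusion(structure, network, transcription)
print(fused.shape)  # (3, 4, 3)
```

In this layout `fused[1:, 0, 0]` is the structure embedding on its own, `fused[1:, 1:, 0]` holds the structure-network products, and `fused[1:, 1:, 1:]` holds the three-way products. A prediction head would typically flatten the tensor before a dense layer.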
Affiliation(s)
- Meng Wang
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Jianmin Wang
  - Department of Integrative Biotechnology, Yonsei University, Incheon 21983, South Korea
- Jianxin Ji
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Chenjing Ma
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Hesong Wang
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Jia He
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Yongzhen Song
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Xuan Zhang
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Yong Cao
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Yanyan Dai
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Menglei Hua
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Ruihao Qin
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Kang Li
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
- Lei Cao
  - Department of Biostatistics, Harbin Medical University, Harbin 150081, China
5
Lu Q, Zhou Z, Wang Q. Multi-layer graph attention neural networks for accurate drug-target interaction mapping. Sci Rep 2024; 14:26119. [PMID: 39478027 PMCID: PMC11525987 DOI: 10.1038/s41598-024-75742-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 10/08/2024] [Indexed: 11/02/2024] [Open access]
Abstract
In the crucial process of drug discovery and repurposing, precise prediction of drug-target interactions (DTIs) is paramount. This study introduces a novel DTI prediction approach, the Multi-Layer Graph Attention Neural Network (MLGANN), a computational framework that harnesses multi-source information to enhance prediction accuracy. MLGANN constructs a multi-layer DTI network that captures both direct interactions between drugs and targets and their multi-level information, and it combines Graph Convolutional Networks (GCN) with a self-attention mechanism to integrate diverse data sources. In comparative experiments, the method significantly outperformed existing approaches, underscoring its potential to improve the efficiency and accuracy of DTI predictions. More importantly, this study accentuates the significance of considering multi-source data information and network heterogeneity in the drug discovery process, offering new perspectives and tools for future pharmaceutical research.
Affiliation(s)
- Qianwen Lu
  - SDU-ANU Joint Science College, Shandong University, Weihai, 264209, Shandong, China
- Zhiheng Zhou
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100190, China
- Qi Wang
  - College of Science, China Agricultural University, Beijing, 100083, China
6
Liu S, Yu J, Ni N, Wang Z, Chen M, Li Y, Xu C, Ding Y, Zhang J, Yao X, Liu H. Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features. J Chem Inf Model 2024; 64:5646-5656. [PMID: 38976879 DOI: 10.1021/acs.jcim.4c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting drug-target interactions (DTIs) is one of the crucial tasks in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction due to its powerful performance. However, the models trained on limited known DTI data struggle to generalize effectively to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. Our approach presents a significant advancement in DTI prediction by combining domain-generic and domain-specific features, enhancing the generalization ability of the DTI prediction model.
Affiliation(s)
- Shuo Liu
  - School of Pharmacy, Lanzhou University, Gansu 730000, China
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jialiang Yu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Ningxi Ni
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Zidong Wang
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Mengyun Chen
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yuquan Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Gansu 730000, China
- Chen Xu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yahao Ding
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jun Zhang
  - Changping Laboratory, Beijing 102200, China
- Xiaojun Yao
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
7
Zhou G, Qin Y, Hong Q, Li H, Chen H, Shen J. GEMF: a novel geometry-enhanced mid-fusion network for PLA prediction. Brief Bioinform 2024; 25:bbae333. [PMID: 38980371 PMCID: PMC11232467 DOI: 10.1093/bib/bbae333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 06/04/2024] [Accepted: 06/26/2024] [Indexed: 07/10/2024] [Open access]
Abstract
Accurate prediction of protein-ligand binding affinity (PLA) is important for drug discovery. Recent advances in applying graph neural networks have shown great potential for PLA prediction. However, existing methods usually neglect the geometric information (i.e. bond angles), leading to difficulties in accurately distinguishing different molecular structures. In addition, these methods also pose limitations in representing the binding process of protein-ligand complexes. To address these issues, we propose a novel geometry-enhanced mid-fusion network, named GEMF, to learn comprehensive molecular geometry and interaction patterns. Specifically, the GEMF consists of a graph embedding layer, a message passing phase, and a multi-scale fusion module. GEMF can effectively represent protein-ligand complexes as graphs, with graph embeddings based on physicochemical and geometric properties. Moreover, our dual-stream message passing framework models both covalent and non-covalent interactions. In particular, the edge-update mechanism, which is based on line graphs, can fuse both distance and angle information in the covalent branch. In addition, the communication branch consisting of multiple heterogeneous interaction modules is developed to learn intricate interaction patterns. Finally, we fuse the multi-scale features from the covalent, non-covalent, and heterogeneous interaction branches. The extensive experimental results on several benchmarks demonstrate the superiority of GEMF compared with other state-of-the-art methods.
Affiliation(s)
- Guoqiang Zhou
  - School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
- Yuke Qin
  - School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
- Qiansen Hong
  - School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
- Haoran Li
  - School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia
- Huaming Chen
  - School of Electrical and Computer Engineering, University of Sydney, Camperdown, NSW 2050, Australia
- Jun Shen
  - School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia
8
Li Y, Liu B, Deng J, Guo Y, Du H. Image-based molecular representation learning for drug development: a survey. Brief Bioinform 2024; 25:bbae294. [PMID: 38920347 PMCID: PMC11200195 DOI: 10.1093/bib/bbae294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/19/2024] [Accepted: 06/08/2024] [Indexed: 06/27/2024] [Open access]
Abstract
Artificial intelligence (AI) powered drug development has received remarkable attention in recent years. It addresses the limitations of traditional experimental methods that are costly and time-consuming. While many surveys have attempted to summarize related research, they focus only on general AI or on specific aspects such as natural language processing and graph neural networks. Considering the rapid advances in computer vision, using molecular images to enable AI appears to be a more intuitive and effective approach, since each chemical substance has a unique visual representation. In this paper, we provide the first survey on image-based molecular representation for drug development. The survey proposes a taxonomy based on the learning paradigms in computer vision and reviews a large number of corresponding papers, highlighting the contributions of molecular visual representation in drug development. We also discuss the applications, limitations, and future directions of the field. We hope this survey offers valuable insight into the use of image-based molecular representation learning in the context of drug development.
Affiliation(s)
- Yue Li
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Bingyan Liu
  - School of Computer Science, Beijing University of Posts and Telecommunications, No.10 Xituchen Street, 100876, Beijing, China
- Jinyan Deng
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Yi Guo
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Hongbo Du
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
  - Institute of Liver Disease, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
9
Zeng X, Meng FF, Wen ML, Li SJ, Li Y. GNNGL-PPI: multi-category prediction of protein-protein interactions using graph neural networks based on global graphs and local subgraphs. BMC Genomics 2024; 25:406. [PMID: 38724906 PMCID: PMC11080243 DOI: 10.1186/s12864-024-10299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 04/10/2024] [Indexed: 05/13/2024] [Open access]
Abstract
Most proteins exert their functions by interacting with other proteins, making the identification of protein-protein interactions (PPI) crucial for understanding biological activities, pathological mechanisms, and clinical therapies. Developing effective and reliable computational methods for predicting PPI can significantly reduce the time and labor associated with traditional biological experiments. However, accurately identifying the specific categories of protein-protein interactions and improving the prediction accuracy of the computational methods remain dual challenges. To tackle these challenges, we proposed a novel graph neural network method called GNNGL-PPI for multi-category prediction of PPI based on global graphs and local subgraphs. GNNGL-PPI consisted of two main components: a Graph Isomorphism Network (GIN) to extract global graph features from the PPI network graph, and GIN As Kernel (GIN-AK) to extract local subgraph features from the subgraphs of protein vertices. Additionally, considering the imbalanced distribution of samples in each category within the benchmark datasets, we introduced an Asymmetric Loss (ASL) function to further enhance the predictive performance of the method. Through evaluations on six benchmark test sets formed by three different dataset partitioning algorithms (Random, BFS, DFS), GNNGL-PPI outperformed the state-of-the-art multi-category prediction methods of PPI, as measured by the comprehensive performance evaluation metric F1-measure. Furthermore, interpretability analysis confirmed the effectiveness of GNNGL-PPI as a reliable multi-category prediction method for predicting protein-protein interactions.
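The role of an asymmetric loss under class imbalance can be sketched as follows. This follows the commonly cited ASL formulation for multi-label classification (separate focusing exponents for positives and negatives, plus a probability margin that discards very easy negatives). The abstract does not state which hyperparameters or variant GNNGL-PPI uses, so the defaults and the function name `asymmetric_loss` below are assumptions for illustration only.

```python
import numpy as np

def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Asymmetric loss summed over labels (hyperparameter defaults are assumptions).

    Positives: (1 - p)^gamma_pos * log(p)
    Negatives: p_m^gamma_neg * log(1 - p_m), with p_m = max(p - margin, 0)
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    shifted = np.clip(probs - margin, 0.0, None)  # easy negatives below the margin vanish
    loss_pos = targets * (1 - probs) ** gamma_pos * np.log(np.clip(probs, eps, None))
    loss_neg = (1 - targets) * shifted ** gamma_neg * np.log(np.clip(1 - shifted, eps, None))
    return -(loss_pos + loss_neg).sum()

# Toy multi-label prediction: one positive label among several negatives.
probs = np.array([0.7, 0.04, 0.3, 0.9])
targets = np.array([1, 0, 0, 0])
print(asymmetric_loss(probs, targets))
```

With both exponents and the margin set to zero the function reduces to ordinary binary cross-entropy, which makes it easy to compare the two on the same predictions.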
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, 671003, Dali, China
- Fan-Fang Meng
  - College of Mathematics and Computer Science, Dali University, 671003, Dali, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, 650000, Kunming, China
- Shu-Juan Li
  - Yunnan Institute of Endemic Diseases Control & Prevention, 671000, Dali, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, 671003, Dali, China
10
Rafiei F, Zeraati H, Abbasi K, Razzaghi P, Ghasemi JB, Parsaeian M, Masoudi-Nejad A. CFSSynergy: Combining Feature-Based and Similarity-Based Methods for Drug Synergy Prediction. J Chem Inf Model 2024; 64:2577-2585. [PMID: 38514966 DOI: 10.1021/acs.jcim.3c01486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Drug synergy prediction plays a vital role in cancer treatment. Because experimental approaches are labor-intensive and expensive, computational-based approaches get more attention. There are two types of computational methods for drug synergy prediction: feature-based and similarity-based. In feature-based methods, the main focus is to extract more discriminative features from drug pairs and cell lines to pass to the task predictor. In similarity-based methods, the similarities among all drugs and cell lines are utilized as features and fed into the task predictor. In this work, a novel approach, called CFSSynergy, that combines these two viewpoints is proposed. First, a discriminative representation is extracted for paired drugs and cell lines as input. We have utilized transformer-based architecture for drugs. For cell lines, we have created a similarity matrix between proteins using the Node2Vec algorithm. Then, the new cell line representation is computed by multiplying the protein-protein similarity matrix and the initial cell line representation. Next, we compute the similarity between unique drugs and unique cells using the learned representation for paired drugs and cell lines. Then, we compute a new representation for paired drugs and cell lines based on the similarity-based features and the learned features. Finally, these features are fed to XGBoost as a task predictor. Two well-known data sets were used to evaluate the performance of our proposed method: DrugCombDB and OncologyScreen. The CFSSynergy approach consistently outperformed existing methods in comparative evaluations. This substantiates the efficacy of our approach in capturing complex synergistic interactions between drugs and cell lines, setting it apart from conventional similarity-based or feature-based methods.
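The cell line step described above (multiplying a protein-protein similarity matrix by the initial cell line representation) can be sketched with plain matrix algebra. The abstract does not specify the similarity measure or the orientation of the product, so this sketch assumes cosine similarity between protein embeddings and a cell line matrix with one column per protein. The small hand-written vectors stand in for Node2Vec embeddings and real cell line profiles, and `cosine_similarity_matrix` is a hypothetical helper.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of an embedding matrix."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

# Placeholder protein embeddings (three proteins, two dimensions).
protein_embeddings = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
similarity = cosine_similarity_matrix(protein_embeddings)  # shape (3, 3)

# Initial cell line representation: rows are cell lines, columns are proteins.
cell_lines = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 1.0],
])

# New representation: each protein feature is mixed with those of similar proteins.
smoothed_cell_lines = cell_lines @ similarity
print(smoothed_cell_lines.round(3))
```

A cell line that initially has signal on only one protein picks up signal on related proteins in proportion to their similarity. In the pipeline described by the abstract, features of this kind are later combined with the learned drug representation and passed to XGBoost.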
Affiliation(s)
- Fatemeh Rafiei
  - Department of Epidemiology and Biostatistics, School of Health, Tehran University of Medical Sciences, Tehran 14167-53955, Iran
- Hojjat Zeraati
  - Department of Epidemiology and Biostatistics, School of Health, Tehran University of Medical Sciences, Tehran 14167-53955, Iran
- Karim Abbasi
  - Laboratory of System Biology, Bioinformatics & Artificial Intelligence in Medicine (LBB&AI), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran 14588-89694, Iran
- Parvin Razzaghi
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 45137-66731, Iran
- Jahan B Ghasemi
  - Chemistry Department, Faculty of Chemistry, School of Sciences, University of Tehran, Tehran 14174-66191, Iran
- Mahboubeh Parsaeian
  - Department of Epidemiology and Biostatistics, School of Health, Tehran University of Medical Sciences, Tehran 14167-53955, Iran
  - Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, U.K.
- Ali Masoudi-Nejad
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 13145-1365, Iran
11
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024] [Open access]
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for various applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy, which poses a significant challenge. Fortunately, deep learning has emerged as a promising approach in computational biology, leading to the development of various deep learning-based methods for DTA prediction. To support researchers in developing novel, high-precision methods, we have provided a comprehensive review of recent advances in predicting DTA using deep learning. We first conducted a statistical analysis of commonly used public datasets, providing essential information and describing the fields in which these datasets are used. We further explored the common representations of sequences and structures of drugs and targets. These analyses served as the foundation for constructing DTA prediction methods based on deep learning. Next, we focused on explaining how deep learning models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, and Graph Neural Networks (GNNs), were effectively employed in specific DTA prediction methods. We highlighted the unique advantages and applications of these models in the context of DTA prediction. Finally, we conducted a performance analysis of multiple state-of-the-art methods for predicting DTA based on deep learning. The comprehensive review aimed to help researchers understand the shortcomings and advantages of existing methods, and to further develop high-precision DTA prediction tools that promote drug discovery.
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Shu-Juan Li
  - Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering West Yunnan University of Applied Science, Dali, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China
12
Li Q, Lv H, Chen Y, Shen J, Shi J, Zhou C, Yan F. Development and validation of a machine learning prediction model for perioperative red blood cell transfusions in cardiac surgery. Int J Med Inform 2024; 184:105343. [PMID: 38286086 DOI: 10.1016/j.ijmedinf.2024.105343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 01/10/2024] [Accepted: 01/17/2024] [Indexed: 01/31/2024]
Abstract
OBJECTIVE Several machine learning (ML) models have been used to predict perioperative red blood cell (RBC) transfusion risk in cardiac surgery, but with limited generalizability and no external validation. Hence, we sought to develop and comprehensively externally validate an ML model in a large dataset to estimate RBC transfusion in cardiac surgery with cardiopulmonary bypass (CPB). DESIGN A retrospective analysis of a multicenter clinical trial (NCT03782350). PATIENTS The study patients, who underwent cardiac surgery with CPB, came from four cardiac centers in China and the Medical Information Mart for Intensive Care (MIMIC-IV) dataset. MEASUREMENTS Data from Fuwai Hospital were used to develop an individualized prediction model for RBC transfusion. The model was externally validated in the data from three other centers and the MIMIC-IV dataset. Twelve models were constructed. MAIN RESULTS A total of 11,201 eligible patients were included in the model development (2420 in Fuwai Hospital) and external validation (563 in the other three centers and 8218 in the MIMIC-IV dataset). A significant difference was observed between the Logistic Regression and CatboostClassifier (0.72 vs. 0.74, p = 0.031) or RandomForestClassifier (0.72 vs. 0.75, p = 0.012) in the external validation and MIMIC-IV datasets (age ≤ 70: 0.63 vs. 0.71, p < 0.001; age > 70: 0.63 vs. 0.70, 0.63 vs. 0.71, p < 0.001). The CatboostClassifier and RandomForestClassifier models were comparable in the development (0.83 vs. 0.82, p = 0.419), external (0.74 vs. 0.75, p = 0.268), and MIMIC-IV datasets (age ≤ 70: 0.71 vs. 0.71, p = 0.574; age > 70: 0.70 vs. 0.71, p = 0.981). Of note, they outperformed other ML models with excellent discrimination and calibration. The CatboostClassifier and RandomForestClassifier models achieved higher area under the precision-recall curve and lower Brier loss score in the validation and MIMIC-IV datasets. Additionally, we confirmed that low preoperative hemoglobin, low body mass index, old age, and female sex increased the risk of RBC transfusion. CONCLUSIONS Our study, which enrolled a broad range of cardiovascular surgeries with CPB and used a restrictive RBC transfusion strategy, robustly validates the generalizability of ML algorithms for predicting RBC transfusion risk. Notably, the CatboostClassifier and RandomForestClassifier exhibit strong external clinical applicability, underscoring their potential for widespread adoption. This study provides compelling evidence supporting the efficacy and practical value of ML-based approaches in enhancing transfusion risk prediction in clinical practice.
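The Brier loss score named in this abstract is the mean squared difference between each predicted probability and the observed 0/1 outcome, so lower values indicate better-calibrated predictions. A minimal Python sketch follows; the function name, outcomes, and risk values are made up for illustration and are not data or code from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = transfused) and predicted risks, illustration only.
outcomes = [1, 0, 0, 1, 0]
risks = [0.9, 0.2, 0.1, 0.6, 0.4]
score = brier_score(outcomes, risks)
print(f"Brier score: {score:.3f}")
```

A model that always predicts 0.5 scores 0.25 on any binary outcome, which is a convenient reference point when reading reported values.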
Affiliation(s)
- Qian Li
- Department of Anesthesiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing 100037, China
- Hong Lv
- Department of Anesthesiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing 100037, China
- Yuye Chen
- Department of Anesthesiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing 100037, China
- Jingjia Shen
- Department of Anesthesiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing 100037, China
- Jia Shi
- Department of Anesthesiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing 100037, China
- Chenghui Zhou
- Department of Anesthesiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing 100037, China; Center for Anesthesiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, 100029, China
- Fuxia Yan
- Department of Anesthesiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing 100037, China
13
Luo H, Zhu C, Wang J, Zhang G, Luo J, Yan C. Prediction of drug-disease associations based on reinforcement symmetric metric learning and graph convolution network. Front Pharmacol 2024; 15:1337764. [PMID: 38384286 PMCID: PMC10879308 DOI: 10.3389/fphar.2024.1337764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 01/18/2024] [Indexed: 02/23/2024] Open
Abstract
Accurately identifying novel indications for drugs is crucial in drug research and discovery. Traditional drug discovery is costly and time-consuming. Computational drug repositioning can provide an effective strategy for discovering potential drug-disease associations. However, the known experimentally verified drug-disease associations are relatively sparse, which may affect the prediction performance of computational drug repositioning methods. Moreover, while the existing drug-disease prediction method based on a metric learning algorithm has achieved better performance, it learns features of drugs and diseases only from the drug-centered perspective and cannot comprehensively model the latent features of drugs and diseases. In this study, we propose a novel drug repositioning method named RSML-GCN, which applies a graph convolutional network and reinforcement symmetric metric learning to predict potential drug-disease associations. RSML-GCN first constructs a drug-disease heterogeneous network by integrating the association and feature information of drugs and diseases. Then, the graph convolutional network (GCN) is applied to complement the drug-disease association information. Finally, reinforcement symmetric metric learning with adaptive margin is designed to learn the latent vector representations of drugs and diseases. Based on the learned latent vector representations, novel drug-disease associations can be identified by the metric function. Comprehensive experiments on benchmark datasets demonstrated the superior prediction performance of RSML-GCN for drug repositioning.
Affiliation(s)
- Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Chunli Zhu
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Junwei Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
14
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024] Open
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to get a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost being estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audios, videos, novel chemical molecules, etc. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via de novo drug design approach. Various basic and advanced models have been discussed, along with their recent applications. 
The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated from generative artificial intelligence have also been discussed in this review to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Azim Ansari
- Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
- Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
- Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
- Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
- Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
15
Dehghan A, Abbasi K, Razzaghi P, Banadkuki H, Gharaghani S. CCL-DTI: contributing the contrastive loss in drug-target interaction prediction. BMC Bioinformatics 2024; 25:48. [PMID: 38291364 PMCID: PMC11264960 DOI: 10.1186/s12859-024-05671-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 01/22/2024] [Indexed: 02/01/2024] Open
Abstract
BACKGROUND Drug-Target Interaction (DTI) prediction uses a drug molecule and a protein sequence as inputs to predict the binding affinity value. In recent years, deep learning-based models have received more attention. These methods have two modules: the feature extraction module and the task prediction module. In most deep learning-based approaches, a simple task prediction loss (i.e., categorical cross entropy for the classification task and mean squared error for the regression task) is used to train the model. In machine learning, contrastive-based loss functions have been developed to learn a more discriminative feature space. In a deep learning-based model, extracting a more discriminative feature space leads to performance improvement for the task prediction module. RESULTS In this paper, we have used multimodal knowledge as input and proposed an attention-based fusion technique to combine this knowledge. Also, we investigate how utilizing a contrastive loss function alongside the task prediction loss could help the approach learn a more powerful model. Four contrastive loss functions are considered: (1) max-margin contrastive loss function, (2) triplet loss function, (3) Multi-class N-pair Loss Objective, and (4) NT-Xent loss function. The proposed model is evaluated using four well-known datasets: the Wang et al. dataset, Luo's dataset, and the Davis and KIBA datasets. CONCLUSIONS Accordingly, after reviewing the state-of-the-art methods, we developed a multimodal feature extraction network by combining protein sequences and drug molecules, along with protein-protein interaction networks and drug-drug interaction networks. The results show it performs significantly better than the comparable state-of-the-art approaches.
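To make two of the contrastive losses listed in this abstract concrete, the sketch below gives textbook forms of the triplet loss and the NT-Xent loss for a single anchor, plus one plausible way of adding a contrastive term to a task loss. It is an illustration under stated assumptions, not the CCL-DTI implementation: the embeddings, the `weight` parameter, and the `combined_loss` helper are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): positives must sit closer than negatives."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def nt_xent(anchor, positive, negatives, temperature=0.5):
    """Normalized temperature-scaled cross entropy for one anchor."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def combined_loss(task_loss, contrastive_loss, weight=0.1):
    """Hypothetical weighting of the two terms; the paper's scheme may differ."""
    return task_loss + weight * contrastive_loss


# Toy 2-D embeddings, illustration only.
anchor, positive, negative = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
print(triplet_loss(anchor, positive, negative))
print(nt_xent(anchor, positive, [negative]))
```

Both losses are small when the anchor is close to its positive and far from its negatives, which is the sense in which they encourage a more discriminative feature space.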
Affiliation(s)
- Alireza Dehghan
- Department of Bioinformatics, Kish International Campus, University of Tehran, Kish, 1417614411, Iran
- Karim Abbasi
- Laboratory of System Biology, Bioinformatics and Artificial Intelligence in Medicine (LBB&AI), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, 1417614411, Iran
- Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 4513766731, Iran
- Hossein Banadkuki
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, 1417614411, Iran
- Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, 1417614411, Iran
16
Zhou S, Li Y, Wu W, Li L. scMMT: a multi-use deep learning approach for cell annotation, protein prediction and embedding in single-cell RNA-seq data. Brief Bioinform 2024; 25:bbad523. [PMID: 38300515 PMCID: PMC10833085 DOI: 10.1093/bib/bbad523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 12/19/2023] [Indexed: 02/02/2024] Open
Abstract
Accurate cell type annotation in single-cell RNA-sequencing data is essential for advancing biological and medical research, particularly in understanding disease progression and tumor microenvironments. However, existing methods are constrained by single feature extraction approaches, lack of adaptability to immune cell types with similar molecular profiles but distinct functions and a failure to account for the impact of cell label noise on model accuracy, all of which compromise the precision of annotation. To address these challenges, we developed a supervised approach called scMMT. We proposed a novel feature extraction technique to uncover more valuable information. Additionally, we constructed a multi-task learning framework based on the GradNorm method to enhance the recognition of challenging immune cells and reduce the impact of label noise by facilitating mutual reinforcement between cell type annotation and protein prediction tasks. Furthermore, we introduced logarithmic weighting and label smoothing mechanisms to enhance the recognition ability of rare cell types and prevent model overconfidence. Through comprehensive evaluations on multiple public datasets, scMMT has demonstrated state-of-the-art performance in various aspects including cell type annotation, rare cell identification, dropout and label noise resistance, protein expression prediction and low-dimensional embedding representation.
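As a rough illustration of the label smoothing and logarithmic weighting mentioned in this abstract, the sketch below uses the standard label-smoothing formula and one assumed log-damped class weight. The weighting formula, the function names, and the cell counts are hypothetical stand-ins; the abstract does not give scMMT's exact definitions.

```python
import math


def smooth_labels(true_index, num_classes, epsilon=0.1):
    """Standard label smoothing: move epsilon of the mass to a uniform distribution."""
    base = epsilon / num_classes
    return [base + (1.0 - epsilon if k == true_index else 0.0)
            for k in range(num_classes)]


def log_class_weights(counts):
    """Assumed weighting: rarer classes get larger, log-damped weights."""
    total = sum(counts)
    return [math.log(1.0 + total / c) for c in counts]


def weighted_cross_entropy(probs, target, weight):
    """Cross entropy against a (possibly smoothed) target, scaled by a class weight."""
    return -weight * sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical cell counts for three cell types, the last one rare.
weights = log_class_weights([900, 90, 10])
smoothed = smooth_labels(true_index=0, num_classes=4)
loss = weighted_cross_entropy([0.7, 0.1, 0.1, 0.1], smoothed, weight=1.0)
print(weights, smoothed, loss)
```

Smoothing keeps the target for the true class below 1, which discourages overconfident predictions, while the log weights raise the loss contribution of rare cell types without letting it grow linearly with rarity.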
Affiliation(s)
- Songqi Zhou
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Yang Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Chongqing Research Institute of Big Data, Peking University, Chongqing, China
- Wenyuan Wu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Li Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
17
Zhao C, Xu Z, Wang X, Tao S, MacDonald WA, He K, Poholek AC, Chen K, Huang H, Chen W. Innovative super-resolution in spatial transcriptomics: a transformer model exploiting histology images and spatial gene expression. Brief Bioinform 2024; 25:bbae052. [PMID: 38436557 PMCID: PMC10939304 DOI: 10.1093/bib/bbae052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 01/26/2024] [Accepted: 01/27/2024] [Indexed: 03/05/2024] Open
Abstract
Spatial transcriptomics technologies have shed light on the complexities of tissue structures by accurately mapping spatial microenvironments. Nonetheless, a myriad of methods, especially those utilized in platforms like Visium, often relinquish spatial details owing to intrinsic resolution limitations. In response, we introduce TransformerST, an innovative, unsupervised model anchored in the Transformer architecture, which operates independently of references, thereby ensuring cost-efficiency by circumventing the need for single-cell RNA sequencing. TransformerST not only elevates Visium data from a multicellular level to a single-cell granularity but also showcases adaptability across diverse spatial transcriptomics platforms. By employing a vision transformer-based encoder, it discerns latent image-gene expression co-representations and is further enhanced by spatial correlations, derived from an adaptive graph Transformer module. The sophisticated cross-scale graph network, utilized in super-resolution, significantly boosts the model's accuracy, unveiling complex structure-functional relationships within histology images. Empirical evaluations validate its adeptness in revealing tissue subtleties at the single-cell scale. Crucially, TransformerST adeptly navigates through image-gene co-representation, maximizing the synergistic utility of gene expression and histology images, thereby emerging as a pioneering tool in spatial transcriptomics. It not only enhances resolution to a single-cell level but also introduces a novel approach that optimally utilizes histology images alongside gene expression, providing a refined lens for investigating spatial transcriptomics.
Affiliation(s)
- Chongyue Zhao
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Zhongli Xu
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- School of Medicine, Tsinghua University, Beijing, 100084, Beijing, China
- Xinjun Wang
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, 10065, New York, USA
- Shiyue Tao
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, 15261, Pennsylvania, USA
- William A MacDonald
- Health Sciences Sequencing Core at UPMC Children’s Hospital of Pittsburgh, Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Kun He
- Division of Pediatric Rheumatology, Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Amanda C Poholek
- Division of Pediatric Rheumatology, Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Department of Immunology, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Health Sciences Sequencing Core at UPMC Children’s Hospital of Pittsburgh, Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Kong Chen
- Department of Medicine, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
- Heng Huang
- Department of Computer Science, University of Maryland, College Park, 20742, Maryland, USA
- Wei Chen
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, 15224, Pennsylvania, USA
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, 15261, Pennsylvania, USA
18
Avram C, Gligor A, Roman D, Soylu A, Nyulas V, Avram L. Machine learning based assessment of preclinical health questionnaires. Int J Med Inform 2023; 180:105248. [PMID: 37866276 DOI: 10.1016/j.ijmedinf.2023.105248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/04/2023] [Accepted: 10/08/2023] [Indexed: 10/24/2023]
Abstract
BACKGROUND Within modern health systems, the possibility of accessing a large amount and a variety of data related to patients' health has increased significantly over the years. The source of this data could be mobile and wearable electronic systems used in everyday life, and specialized medical devices. In this study we aim to investigate the use of modern Machine Learning (ML) techniques for preclinical health assessment based on data collected from questionnaires filled out by patients. METHOD To identify the health conditions of pregnant women, we developed a questionnaire that was distributed in three maternity hospitals in Mureș County, Romania. In this work we proposed and developed an ML model for pattern detection in common risk assessment based on data extracted from questionnaires. RESULTS Out of the 1278 women who answered the questionnaire, 381 smoked before pregnancy and only 216 quit smoking during the period in which they became pregnant. The performance of the model indicates the feasibility of the solution, with an accuracy of 98% confirmed for the considered case study. CONCLUSION The proposed solution offers a simple and efficient way to digitize questionnaire data and to analyze the data with reduced computational effort, both in terms of memory and computing power used.
Affiliation(s)
- Calin Avram
- George Emil Palade University of Medicine, Pharmacy, Science and Technology of Targu Mures, Romania
- Adrian Gligor
- George Emil Palade University of Medicine, Pharmacy, Science and Technology of Targu Mures, Romania
- Dumitru Roman
- SINTEF AS, Norway; OsloMet - Oslo Metropolitan University, Norway
- Ahmet Soylu
- OsloMet - Oslo Metropolitan University, Norway
- Victoria Nyulas
- George Emil Palade University of Medicine, Pharmacy, Science and Technology of Targu Mures, Romania
- Laura Avram
- "Dimitrie Cantemir" University of Târgu-Mureș, Romania
19
Pan S, Jiang X, Zhang K. WSGMB: weight signed graph neural network for microbial biomarker identification. Brief Bioinform 2023; 25:bbad448. [PMID: 38084923 PMCID: PMC10714318 DOI: 10.1093/bib/bbad448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 11/07/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023] Open
Abstract
The stability of the gut microenvironment is inextricably linked to human health, with the onset of many diseases accompanied by dysbiosis of the gut microbiota. It has been reported that there are differences in the microbial community composition between patients and healthy individuals, and many microbes are considered potential biomarkers. Accurately identifying these biomarkers can lead to more precise and reliable clinical decision-making. To improve the accuracy of microbial biomarker identification, this study introduces WSGMB, a computational framework that uses the relative abundance of microbial taxa and health status as inputs. This method has two main contributions: (1) viewing the microbial co-occurrence network as a weighted signed graph and applying graph convolutional neural network techniques for graph classification; (2) designing a new architecture to compute the role transitions of each microbial taxon between health and disease networks, thereby identifying disease-related microbial biomarkers. The weighted signed graph neural network enhances the quality of graph embeddings; quantifying the importance of microbes in different co-occurrence networks better identifies those microbes critical to health. Microbes are ranked according to their importance change scores, and when this score exceeds a set threshold, the microbe is considered a biomarker. This framework's identification performance is validated by comparing the biomarkers identified by WSGMB with actual microbial biomarkers associated with specific diseases from public literature databases. The study tests the proposed computational framework using actual microbial community data from colorectal cancer and Crohn's disease samples. It compares it with the most advanced microbial biomarker identification methods. The results show that the WSGMB method outperforms similar approaches in the accuracy of microbial biomarker identification.
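The final selection step described in this abstract (rank microbes by importance change score, keep those above a threshold) can be sketched as follows. Only that ranking step is shown; the per-taxon importance values are assumed to be given, the absolute difference used as the change score is an assumption, and the taxon names, numbers, and threshold are placeholders rather than WSGMB outputs.

```python
def select_biomarkers(healthy_importance, disease_importance, threshold):
    """Rank taxa by importance change and keep those strictly above the threshold."""
    changes = {
        taxon: abs(disease_importance[taxon] - healthy_importance[taxon])
        for taxon in healthy_importance
    }
    ranked = sorted(changes.items(), key=lambda item: item[1], reverse=True)
    return [(taxon, change) for taxon, change in ranked if change > threshold]


# Placeholder importance scores in the healthy and disease co-occurrence networks.
healthy = {"taxon_A": 0.2, "taxon_B": 0.5, "taxon_C": 0.4}
disease = {"taxon_A": 0.8, "taxon_B": 0.45, "taxon_C": 0.0}
biomarkers = select_biomarkers(healthy, disease, threshold=0.3)
print([taxon for taxon, _ in biomarkers])
```

In this toy input, `taxon_B` barely changes between the two networks and is therefore not reported, while the two taxa whose role shifts strongly are returned in descending order of change.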
Affiliation(s)
- Shuheng Pan
- Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518005, China
- Xinyi Jiang
- Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518005, China
- Kai Zhang
- Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518005, China
20
Yin Q, Chen L. CellTICS: an explainable neural network for cell-type identification and interpretation based on single-cell RNA-seq data. Brief Bioinform 2023; 25:bbad449. [PMID: 38061196 PMCID: PMC10703497 DOI: 10.1093/bib/bbad449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 10/30/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023] Open
Abstract
Identifying cell types is crucial for understanding the functional units of an organism. Machine learning has shown promising performance in identifying cell types, but many existing methods lack biological significance due to poor interpretability. However, it is of the utmost importance to understand what makes cells share the same function and form a specific cell type, motivating us to propose a biologically interpretable method. CellTICS prioritizes marker genes with cell-type-specific expression, using a hierarchy of biological pathways for neural network construction, and applying a multi-predictive-layer strategy to predict cell and sub-cell types. CellTICS usually outperforms existing methods in prediction accuracy. Moreover, CellTICS can reveal pathways that define a cell type or a cell type under specific physiological conditions, such as disease or aging. The nonlinear nature of neural networks enables us to identify many novel pathways. Interestingly, some of the pathways identified by CellTICS exhibit differential expression "variability" rather than differential expression across cell types, indicating that expression stochasticity within a pathway could be an important feature characteristic of a cell type. Overall, CellTICS provides a biologically interpretable method for identifying and characterizing cell types, shedding light on the underlying pathways that define cellular heterogeneity and its role in organismal function. CellTICS is available at https://github.com/qyyin0516/CellTICS.
Affiliation(s)
- Qingyang Yin
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
- Liang Chen
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
21
Su Y, Hu Z, Wang F, Bin Y, Zheng C, Li H, Chen H, Zeng X. AMGDTI: drug-target interaction prediction based on adaptive meta-graph learning in heterogeneous network. Brief Bioinform 2023; 25:bbad474. [PMID: 38145949 PMCID: PMC10749791 DOI: 10.1093/bib/bbad474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 11/10/2023] [Accepted: 11/30/2023] [Indexed: 12/27/2023] Open
Abstract
Prediction of drug-target interactions (DTIs) is essential in the field of medicine, since it benefits the identification of molecular structures potentially interacting with drugs and facilitates the discovery and repositioning of drugs. Recently, network representation learning has attracted much attention as a way to learn rich information from heterogeneous data. Although network representation learning algorithms have achieved success in predicting DTIs, their reliance on several manually designed meta-graphs limits the capability of extracting complex semantic information. To address the problem, we introduce an adaptive meta-graph-based method, termed AMGDTI, for DTI prediction. In the proposed AMGDTI, the semantic information is automatically aggregated from a heterogeneous network by training an adaptive meta-graph, thereby achieving efficient information integration without requiring domain knowledge. The effectiveness of the proposed AMGDTI is verified on two benchmark datasets. Experimental results demonstrate that the AMGDTI method overall outperforms eight state-of-the-art methods in predicting DTIs and achieves accurate identification of novel DTIs. It is also verified that the adaptive meta-graph exhibits flexibility and effectively captures complex fine-grained semantic information, enabling the learning of intricate heterogeneous network topology and the inference of potential drug-target relationships.
Affiliation(s)
- Yansen Su
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, China
- Zhiyang Hu
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, China
- Fei Wang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, China
- Yannan Bin
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, China
- Chunhou Zheng
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, China
- Haitao Li
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, China
- Haowen Chen
- College of Computer Science and Electronic Engineering, Hunan University, Hunan, 410082, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Hunan, 410082, China
|
22
|
Zhang J, Xie M. Graph regularized non-negative matrix factorization with [Formula: see text] norm regularization terms for drug-target interactions prediction. BMC Bioinformatics 2023; 24:375. [PMID: 37789278 PMCID: PMC10548602 DOI: 10.1186/s12859-023-05496-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/08/2023] [Accepted: 09/22/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND Identifying drug-target interactions (DTIs) plays a key role in drug development. Traditional wet-lab experiments to identify DTIs are costly and time-consuming. Effective computational methods to predict DTIs are useful for speeding up drug discovery. A variety of non-negative matrix factorization-based methods have been proposed to predict DTIs, but most of them overlook the sparsity of the feature matrices and the convergence of the adopted matrix factorization algorithms, so their performance can be further improved. RESULTS To predict DTIs more accurately, we propose a novel method, iPALM-DLMF. iPALM-DLMF models DTI prediction as a non-negative matrix factorization problem with graph dual regularization terms and [Formula: see text] norm regularization terms. The graph dual regularization terms integrate the information from the drug similarity matrix and the target similarity matrix, and the [Formula: see text] norm regularization terms ensure the sparsity of the feature matrices obtained by non-negative matrix factorization. To solve the model, iPALM-DLMF adopts non-negative double singular value decomposition to initialize the non-negative matrix factorization and an inertial Proximal Alternating Linearized Minimization iteration, which has been proved to converge to a KKT point, to obtain the final factorization. Extensive experimental results show that iPALM-DLMF performs better than other state-of-the-art methods. In case studies, of the 50 highest-scoring proteins predicted by iPALM-DLMF to be targeted by the drug gabapentin, 46 have been validated, and of the 50 highest-scoring drugs predicted by iPALM-DLMF to target prostaglandin-endoperoxide synthase 2, 47 have been validated.
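To make the optimization idea in this abstract concrete, the following is a minimal sketch of graph-regularized non-negative matrix factorization solved by alternating proximal gradient steps. It is not the authors' iPALM-DLMF: the inertial (extrapolation) term and the non-negative double SVD initialization are omitted, the sparsity norm is elided in the abstract and is replaced here by a plain L1 penalty, and all names (`laplacian`, `palm_sketch`, `lam`, `mu`) are hypothetical.

```python
import numpy as np

def laplacian(S):
    """Graph Laplacian D - S of a (symmetrized) similarity matrix."""
    S = (S + S.T) / 2.0
    return np.diag(S.sum(axis=1)) - S

def objective(Y, U, V, Ld, Lt, lam, mu):
    """Fit term + graph dual regularization + L1 stand-in sparsity term."""
    fit = 0.5 * np.linalg.norm(Y - U @ V.T, "fro") ** 2
    graph = 0.5 * lam * (np.trace(U.T @ Ld @ U) + np.trace(V.T @ Lt @ V))
    sparse = mu * (U.sum() + V.sum())  # equals the L1 norm because U, V >= 0
    return fit + graph + sparse

def prox_step(X, grad, lipschitz, mu):
    """Gradient step of size 1/L, then the prox of mu*L1 plus non-negativity."""
    step = 1.0 / lipschitz
    return np.maximum(X - step * (grad + mu), 0.0)

def palm_sketch(Y, Sd, St, rank=2, lam=0.1, mu=0.01, iters=200, seed=0):
    """Alternate proximal linearized updates of U (drugs) and V (targets)."""
    rng = np.random.default_rng(seed)
    Ld, Lt = laplacian(Sd), laplacian(St)
    U = rng.random((Y.shape[0], rank))  # random init (assumption, not NNDSVD)
    V = rng.random((Y.shape[1], rank))
    history = [objective(Y, U, V, Ld, Lt, lam, mu)]
    for _ in range(iters):
        # Block Lipschitz constant of the smooth part with V held fixed.
        Lu = np.linalg.norm(V.T @ V, 2) + lam * np.linalg.norm(Ld, 2) + 1e-12
        U = prox_step(U, (U @ V.T - Y) @ V + lam * (Ld @ U), Lu, mu)
        # Same for V with the freshly updated U held fixed.
        Lv = np.linalg.norm(U.T @ U, 2) + lam * np.linalg.norm(Lt, 2) + 1e-12
        V = prox_step(V, (U @ V.T - Y).T @ U + lam * (Lt @ V), Lv, mu)
        history.append(objective(Y, U, V, Ld, Lt, lam, mu))
    return U, V, history
```

Here `Y` is a binary drug-by-target interaction matrix and `Sd`, `St` are non-negative drug and target similarity matrices; the product `U @ V.T` gives the predicted interaction scores. With step size 1/L per block, each update cannot increase the objective, which is the property the convergence analysis of such schemes builds on.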
Affiliation(s)
- Junjun Zhang
- Key Laboratory of Computing and Stochastic Mathematics (LCSM) (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, 410081 China
- Minzhu Xie
- Key Laboratory of Computing and Stochastic Mathematics (LCSM) (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, 410081 China
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081 China
|
23
|
Qian Y, Li X, Wu J, Zhang Q. MCL-DTI: using drug multimodal information and bi-directional cross-attention learning method for predicting drug-target interaction. BMC Bioinformatics 2023; 24:323. [PMID: 37633938 PMCID: PMC10463755 DOI: 10.1186/s12859-023-05447-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/02/2023] [Accepted: 08/15/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND Prediction of drug-target interaction (DTI) is an essential step in drug discovery and drug repositioning. Traditional methods are mostly time-consuming and labor-intensive; deep learning-based methods address these limitations and are applied in engineering practice. Most current deep learning methods employ representation learning of unimodal information such as SMILES sequences, molecular graphs, or molecular images of drugs. In addition, most methods focus on feature extraction from the drug and the target alone, without fusion learning across the interacting drug-target pair, which may lead to insufficient feature representation. MOTIVATION To capture more comprehensive drug features, we utilize both the molecular image and the chemical features of drugs. The image of the drug mainly carries its structural information and spatial features, while the chemical information includes its functions and properties; the two complement each other, making the drug representation more effective and complete. Meanwhile, to enhance the interactive feature learning of drug and target, we introduce a bidirectional multi-head attention mechanism to improve DTI prediction performance. RESULTS To enhance feature learning between drugs and targets, we propose a novel deep learning-based model for the DTI task, called MCL-DTI, which uses multimodal drug information and learns the representation of the drug-target interaction for drug-target prediction. To explore a more comprehensive representation of drug features, this paper first exploits two modalities of drug information, the molecular image and the chemical text, to represent the drug. We also introduce a bidirectional multi-head cross attention (MCA) method to learn the interrelationships between drugs and targets. Thus, we build two decoders, each of which includes a multi-head self attention (MSA) block and an MCA block, for cross-information learning. We use one decoder for the drug and one for the target to obtain the interaction feature maps. Finally, we feed the feature maps generated by the decoders into a fusion block for feature extraction and output the prediction results. CONCLUSIONS MCL-DTI achieves the best results on all three datasets, Human, C. elegans and Davis, which include balanced datasets and an unbalanced dataset. The results on the drug-drug interaction (DDI) task show that MCL-DTI has a strong generalization capability and can be easily applied to other tasks.
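The bidirectional cross attention described above can be illustrated with a small sketch. This is a single-head version of scaled dot-product attention without learned projection matrices, so it is a simplification rather than the MCL-DTI decoder; the function names and the token-matrix shapes are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Each query token attends over all context tokens (single head).

    queries: (n_q, d) array, context: (n_c, d) array.
    Returns the attended features (n_q, d) and the weights (n_q, n_c).
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights

def bidirectional_cross_attention(drug_tokens, target_tokens):
    """Drug tokens attend to target tokens, and target tokens to drug tokens."""
    drug_ctx, w_drug_to_target = cross_attention(drug_tokens, target_tokens)
    target_ctx, w_target_to_drug = cross_attention(target_tokens, drug_tokens)
    return drug_ctx, target_ctx, w_drug_to_target, w_target_to_drug
```

In a full model, the two attended feature maps would be passed to a fusion block; a multi-head variant would split the feature dimension into several heads and apply learned query, key, and value projections in each.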
Affiliation(s)
- Ying Qian
- Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
- Xinyi Li
- Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
- Jian Wu
- Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
- Qian Zhang
- Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
|
24
|
Rafiei F, Zeraati H, Abbasi K, Ghasemi JB, Parsaeian M, Masoudi-Nejad A. DeepTraSynergy: drug combinations using multimodal deep learning with transformers. Bioinformatics 2023; 39:btad438. [PMID: 37467066 PMCID: PMC10397534 DOI: 10.1093/bioinformatics/btad438] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/25/2023] [Revised: 06/27/2023] [Accepted: 07/17/2023] [Indexed: 07/21/2023] Open
Abstract
MOTIVATION Screening of bioactive compounds in cancer cell lines is receiving increasing attention. Multidisciplinary drugs or drug combinations play a more effective role in treatment and selectively inhibit the growth of cancer cells. RESULTS We therefore propose a new deep learning-based approach for drug combination synergy prediction, called DeepTraSynergy. Our approach uses multimodal input, including drug-target interaction, protein-protein interaction, and cell-target interaction, to predict drug combination synergy. To learn the feature representation of drugs, we use transformers. Our approach is a multitask approach that predicts three outputs: the drug-target interaction, the toxic effect, and the drug combination synergy. Drug combination synergy is the main task, and the other two are auxiliary tasks that help the approach learn a better model. Accordingly, three loss functions are defined: synergy loss, toxic loss, and drug-protein interaction loss, the last two serving as auxiliary losses. DeepTraSynergy outperforms the classic and state-of-the-art models in predicting synergistic drug combinations on the two latest drug combination datasets. The DeepTraSynergy algorithm achieves accuracy values of 0.7715 and 0.8052 (an improvement over other approaches) on the DrugCombDB and Oncology-Screen datasets, respectively. We also evaluate the contribution of each component of DeepTraSynergy to show its effectiveness in the proposed method. Introducing the relations between proteins (PPI networks) and the drug-protein interaction significantly improves the prediction of synergistic drug combinations. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/fatemeh-rafiei/DeepTraSynergy.
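The multitask setup with one main loss and two auxiliary losses can be sketched as a weighted sum. The abstract does not state the form of each loss or how they are combined, so this sketch assumes binary cross-entropy for all three tasks and hypothetical weights `w_toxic` and `w_dti`; it is an illustration, not the released DeepTraSynergy code.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def multitask_loss(synergy, toxic, dti, w_toxic=0.5, w_dti=0.5):
    """Main synergy loss plus weighted auxiliary losses.

    Each argument is a (y_true, p_pred) pair of NumPy arrays.
    The weights are hypothetical hyperparameters, not values from the paper.
    """
    main = binary_cross_entropy(*synergy)
    aux_toxic = binary_cross_entropy(*toxic)
    aux_dti = binary_cross_entropy(*dti)
    return main + w_toxic * aux_toxic + w_dti * aux_dti
```

Setting both weights to zero recovers single-task training on synergy alone, which gives a natural baseline for checking whether the auxiliary tasks help.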
Affiliation(s)
- Fatemeh Rafiei
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran 1417613151, Iran
- Hojjat Zeraati
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran 1417613151, Iran
- Karim Abbasi
- Laboratory of System Biology, Bioinformatics & Artificial Intelligent in Medicine (LBB&AI), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran 1571914911, Iran
- Jahan B Ghasemi
- Chemistry Department, Faculty of Chemistry, School of Sciences, University of Tehran, Tehran 1417614411, Iran
- Mahboubeh Parsaeian
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran 1417613151, Iran
- Department of Epidemiology & Biostatistics, School of Public Health, Imperial College London, London W21PG, United Kingdom
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
|