1. Chen H, Lu D, Xiao Z, Li S, Zhang W, Luan X, Zhang W, Zheng G. Comprehensive applications of the artificial intelligence technology in new drug research and development. Health Inf Sci Syst 2024; 12:41. [PMID: 39130617 PMCID: PMC11310389 DOI: 10.1007/s13755-024-00300-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 07/27/2024] [Indexed: 08/13/2024] Open
Abstract
Purpose: Target-based strategy is a prevalent means of drug research and development (R&D), since targets provide effector molecules of drug action and offer the foundation of pharmacological investigation. Recently, artificial intelligence (AI) technology has been utilized in various stages of drug R&D, where AI-assisted experimental methods show higher efficiency than purely experimental ones. A comprehensive review of AI applications in drug R&D is critically needed for the biopharmaceutical field. Methods: Relevant literature on AI-assisted drug R&D was collected from public databases (including Google Scholar, Web of Science, PubMed, IEEE Xplore Digital Library, Springer, and ScienceDirect) through a keyword search strategy with the following terms: [("Artificial Intelligence" OR "Knowledge Graph" OR "Machine Learning") AND ("Drug Target Identification" OR "New Drug Development")]. Results: In this review, we first introduce common strategies and novel trends of drug R&D, followed by a description of the characteristics of AI algorithms widely used in drug R&D. Subsequently, we describe detailed applications of AI algorithms in target identification, lead compound identification and optimization, drug repurposing, and drug analytical platform construction. Finally, we discuss the challenges and prospects of AI-assisted methods for drug discovery. Conclusion: Collectively, this review provides a comprehensive overview of AI applications in drug R&D and presents future perspectives for the biopharmaceutical field, which may promote the development of the drug industry.
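The keyword search strategy quoted in this abstract is a Boolean query: terms inside a group are joined with OR, and the groups are joined with AND. A minimal sketch of assembling such a query string follows; the helper name `build_query` is hypothetical, and each database's own search syntax may differ from this plain form.

```python
def build_query(*term_groups):
    """Join terms within a group with OR, and the groups themselves with AND."""
    clauses = []
    for group in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Term groups exactly as listed in the abstract.
ai_terms = ["Artificial Intelligence", "Knowledge Graph", "Machine Learning"]
drug_terms = ["Drug Target Identification", "New Drug Development"]

query = build_query(ai_terms, drug_terms)
print(query)
```

Keeping the term groups as lists makes it easy to add a synonym to one group without touching the other.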
Affiliation(s)
- Hongyu Chen
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Dong Lu
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Ziyi Xiao
  - Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
- Shensuo Li
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Wen Zhang
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Xin Luan
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Weidong Zhang
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Guangyong Zheng
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
2. Xia X, Zhu C, Zhong F, Liu L. TransCDR: a deep learning model for enhancing the generalizability of drug activity prediction through transfer learning and multimodal data fusion. BMC Biol 2024; 22:227. [PMID: 39385185 PMCID: PMC11462810 DOI: 10.1186/s12915-024-02023-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 09/30/2024] [Indexed: 10/11/2024] Open
Abstract
BACKGROUND Accurate and robust drug response prediction is of utmost importance in precision medicine. Although many models have been developed to utilize the representations of drugs and cancer cell lines for predicting cancer drug responses (CDR), their performances can be improved by addressing issues such as insufficient data modality, suboptimal fusion algorithms, and poor generalizability for novel drugs or cell lines. RESULTS We introduce TransCDR, which uses transfer learning to learn drug representations and fuses multi-modality features of drugs and cell lines by a self-attention mechanism, to predict the IC50 values or sensitive states of drugs on cell lines. We are the first to systematically evaluate the generalization of the CDR prediction model to novel (i.e., never-before-seen) compound scaffolds and cell line clusters. TransCDR shows better generalizability than 8 state-of-the-art models. TransCDR outperforms its 5 variants that train drug encoders (i.e., RNN and AttentiveFP) from scratch under various scenarios. The most critical contributors among multiple drug notations and omics profiles are Extended Connectivity Fingerprint and genetic mutation. Additionally, the attention-based fusion module further enhances the predictive performance of TransCDR. TransCDR, trained on the GDSC dataset, demonstrates strong predictive performance on the external testing set CCLE. It is also utilized to predict missing CDRs on GDSC. Moreover, we investigate the biological mechanisms underlying drug response by classifying 7675 patients from TCGA into drug-sensitive or drug-resistant groups, followed by a Gene Set Enrichment Analysis. CONCLUSIONS TransCDR emerges as a potent tool with significant potential in drug response prediction.
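Evaluating generalization to never-before-seen compound scaffolds requires a split in which no scaffold appears in both training and test data. The following is a minimal sketch of such a grouped split, not TransCDR's actual evaluation code; the function `scaffold_split`, the drug names, and the scaffold labels are all hypothetical, and the scaffold of each drug is assumed to be precomputed.

```python
import random


def scaffold_split(records, test_fraction=0.25, seed=0):
    """Split (drug, scaffold) pairs so that no scaffold is shared by train and test."""
    by_scaffold = {}
    for drug, scaffold in records:
        by_scaffold.setdefault(scaffold, []).append(drug)

    # Shuffle whole scaffolds, not individual drugs, so groups stay intact.
    scaffolds = sorted(by_scaffold)
    random.Random(seed).shuffle(scaffolds)

    n_test = max(1, round(len(scaffolds) * test_fraction))
    held_out = set(scaffolds[:n_test])
    train = [d for s in scaffolds[n_test:] for d in by_scaffold[s]]
    test = [d for s in scaffolds[:n_test] for d in by_scaffold[s]]
    return train, test, held_out


# Hypothetical drugs with precomputed scaffold labels (illustrative names only).
records = [
    ("drug_a", "scaffold_1"), ("drug_b", "scaffold_1"),
    ("drug_c", "scaffold_2"), ("drug_d", "scaffold_3"),
    ("drug_e", "scaffold_3"), ("drug_f", "scaffold_4"),
]
train, test, held_out = scaffold_split(records)
scaffold_of = dict(records)
print("train:", train)
print("test:", test, "held-out scaffolds:", held_out)
```

The same grouping idea would apply to cell line clusters, with cluster labels in place of scaffold labels.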
Affiliation(s)
- Xiaoqiong Xia
  - Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Chaoyu Zhu
  - Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Fan Zhong
  - Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China.
- Lei Liu
  - Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China.
  - Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai, 200120, China.
3. Xie J, Song Y, Zheng H, Luo S, Chen Y, Zhang C, Yu R, Tong M. PathMethy: an interpretable AI framework for cancer origin tracing based on DNA methylation. Brief Bioinform 2024; 25:bbae497. [PMID: 39391931 PMCID: PMC11467402 DOI: 10.1093/bib/bbae497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 09/09/2024] [Accepted: 10/02/2024] [Indexed: 10/12/2024] Open
Abstract
Despite advanced diagnostics, 3%-5% of cases remain classified as cancer of unknown primary (CUP). DNA methylation, an important epigenetic feature, is essential for determining the origin of metastatic tumors. We presented PathMethy, a novel Transformer model integrated with functional categories and crosstalk of pathways, to accurately trace the origin of tumors in CUP samples based on DNA methylation. PathMethy outperformed seven competing methods in F1-score across nine cancer datasets and predicted accurately the molecular subtypes within nine primary tumor types. It not only excelled at tracing the origins of both primary and metastatic tumors but also demonstrated a high degree of agreement with previously diagnosed sites in cases of CUP. PathMethy provided biological insights by highlighting key pathways, functional categories, and their interactions. Using functional categories of pathways, we gained a global understanding of biological processes. For broader access, a user-friendly web server for researchers and clinicians is available at https://cup.pathmethy.com.
Affiliation(s)
- Jiajing Xie
  - National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361102, China
- Yuhang Song
  - School of Informatics, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361005, China
- Hailong Zheng
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No. 1023, South Shatai Road, Baiyun District, Guangzhou, Guangdong, 510515, China
- Shijie Luo
  - National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361102, China
- Ying Chen
  - School of Informatics, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361005, China
- Chen Zhang
  - National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361102, China
- Rongshan Yu
  - National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361102, China
  - School of Informatics, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361005, China
- Mengsha Tong
  - National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361102, China
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221-121 South Xiang'an Road, Xiamen, Fujian 361102, China
4. Pak M, Bang D, Sung I, Kim S, Lee S. DGDRP: drug-specific gene selection for drug response prediction via re-ranking through propagating and learning biological network. Front Genet 2024; 15:1441558. [PMID: 39371421 PMCID: PMC11450864 DOI: 10.3389/fgene.2024.1441558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 09/03/2024] [Indexed: 10/08/2024] Open
Abstract
Introduction: Drug response prediction, especially in terms of cell viability prediction, is a well-studied research problem with significant implications for personalized medicine. It enables the identification of the most effective drugs based on individual genetic profiles, aids in selecting potential drug candidates, and helps identify biomarkers that predict drug efficacy and toxicity. A deeper investigation of drug response prediction reveals that drugs exert their effects by targeting specific proteins, which in turn perturb related genes in cascading ways. This perturbation affects cellular pathways and regulatory networks, ultimately influencing the cellular response to the drug. Identifying which genes are perturbed and how they interact can provide critical insights into the mechanisms of drug action. Hence, the problem of predicting drug response can be framed as a dual problem involving both the prediction of drug efficacy and the selection of drug-specific genes. Identifying these drug-specific genes (biomarkers) is crucial because they serve as indicators of how the drug will affect the biological system, thereby facilitating both drug response prediction and biomarker discovery. Methods: In this study, we propose DGDRP (Drug-specific Gene selection for Drug Response Prediction), a graph neural network (GNN)-based model that uses a novel rank-and-re-rank process for drug-specific gene selection. DGDRP first ranks genes using a pathway knowledge-enhanced network propagation algorithm based on drug target information, ensuring biological relevance. It then re-ranks genes based on the similarity between gene and drug target embeddings learned from the GNN, incorporating semantic relationships. Thus, our model adaptively learns to select drug mechanism-associated genes that contribute to drug response prediction. This integrated approach not only improves drug response predictions compared to other gene selection methods but also allows for effective biomarker discovery. Discussion: As a result, our approach demonstrates improved drug response predictions compared to other gene selection methods and demonstrates comparability with state-of-the-art deep learning models. Case studies further support our method by showing alignment of selected gene sets with the mechanisms of action of input drugs. Conclusion: Overall, DGDRP represents a deep learning-based re-ranking strategy, offering a robust gene selection framework for more accurate drug response prediction. The source code for DGDRP can be found at: https://github.com/minwoopak/heteronet.
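The rank-and-re-rank idea described in this abstract can be illustrated with a generic stand-in: a random walk with restart spreads score from drug target genes over a gene network (rank), and the shortlisted genes are then reordered by cosine similarity to a target embedding (re-rank). This is a sketch under stated assumptions, not DGDRP's implementation: the toy network, the gene names, the two-dimensional embeddings, and the functions `propagate`, `cosine`, and `rank_then_rerank` are all hypothetical, and the pathway knowledge and GNN-learned embeddings of the actual method are replaced by fixed values.

```python
import math


def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart: spread score from seed (target) genes over the network."""
    genes = sorted(adjacency)
    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in genes}
    score = dict(p0)
    for _ in range(iterations):
        nxt = {g: restart * p0[g] for g in genes}
        for g in genes:
            neighbours = adjacency[g]
            if not neighbours:
                continue
            share = (1.0 - restart) * score[g] / len(neighbours)
            for n in neighbours:
                nxt[n] += share
        score = nxt
    return score


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_then_rerank(adjacency, seeds, embeddings, target_embedding, top_k):
    """Shortlist genes by propagation score, then reorder them by embedding similarity."""
    scores = propagate(adjacency, seeds)
    shortlist = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return sorted(shortlist,
                  key=lambda g: cosine(embeddings[g], target_embedding),
                  reverse=True)


# Toy undirected gene network and embeddings (illustrative values only).
adjacency = {
    "TARGET": ["G1", "G2"],
    "G1": ["TARGET", "G3"],
    "G2": ["TARGET"],
    "G3": ["G1", "G4"],
    "G4": ["G3"],
}
embeddings = {
    "TARGET": [1.0, 0.0], "G1": [0.2, 1.0], "G2": [0.9, 0.1],
    "G3": [0.7, 0.7], "G4": [1.0, 0.0],
}
scores = propagate(adjacency, {"TARGET"})
selected = rank_then_rerank(adjacency, {"TARGET"}, embeddings,
                            embeddings["TARGET"], top_k=4)
print(selected)
```

In this toy case G4 has an embedding identical to the target's but is never selected, because it falls outside the propagation shortlist; the first stage constrains what the second stage can pick.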
Affiliation(s)
- Minwoo Pak
  - Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Aigendrug Co., Ltd., Seoul, Republic of Korea
- Inyoung Sung
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea
- Sunho Lee
  - Aigendrug Co., Ltd., Seoul, Republic of Korea
5. Kamble P, Nagar PR, Bhakhar KA, Garg P, Sobhia ME, Naidu S, Bharatam PV. Cancer pharmacoinformatics: Databases and analytical tools. Funct Integr Genomics 2024; 24:166. [PMID: 39294509 DOI: 10.1007/s10142-024-01445-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Revised: 08/26/2024] [Accepted: 09/03/2024] [Indexed: 09/20/2024]
Abstract
Cancer is a subject of extensive investigation, and the utilization of omics technology has resulted in the generation of substantial volumes of big data in cancer research. Numerous databases are being developed to manage and organize this data effectively. These databases encompass various domains such as genomics, transcriptomics, proteomics, metabolomics, immunology, and drug discovery. The application of computational tools into various core components of pharmaceutical sciences constitutes "Pharmacoinformatics", an emerging paradigm in rational drug discovery. The three major features of pharmacoinformatics include (i) Structure modelling of putative drugs and targets, (ii) Compilation of databases and analysis using statistical approaches, and (iii) Employing artificial intelligence/machine learning algorithms for the discovery of novel therapeutic molecules. The development, updating, and analysis of databases using statistical approaches play a pivotal role in pharmacoinformatics. Multiple software tools are associated with oncoinformatics research. This review catalogs the databases and computational tools related to cancer drug discovery and highlights their potential implications in the pharmacoinformatics of cancer.
Affiliation(s)
- Pradnya Kamble
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Prinsa R Nagar
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Kaushikkumar A Bhakhar
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Prabha Garg
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- M Elizabeth Sobhia
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Srivatsava Naidu
  - Center of Biomedical Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
- Prasad V Bharatam
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India.
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India.
6. Harris J, Yadalam PK, Anegundi RV, Arumuganainar D. Comparing Graph Sample and Aggregation (SAGE) and Graph Attention Networks in the Prediction of Drug-Gene Associations of Extended-Spectrum Beta-Lactamases in Periodontal Infections and Resistance. Cureus 2024; 16:e68082. [PMID: 39347119 PMCID: PMC11437384 DOI: 10.7759/cureus.68082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 08/29/2024] [Indexed: 10/01/2024] Open
Abstract
INTRODUCTION Gram-negative bacteria exhibit more antibiotic resistance than gram-positive bacteria due to their cell wall structure and composition differences. Porins, or protein channels in these bacteria, can allow small, hydrophilic antibiotics to diffuse, affecting their susceptibility. Mutations in porin protein genes can also impair antibiotic entry. Predicting drug-gene associations of extended-spectrum beta-lactamases (ESBLs) is crucial as they confer resistance to beta-lactam antibiotics, challenging the treatment of infections. This aids clinicians in selecting suitable treatments, optimizing drug usage, enhancing patient outcomes, and controlling antibiotic resistance in healthcare settings. Graph-based neural networks can predict drug-gene associations in periodontal infections and resistance. The aim of the study was to predict drug-gene associations of ESBLs in periodontal infections and resistance. METHODS The study focuses on analyzing drug-gene associations using probes and drugs. The data was converted into graph language, assigning nodes and edges for drugs and genes. Graph neural networks (GNNs) and similar algorithms were implemented using Google Colab and Python. Cytoscape and CytoHubba are open-source software platforms used for network analysis and visualization. GNNs were used for tasks like node classification, link prediction, and graph-level prediction. Three graph-based models were used: graph convolutional network (GCN), Graph SAGE, and graph attention network (GAT). Each model was trained for 200 epochs using the Adam optimizer with a learning rate of 0.01 and a weight decay of 5e-4. RESULTS The drug-gene association network has 57 nodes, 79 edges, and a 2.730 characteristic path length. Its structure, organization, and connectivity are analyzed using the GCN and Graph SAGE, which show high accuracy, precision, recall, and an F1-score of 0.94. 
GAT's performance metrics are lower, with an accuracy of 0.68, precision of 0.47, recall of 0.68, and F1-score of 0.56, suggesting that it may not be as effective in capturing drug-gene relationships. CONCLUSION Compared to ESBLs, both GCN and Graph SAGE demonstrate excellent performance with accuracy, precision, recall, and an F1-score of 0.94. These results indicate that GCN and Graph SAGE are highly effective in predicting drug-gene associations related to ESBLs. GCN and Graph SAGE outperform GAT in predicting drug-gene associations for ESBLs. Improvements include data augmentation, regularization, and cross-validation. Ethical considerations, fairness, and open-source implementations are crucial for future research in precision periodontal treatment.
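The four metrics reported in this abstract (accuracy, precision, recall, F1-score) can all be derived from counts of true and false positives and negatives. A minimal sketch for the binary case follows; the labels are toy values and not the study's data, the helper `binary_metrics` is hypothetical, and the abstract does not state which averaging scheme the authors used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = association present, 0 = absent (illustrative only).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, it sits closer to the lower of the two when they diverge.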
Affiliation(s)
- Johnisha Harris
  - Periodontics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, IND
- Pradeep Kumar Yadalam
  - Periodontics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, IND
- Raghavendra Vamsi Anegundi
  - Periodontics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, IND
- Deepavalli Arumuganainar
  - Periodontics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, IND
7. Bang D, Koo B, Kim S. Transfer learning of condition-specific perturbation in gene interactions improves drug response prediction. Bioinformatics 2024; 40:i130-i139. [PMID: 38940127 PMCID: PMC11256952 DOI: 10.1093/bioinformatics/btae249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
SUMMARY Drug response is conventionally measured at the cell level, often quantified by metrics like IC50. However, to gain a deeper understanding of drug response, cellular outcomes need to be understood in terms of pathway perturbation. This perspective leads us to recognize a challenge posed by the gap between two widely used large-scale databases, LINCS L1000 and GDSC, which measure drug response at different levels: L1000 captures information at the gene expression level, while GDSC operates at the cell line level. Our study aims to bridge this gap by integrating the two databases through transfer learning, focusing on condition-specific perturbations in gene interactions from L1000 to interpret drug response integrating both gene and cell levels in GDSC. This transfer learning strategy involves pretraining on the transcriptomic-level L1000 dataset, with parameter-frozen fine-tuning to cell line-level drug response. Our novel condition-specific gene-gene attention (CSG2A) mechanism dynamically learns gene interactions specific to input conditions, guided by both data and biological network priors. The CSG2A network, equipped with the transfer learning strategy, achieves state-of-the-art performance in cell line-level drug response prediction. In two case studies, well-known mechanisms of drugs are well represented in both the learned gene-gene attention and the predicted transcriptomic profiles. This alignment supports the modeling power in terms of interpretability and biological relevance. Furthermore, our model's unique capacity to capture drug response in terms of both pathway perturbation and cell viability extends predictions to the patient level using TCGA data, demonstrating its expressive power obtained from both gene and cell levels. AVAILABILITY AND IMPLEMENTATION The source code for the CSG2A network is available at https://github.com/eugenebang/CSG2A.
Affiliation(s)
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08758, Republic of Korea
- Bonil Koo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08758, Republic of Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08758, Republic of Korea
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Republic of Korea
8. Xie J, Chen Y, Luo S, Yang W, Lin Y, Wang L, Ding X, Tong M, Yu R. Tracing unknown tumor origins with a biological-pathway-based transformer model. Cell Rep Methods 2024; 4:100797. [PMID: 38889685 PMCID: PMC11228371 DOI: 10.1016/j.crmeth.2024.100797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/01/2024] [Accepted: 05/21/2024] [Indexed: 06/20/2024]
Abstract
Cancer of unknown primary (CUP) represents metastatic cancer where the primary site remains unidentified despite standard diagnostic procedures. To determine the tumor origin in such cases, we developed BPformer, a deep learning method integrating the transformer model with prior knowledge of biological pathways. Trained on transcriptomes from 10,410 primary tumors across 32 cancer types, BPformer achieved remarkable accuracy rates of 94%, 92%, and 89% in primary tumors and primary and metastatic sites of metastatic tumors, respectively, surpassing existing methods. Additionally, BPformer was validated in a retrospective study, demonstrating consistency with tumor sites diagnosed through immunohistochemistry and histopathology. Furthermore, BPformer was able to rank pathways based on their contribution to tumor origin identification, which helped to classify oncogenic signaling pathways into those that are highly conservative among different cancers versus those that are highly variable depending on their origins.
Affiliation(s)
- Jiajing Xie
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Ying Chen
  - School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
- Shijie Luo
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Wenxian Yang
  - Aginome Scientific, Xiamen, Fujian 361005, China
- Yuxiang Lin
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Liansheng Wang
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China; School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
- Xin Ding
  - Department of Pathology, Zhongshan Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, Fujian 361004, China.
- Mengsha Tong
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China; State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China.
- Rongshan Yu
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China; School of Informatics, Xiamen University, Xiamen, Fujian 361005, China; Aginome Scientific, Xiamen, Fujian 361005, China.
9. Campana PA, Prasse P, Lienhard M, Thedinga K, Herwig R, Scheffer T. Cancer drug sensitivity estimation using modular deep Graph Neural Networks. NAR Genom Bioinform 2024; 6:lqae043. [PMID: 38680251 PMCID: PMC11055499 DOI: 10.1093/nargab/lqae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/01/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024] Open
Abstract
Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drugs components that are tailored to the transcriptomic profile of a given primary tumor. The SMILES representation of molecules that is used by state-of-the-art drug-sensitivity models is not conducive for neural networks to generalize to new drugs, in part because the distance between atoms does not generally correspond to the distance between their representation in the SMILES strings. Graph-attention networks, on the other hand, are high-capacity models that require large training-data volumes which are not available for drug-sensitivity estimation. We develop a modular drug-sensitivity graph-attentional neural network. The modular architecture allows us to separately pre-train the graph encoder and graph-attentional pooling layer on related tasks for which more data are available. We observe that this model outperforms reference models for the use cases of precision oncology and drug discovery; in particular, it is better able to predict the specific interaction between drug and cell line that is not explained by the general cytotoxicity of the drug and the overall survivability of the cell line. The complete source code is available at https://zenodo.org/doi/10.5281/zenodo.8020945. All experiments are based on the publicly available GDSC data.
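The claim that distance in a SMILES string does not generally match distance between atoms can be checked on a small ring. The sketch below uses cyclohexane, `C1CCCCC1`, where the first and last carbon are bonded through the ring closure yet sit at opposite ends of the string. The parser handles only a toy subset (single-letter atoms and single-digit ring closures) and is an assumption for illustration; real SMILES needs a cheminformatics toolkit.

```python
from collections import deque


def parse_simple_smiles(smiles):
    """Parse a toy SMILES subset: single-letter atoms and single-digit ring closures."""
    positions = []   # string index of each atom
    bonds = []       # pairs of bonded atom indices
    open_rings = {}  # ring-closure digit -> atom index that opened it
    previous = None
    for index, char in enumerate(smiles):
        if char.isalpha():
            atom = len(positions)
            positions.append(index)
            if previous is not None:
                bonds.append((previous, atom))
            previous = atom
        elif char.isdigit():
            if char in open_rings:
                bonds.append((open_rings.pop(char), previous))
            else:
                open_rings[char] = previous
        else:
            raise ValueError(f"unsupported SMILES character: {char!r}")
    return positions, bonds


def graph_distance(n_atoms, bonds, start, goal):
    """Shortest path length in bonds between two atoms (breadth-first search)."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return seen[atom]
        for nxt in neighbours[atom]:
            if nxt not in seen:
                seen[nxt] = seen[atom] + 1
                queue.append(nxt)
    return None


positions, bonds = parse_simple_smiles("C1CCCCC1")  # cyclohexane
first, last = 0, len(positions) - 1
string_gap = positions[last] - positions[first]
bond_gap = graph_distance(len(positions), bonds, first, last)
print(f"string distance: {string_gap}, graph distance: {bond_gap}")
```

A graph encoder sees the two carbons as direct neighbours, while a sequence model reading the string must learn that the matching ring digits connect them.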
Affiliation(s)
- Pedro A Campana
  - University of Potsdam, Department of Computer Science, Potsdam, Germany
- Paul Prasse
  - University of Potsdam, Department of Computer Science, Potsdam, Germany
- Matthias Lienhard
  - Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Kristina Thedinga
  - Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Ralf Herwig
  - Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Tobias Scheffer
  - University of Potsdam, Department of Computer Science, Potsdam, Germany
10. Nguyen T, Campbell A, Kumar A, Amponsah E, Fiterau M, Shahriyari L. Optimal fusion of genotype and drug embeddings in predicting cancer drug response. Brief Bioinform 2024; 25:bbae227. [PMID: 38754407 PMCID: PMC11097979 DOI: 10.1093/bib/bbae227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 04/14/2024] [Accepted: 04/25/2024] [Indexed: 05/18/2024] Open
Abstract
Predicting cancer drug response using both genomics and drug features has shown some success compared to using genomics features alone. However, there has been limited research on how best to combine or fuse the two types of features. Using a visible neural network with two deep learning branches for gene and drug features as the base architecture, we experimented with different fusion functions and fusion points. Our experiments show that injecting multiplicative relationships between gene and drug latent features into the original concatenation-based architecture DrugCell significantly improved the overall predictive performance and outperformed other baseline models. We also show that different fusion methods respond differently to different fusion points, indicating that the relationship between drug features and different hierarchical biological levels of gene features is optimally captured using different methods. Considering both predictive performance and runtime speed, tensor product partial is the best-performing fusion function to combine late-stage representations of drug and gene features to predict cancer drug response.
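The contrast this abstract draws between concatenation and multiplicative fusion can be shown on two small latent vectors. The sketch below gives three generic fusion functions; it is not DrugCell's code, the function names and vector values are hypothetical, and the "tensor product partial" variant is not reproduced because the abstract does not define it.

```python
def concat_fusion(gene, drug):
    """Concatenation: keeps both vectors side by side, no interaction terms."""
    return list(gene) + list(drug)


def multiplicative_fusion(gene, drug):
    """Element-wise product: one interaction term per matching dimension."""
    if len(gene) != len(drug):
        raise ValueError("element-wise fusion needs equal-length vectors")
    return [g * d for g, d in zip(gene, drug)]


def tensor_product_fusion(gene, drug):
    """Flattened outer product: one interaction term per (gene, drug) dimension pair."""
    return [g * d for g in gene for d in drug]


# Toy latent features (illustrative values only).
gene_latent = [1.0, 2.0]
drug_latent = [3.0, 4.0]

print(concat_fusion(gene_latent, drug_latent))          # 4 values
print(multiplicative_fusion(gene_latent, drug_latent))  # 2 values
print(tensor_product_fusion(gene_latent, drug_latent))  # 2 * 2 = 4 values
```

The outer product grows with the product of the two dimensions, which is one reason runtime matters when choosing among fusion functions.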
Affiliation(s)
- Trang Nguyen
  - Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Anthony Campbell
  - Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Ankit Kumar
  - Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Edwin Amponsah
  - Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Madalina Fiterau
  - Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Leili Shahriyari
  - Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
Collapse
|
11
|
Narykov O, Zhu Y, Brettin T, Evrard YA, Partin A, Shukla M, Xia F, Clyde A, Vasanthakumari P, Doroshow JH, Stevens RL. Integration of Computational Docking into Anti-Cancer Drug Response Prediction Models. Cancers (Basel) 2023; 16:50. [PMID: 38201477 PMCID: PMC10777918 DOI: 10.3390/cancers16010050] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/03/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/12/2024]
Abstract
Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.
Affiliation(s)
- Oleksandr Narykov
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yvonne A. Evrard
  - Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Alexander Partin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Austin Clyde
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
- Priyanka Vasanthakumari
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- James H. Doroshow
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Rick L. Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
|
12
|
Gogoshin G, Rodin AS. Graph Neural Networks in Cancer and Oncology Research: Emerging and Future Trends. Cancers (Basel) 2023; 15:5858. [PMID: 38136405 PMCID: PMC10742144 DOI: 10.3390/cancers15245858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/09/2023] [Accepted: 12/14/2023] [Indexed: 12/24/2023]
Abstract
Next-generation cancer and oncology research needs to take full advantage of the multimodal structured, or graph, information, with the graph data types ranging from molecular structures to spatially resolved imaging and digital pathology, biological networks, and knowledge graphs. Graph Neural Networks (GNNs) efficiently combine the graph structure representations with the high predictive performance of deep learning, especially on large multimodal datasets. In this review article, we survey the landscape of recent (2020-present) GNN applications in the context of cancer and oncology research, and delineate six currently predominant research areas. We then identify the most promising directions for future research. We compare GNNs with graphical models and "non-structured" deep learning, and devise guidelines for cancer and oncology researchers or physician-scientists, asking the question of whether they should adopt the GNN methodology in their research pipelines.
Affiliation(s)
- Grigoriy Gogoshin
  - Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Andrei S. Rodin
  - Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
|
13
|
Wang W, Yuan G, Wan S, Zheng Z, Liu D, Zhang H, Li J, Zhou Y, Wang X. A granularity-level information fusion strategy on hypergraph transformer for predicting synergistic effects of anticancer drugs. Brief Bioinform 2023; 25:bbad522. [PMID: 38243692 PMCID: PMC10796255 DOI: 10.1093/bib/bbad522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 11/08/2023] [Accepted: 12/19/2023] [Indexed: 01/21/2024]
Abstract
Combination therapy has exhibited substantial potential compared to monotherapy. However, due to the explosive growth in the number of cancer drugs, the screening of synergistic drug combinations has become both expensive and time-consuming. Synergistic drug combinations refer to the concurrent use of two or more drugs to enhance treatment efficacy. Currently, numerous computational methods have been developed to predict the synergistic effects of anticancer drugs. However, there has been insufficient exploration of how to mine drug and cell line data at different granularity levels for predicting synergistic anticancer drug combinations. Therefore, this study proposes a granularity-level information fusion strategy based on the hypergraph transformer, named HypertranSynergy, to predict synergistic effects of anticancer drugs. HypertranSynergy introduces synergistic connections between cancer cell lines and drug combinations using a hypergraph. Then, the Coarse-grained Information Extraction (CIE) module merges the hypergraph with a transformer for node embeddings. In the CIE module, Contranorm is a normalization layer that mitigates over-smoothing, while Gaussian noise addresses local information gaps. Additionally, the Fine-grained Information Extraction (FIE) module assesses fine-grained information's impact on predictions by employing similarity-aware matrices from drug/cell line features. Both CIE and FIE modules are integrated into HypertranSynergy. In addition, HypertranSynergy achieved an AUC of 0.93 ± 0.01 and an AUPR of 0.69 ± 0.02 in 5-fold cross-validation of the classification task, and an RMSE of 13.77 ± 0.07 and a PCC of 0.81 ± 0.02 in 5-fold cross-validation of the regression task. These results are better than most of the state-of-the-art models.
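To make the hypergraph idea concrete, here is a minimal sketch that treats each synergistic (drug, drug, cell line) triplet as one hyperedge and builds the node-by-hyperedge incidence matrix. The triplet encoding, the function name `build_incidence`, and the toy identifiers are assumptions for illustration; the abstract does not specify how HypertranSynergy constructs its hypergraph.

```python
from typing import List, Sequence, Tuple


def build_incidence(hyperedges: Sequence[Tuple[str, ...]]) -> Tuple[List[str], List[List[int]]]:
    """Return (sorted node names, incidence matrix H) with H[i][e] = 1
    when node i belongs to hyperedge e, else 0."""
    nodes = sorted({node for edge in hyperedges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    incidence = [[0] * len(hyperedges) for _ in nodes]
    for e, edge in enumerate(hyperedges):
        for node in edge:
            incidence[index[node]][e] = 1
    return nodes, incidence


# Hypothetical synergistic triplets: two drugs plus the cell line they act on.
triplets = [
    ("drugA", "drugB", "cell1"),
    ("drugA", "drugC", "cell1"),
]
nodes, H = build_incidence(triplets)
for name, row in zip(nodes, H):
    print(name, row)
```

Unlike an ordinary graph edge, a single hyperedge here joins three nodes at once, so the combination and its cell-line context stay tied together in one column of the matrix.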
Affiliation(s)
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province 453007, China
- Gaolin Yuan
  - College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China
- Shitong Wan
  - College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China
- Ziwei Zheng
  - College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China
- Dong Liu
  - College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province 453007, China
- Hongjun Zhang
  - Hebi Institute of Engineering and Technology, Henan Polytechnic University, 458030, China
- Juntao Li
  - School of Mathematics and Information Science, Henan Normal University, 453007 Xinxiang, China
- Yun Zhou
  - College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province 453007, China
- Xianfang Wang
  - College of Computer Science and Technology Engineering, Henan Institute of Technology, 453000, China
|
14
|
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024]
Abstract
Recently, attention mechanisms and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
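For readers unfamiliar with the principle this review builds on, the following is a minimal sketch of standard scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, written with plain Python lists. It is the generic textbook form, not any specific model from the review, and the toy inputs are assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights); each output row is a weighted mix of rows of V."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return output, weights


# Toy example: one query attending over two identical keys.
out, attn = scaled_dot_product_attention(
    Q=[[1.0, 0.0]],
    K=[[1.0, 1.0], [1.0, 1.0]],
    V=[[0.0, 2.0], [4.0, 6.0]],
)
print(attn, out)
```

The returned weight matrix is what gives attention its often-cited interpretability: each row shows how much every input position contributed to one output.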
Affiliation(s)
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
  - Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
  - Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
  - Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
  - Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
|
15
|
Liu X, Zhang W. A subcomponent-guided deep learning method for interpretable cancer drug response prediction. PLoS Comput Biol 2023; 19:e1011382. [PMID: 37603576 PMCID: PMC10470940 DOI: 10.1371/journal.pcbi.1011382] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 04/04/2023] [Revised: 08/31/2023] [Accepted: 07/24/2023] [Indexed: 08/23/2023]
Abstract
Accurate prediction of cancer drug response (CDR) is a longstanding challenge in modern oncology that underpins personalized treatment. Current computational methods implement CDR prediction by modeling responses between entire drugs and cell lines, without considering that response outcomes may be primarily attributable to a few finer-level 'subcomponents', such as privileged substructures of the drug or gene signatures of the cancer cell, thus producing predictions that are hard to explain. Herein, we present SubCDR, a subcomponent-guided deep learning method for interpretable CDR prediction, to recognize the most relevant subcomponents driving response outcomes. Technically, SubCDR is built upon a line of deep neural networks that enables a set of functional subcomponents to be extracted from each drug and cell line profile, and breaks the CDR prediction down to identifying pairwise interactions between subcomponents. Such a subcomponent interaction form can offer a traceable path to explicitly indicate which subcomponents contribute more to the response outcome. We verify the superiority of SubCDR over state-of-the-art CDR prediction methods through extensive computational experiments on the GDSC dataset. Crucially, we found many predicted cases that demonstrate the strength of SubCDR in finding the key subcomponents driving responses and exploiting these subcomponents to discover new therapeutic drugs. These results suggest that SubCDR will be highly useful for biomedical researchers, particularly in anti-cancer drug design.
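The idea of reducing a prediction to pairwise subcomponent interactions can be sketched as follows. Scoring each pair by a dot product and summing the map into one response score are assumptions made for illustration, as are the function names and toy vectors; SubCDR's actual networks are not described in the abstract and are not reproduced here.

```python
from typing import List, Tuple

Vector = List[float]


def interaction_map(drug_parts: List[Vector], cell_parts: List[Vector]) -> List[List[float]]:
    """Dot product between every drug subcomponent and every cell-line subcomponent."""
    return [
        [sum(d * c for d, c in zip(drug_vec, cell_vec)) for cell_vec in cell_parts]
        for drug_vec in drug_parts
    ]


def predict_and_explain(drug_parts: List[Vector], cell_parts: List[Vector]) -> Tuple[float, Tuple[int, int]]:
    """Return (summed response score, indices of the strongest interacting pair)."""
    scores = interaction_map(drug_parts, cell_parts)
    total = sum(sum(row) for row in scores)
    strongest = max(
        (abs(value), i, j) for i, row in enumerate(scores) for j, value in enumerate(row)
    )
    return total, (strongest[1], strongest[2])


# Hypothetical embeddings: two drug substructures and two gene signatures.
drug_substructures = [[1.0, 0.0], [0.0, 2.0]]
gene_signatures = [[3.0, 0.0], [0.0, 1.0]]
print(predict_and_explain(drug_substructures, gene_signatures))
```

Because the score is assembled from an explicit pair-by-pair map, the largest entries point to the substructure and gene signature that dominate a given prediction, which is the kind of traceable path the abstract refers to.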
Affiliation(s)
- Xuan Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
|