1
Li Y, Zhang X, Chen Z, Yang H, Liu Y, Wang H, Yan T, Xiang J, Wang B. Accurate prediction of drug-target interactions in Chinese and western medicine by the CWI-DTI model. Sci Rep 2024; 14:25054. [PMID: 39443630 PMCID: PMC11499656 DOI: 10.1038/s41598-024-76367-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 10/14/2024] [Indexed: 10/25/2024] Open
Abstract
Accurate prediction of drug-target interactions (DTIs) is crucial for advancing drug discovery and repurposing. Computational methods have significantly improved the efficiency of experimental predictions for drug-target interactions in Western medicine. However, accurately predicting the complex relationships between Chinese medicine ingredients and targets remains a formidable challenge due to the vast number and high heterogeneity of these ingredients. In this study, we introduce the CWI-DTI method, which achieves high-accuracy prediction of DTIs using a large dataset of interactive relationships of drug ingredients or candidate targets. Moreover, we present a novel dataset to evaluate the prediction accuracy of both Chinese and Western medicine. Through meticulous collection and preprocessing of data on ingredients and targets, we employ an innovative autoencoder framework to fuse multiple drug (target) topological similarity matrices. Additionally, we employ denoising blocks, sparse blocks, and stacked blocks to extract crucial features from the similarity matrix, reducing noise and enhancing accuracy across diverse datasets. Our results indicate that the CWI-DTI model shows improved performance compared to several existing state-of-the-art methods on the datasets tested in both Western and Chinese medicine databases. The findings of this study hold immense promise for advancing DTI prediction in Chinese and Western medicine, thus fostering more efficient drug discovery and repurposing endeavors. Our model is available at https://github.com/WANG-BIN-LAB/CWIDTI .
Affiliation(s)
- Ying Li
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Xingyu Zhang
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Zhuo Chen
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Hongye Yang
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Yuhui Liu
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Huiqing Wang
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Ting Yan
- Department of Pathology, Shanxi Key Laboratory of Carcinogenesis and Translational Research on Esophageal Cancer, Shanxi Medical University, Taiyuan, China
- Jie Xiang
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Bin Wang
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China.
2
Ni S, Kong X, Zhang Y, Chen Z, Wang Z, Fu Z, Huo R, Tong X, Qu N, Wu X, Wang K, Zhang W, Zhang R, Zhang Z, Shi J, Wang Y, Yang R, Li X, Zhang S, Zheng M. Identifying compound-protein interactions with knowledge graph embedding of perturbation transcriptomics. CELL GENOMICS 2024; 4:100655. [PMID: 39303708 DOI: 10.1016/j.xgen.2024.100655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 07/04/2024] [Accepted: 08/20/2024] [Indexed: 09/22/2024]
Abstract
The emergence of perturbation transcriptomics provides a new perspective for drug discovery, but existing analysis methods suffer from inadequate performance and limited applicability. In this work, we present PertKGE, a method designed to deconvolute compound-protein interactions from perturbation transcriptomics with knowledge graph embedding. By considering multi-level regulatory events within biological systems that share the same semantic context, PertKGE significantly improves deconvoluting accuracy in two critical "cold-start" settings: inferring targets for new compounds and conducting virtual screening for new targets. We further demonstrate the pivotal role of incorporating multi-level regulatory events in alleviating representational biases. Notably, it enables the identification of ectonucleotide pyrophosphatase/phosphodiesterase-1 as the target responsible for the unique anti-tumor immunotherapy effect of tankyrase inhibitor K-756 and the discovery of five novel hits targeting the emerging cancer therapeutic target aldehyde dehydrogenase 1B1 with a remarkable hit rate of 10.2%. These findings highlight the potential of PertKGE to accelerate drug discovery.
Affiliation(s)
- Shengkun Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Yingying Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; The First Affiliated Hospital of USTC (Anhui Provincial Hospital), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230001, China
- Zhengyang Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Zhaokun Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Ruifeng Huo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Xiaolong Wu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Kun Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; The First Affiliated Hospital of USTC (Anhui Provincial Hospital), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230001, China
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Runze Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Zimei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; The First Affiliated Hospital of USTC (Anhui Provincial Hospital), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230001, China
- Jiangshan Shi
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Ruirui Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China.
- Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China.
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China; Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China; School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China.
3
Matsuo K, Ogawa H, Yamaoka S, Waku T, Kobori A. A chemical platform for the efficient screening of arylazopyrazole-based photoswitchable CENP-E inhibitors using mild cyclization reactions. Bioorg Med Chem Lett 2024; 111:129892. [PMID: 39029538 DOI: 10.1016/j.bmcl.2024.129892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 07/11/2024] [Accepted: 07/17/2024] [Indexed: 07/21/2024]
Abstract
A set of arylazopyrazole-based inhibitors targeting the mitotic motor protein CENP-E was discovered through the chemical platform using the quantitative cyclization of 1,3-diketone intermediate with various hydrazines under mild conditions. Through this efficient platform, the structure-activity relationship pertaining to the pyrazole photoswitch in photoswitchable CENP-E inhibitors not only in vitro but also in cells was successfully clarified.
Affiliation(s)
- Kazuya Matsuo
- Faculty of Molecular Chemistry and Engineering, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585, Japan.
- Honoka Ogawa
- Faculty of Molecular Chemistry and Engineering, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585, Japan
- Shusuke Yamaoka
- Faculty of Molecular Chemistry and Engineering, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585, Japan
- Tomonori Waku
- Faculty of Molecular Chemistry and Engineering, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585, Japan
- Akio Kobori
- Faculty of Molecular Chemistry and Engineering, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585, Japan
4
Zhang Y, Mastouri M, Zhang Y. Accelerating drug discovery, development, and clinical trials by artificial intelligence. MED 2024; 5:1050-1070. [PMID: 39173629 DOI: 10.1016/j.medj.2024.07.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/21/2024] [Accepted: 07/25/2024] [Indexed: 08/24/2024]
Abstract
Artificial intelligence (AI) has profoundly advanced the field of biomedical research, which also demonstrates transformative capacity for innovation in drug development. This paper aims to deliver a comprehensive analysis of the progress in AI-assisted drug development, particularly focusing on small molecules, RNA, and antibodies. Moreover, this paper elucidates the current integration of AI methodologies within the industrial drug development framework. This encompasses a detailed examination of the industry-standard drug development process, supplemented by a review of medications presently undergoing clinical trials. Conclusively, the paper tackles a predominant obstacle within the AI pharmaceutical sector: the absence of AI-conceived drugs receiving approval. This paper also advocates for the adoption of large language models and diffusion models as a viable strategy to surmount this challenge. This review not only underscores the significant potential of AI in drug discovery but also deliberates on the challenges and prospects within this dynamically progressing field.
Affiliation(s)
- Yilun Zhang
- College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China; School of Medicine, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong, China
- Mohamed Mastouri
- College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Yang Zhang
- College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China.
5
Wang X, Yin X, Jiang D, Zhao H, Wu Z, Zhang O, Wang J, Li Y, Deng Y, Liu H, Luo P, Han Y, Hou T, Yao X, Hsieh CY. Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites. Nat Commun 2024; 15:7348. [PMID: 39187482 PMCID: PMC11347633 DOI: 10.1038/s41467-024-51511-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 08/09/2024] [Indexed: 08/28/2024] Open
Abstract
Annotating active sites in enzymes is crucial for advancing multiple fields including drug discovery, disease research, enzyme engineering, and synthetic biology. Despite the development of numerous automated annotation algorithms, a significant trade-off between speed and accuracy limits their large-scale practical applications. We introduce EasIFA, an enzyme active site annotation algorithm that fuses latent enzyme representations from a protein language model and a 3D structural encoder, and then aligns protein-level information with the knowledge of enzymatic reactions using a multi-modal cross-attention framework. EasIFA outperforms BLASTp, running 10 times faster and improving recall, precision, F1 score, and MCC by 7.57%, 13.08%, 9.68%, and 0.1012, respectively. It also surpasses an empirical-rule-based algorithm and another state-of-the-art deep learning annotation method based on PSSM features, achieving a speed increase ranging from 650 to 1400 times while enhancing annotation quality. This makes EasIFA a suitable replacement for conventional tools in both industrial and academic settings. EasIFA can also effectively transfer knowledge gained from coarsely annotated enzyme databases to smaller, high-precision datasets, highlighting its ability to model sparse and high-quality databases. Additionally, EasIFA shows potential as a catalytic site monitoring tool for designing enzymes with desired functions beyond their natural distribution.
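The four metrics quoted in this abstract (recall, precision, F1 score, and MCC) can all be derived from confusion-matrix counts. The sketch below is a generic illustration, not the EasIFA evaluation code; the function name and the example counts are hypothetical. It also shows why the MCC gain is quoted as an absolute value rather than a percentage: MCC is a correlation coefficient in the range -1 to 1.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and MCC from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical per-residue counts: 10 true active-site residues out of 100.
example = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

Because active-site residues are a small minority of all residues, MCC is a useful complement to precision and recall: it also accounts for true negatives without being dominated by them.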
Affiliation(s)
- Xiaorui Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Xiaodan Yin
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Huifeng Zhao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, Gansu, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, China
- Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Pei Luo
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Yuqiang Han
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, 999077, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
6
Zhou Y, Lin H, Xie L, Huang Y, Wu L, Li SZ, Chen W. Effectiveness and Efficiency: Label-Aware Hierarchical Subgraph Learning for Protein-Protein Interaction. J Mol Biol 2024:168737. [PMID: 39102976 DOI: 10.1016/j.jmb.2024.168737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 07/26/2024] [Accepted: 07/31/2024] [Indexed: 08/07/2024]
Abstract
The study of protein-protein interactions (PPIs) holds immense significance in understanding various biological activities, as well as in drug discovery and disease diagnosis. Existing deep learning methods for PPI prediction, including graph neural networks (GNNs), have been widely employed, but their performance often declines in real-world settings. Our analysis indicates that the topological shortcut is one of the key problems degrading performance. When PPIs are modeled as a graph with proteins as nodes and interactions as edge types, the prevailing models tend to learn the pattern of nodes' degrees rather than intrinsic sequence-structure profiles, leading to the problem termed topological shortcut. In addition, the rapid growth of PPI data leads to high computational costs that strain computing devices, making such models infeasible in practice. To address these problems, we propose a label-aware hierarchical subgraph learning method (laruGL-PPI) that can effectively infer PPIs while being interpretable. Specifically, we introduce edge-based subgraph sampling to effectively alleviate the problems of topological shortcuts and high computing costs. In addition, the inner-outer connections of PPIs are modeled as a hierarchical graph, together with the dependencies between interaction types constructed by a label graph. Extensive experiments conducted across various scales of PPI datasets have conclusively demonstrated that the laruGL-PPI method surpasses the most advanced PPI prediction techniques currently available, particularly in the testing of unseen proteins. Also, our model can recognize crucial sites of proteins, such as surface sites for binding and active sites for catalysis.
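The general idea of edge-based subgraph sampling mentioned in this abstract can be sketched as follows: draw a subset of edges, then keep only the nodes those edges touch. This is a minimal, generic illustration using uniform sampling; it is not the laruGL-PPI procedure, and the function name, the `(protein_a, protein_b, interaction_type)` tuple layout, and the toy edges are all assumptions.

```python
import random


def sample_edge_subgraph(edges, num_edges, seed=0):
    """Uniformly sample edges and return the subgraph they induce.

    `edges` is a list of (protein_a, protein_b, interaction_type) tuples.
    Returns (nodes, sampled_edges); training on such subgraphs bounds the
    memory needed per step regardless of the full graph size.
    """
    rng = random.Random(seed)  # local RNG so sampling is reproducible
    k = min(num_edges, len(edges))
    sampled = rng.sample(edges, k)
    nodes = {protein for a, b, _ in sampled for protein in (a, b)}
    return nodes, sampled


# Hypothetical toy PPI graph.
toy_edges = [
    ("P1", "P2", "binding"),
    ("P1", "P3", "catalysis"),
    ("P2", "P4", "binding"),
    ("P3", "P4", "inhibition"),
]
toy_nodes, toy_sample = sample_edge_subgraph(toy_edges, num_edges=2)
```

Sampling by edge rather than by node means each training batch is defined by interactions, which is one plausible way to reduce a model's reliance on raw node degree.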
Affiliation(s)
- Yuanqing Zhou
- Department of Food Science and Nutrition, College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Haitao Lin
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Lianghua Xie
- Department of Food Science and Nutrition, College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
- Yufei Huang
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Lirong Wu
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Stan Z Li
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China.
- Wei Chen
- Department of Food Science and Nutrition, College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China.
7
Menichetti G, Barabási AL, Loscalzo J. Decoding the Foodome: Molecular Networks Connecting Diet and Health. Annu Rev Nutr 2024; 44:257-288. [PMID: 39207880 DOI: 10.1146/annurev-nutr-062322-030557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Diet, a modifiable risk factor, plays a pivotal role in most diseases, from cardiovascular disease to type 2 diabetes mellitus, cancer, and obesity. However, our understanding of the mechanistic role of the chemical compounds found in food remains incomplete. In this review, we explore the "dark matter" of nutrition, going beyond the macro- and micronutrients documented by national databases to unveil the exceptional chemical diversity of food composition. We also discuss the need to explore the impact of each compound in the presence of associated chemicals and relevant food sources and describe the tools that will allow us to do so. Finally, we discuss the role of network medicine in understanding the mechanism of action of each food molecule. Overall, we illustrate the important role of network science and artificial intelligence in our ability to reveal nutrition's multifaceted role in health and disease.
Affiliation(s)
- Giulia Menichetti
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA;
- Network Science Institute and Department of Physics, Northeastern University, Boston, Massachusetts, USA
- Harvard Data Science Initiative, Harvard University, Boston, Massachusetts, USA
- Albert-László Barabási
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA;
- Network Science Institute and Department of Physics, Northeastern University, Boston, Massachusetts, USA
- Department of Network and Data Science, Central European University, Budapest, Hungary
- Joseph Loscalzo
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA;
8
Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024; 21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
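One common form of the data leakage described in this abstract arises when the same biological entity (for example, the same protein) appears in both the training and the test set. The sketch below shows a group-aware split plus a leakage check. It is a generic illustration and not one of the paper's seven questions; the function names, the `(protein, drug)` pair layout, and the toy data are assumptions.

```python
import random
from collections import defaultdict


def group_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that every group lands entirely in train or in test."""
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)
    keys = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [s for k in keys if k not in test_keys for s in groups[k]]
    test = [s for k in keys if k in test_keys for s in groups[k]]
    return train, test


def leaked_groups(train, test, group_of):
    """Return the groups that occur on both sides of a split."""
    return {group_of(s) for s in train} & {group_of(s) for s in test}


# Hypothetical (protein, drug) interaction pairs, grouped by protein.
pairs = [("P1", "D1"), ("P1", "D2"), ("P2", "D1"), ("P3", "D3"), ("P4", "D2")]
by_protein = lambda pair: pair[0]
train_set, test_set = group_split(pairs, by_protein)
```

A naive random split over pairs could place `("P1", "D1")` in training and `("P1", "D2")` in testing; `leaked_groups` flags that case, while `group_split` avoids it by construction. Grouping by exact identifier is only the simplest case, since related sequences or structures can leak information as well.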
Affiliation(s)
- Judith Bernett
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
- Dominik G Grimm
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany.
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany.
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Florian Haselbeck
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Smart Farming, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
- Roman Joeres
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
- Medical Faculty, Saarland University, Homburg, Germany.
- Markus List
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
- Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany.
9
Fan K, Gökbağ B, Tang S, Li S, Huang Y, Wang L, Cheng L, Li L. Synthetic lethal connectivity and graph transformer improve synthetic lethality prediction. Brief Bioinform 2024; 25:bbae425. [PMID: 39210507 PMCID: PMC11361842 DOI: 10.1093/bib/bbae425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 06/14/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024] Open
Abstract
Synthetic lethality (SL) has shown great promise for the discovery of novel targets in cancer. CRISPR double-knockout (CDKO) technologies can only screen several hundred genes and their combinations, but not genome-wide. Therefore, good SL prediction models are needed to select genes and gene pairs for CDKO experiments. However, the lack of scalable SL properties prevents generalizability of SL interactions to out-of-sample data, thereby hindering modeling efforts. In this paper, we recognize that SL connectivity is a scalable and generalizable SL property. We develop a novel two-step multilayer encoder for individual sample-specific SL prediction model (MLEC-iSL), which predicts SL connectivity first and SL interactions subsequently. MLEC-iSL has three encoders, namely, gene, graph, and transformer encoders. MLEC-iSL achieves high SL prediction performance in K562 (AUPR, 0.73; AUC, 0.72) and Jurkat (AUPR, 0.73; AUC, 0.71) cells, while no existing methods exceed 0.62 AUPR and AUC. The prediction performance of MLEC-iSL is validated in a CDKO experiment in 22Rv1 cells, yielding a 46.8% SL rate among 987 selected gene pairs. The screen also reveals SL dependency between apoptosis and mitosis cell death pathways.
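For readers unfamiliar with the AUC figures quoted in this abstract, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair. It assumes that "AUC" here means ROC AUC; the function name and the toy scores are hypothetical, and this is not the MLEC-iSL evaluation code.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    Runs in O(P * N) time, which is fine for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four gene pairs (label 1 = synthetic lethal).
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives some context for the reported range of 0.62 to 0.73.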
Affiliation(s)
- Kunjie Fan
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
- Birkan Gökbağ
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
- Shan Tang
- Department of Biomedical Informatics, College of Pharmacy, The Ohio State University, 500 W. 12 ave, Columbus, OH 43210, United States
- Shangjia Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Yirui Huang
- Department of Biomedical Informatics, College of Pharmacy, The Ohio State University, 500 W. 12 ave, Columbus, OH 43210, United States
| | - Lingling Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Lijun Cheng
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Lang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
- Department of Biomedical Informatics, College of Pharmacy, The Ohio State University, 500 W. 12 ave, Columbus, OH 43210, United States
| |
|
10
|
Liu S, Yu J, Ni N, Wang Z, Chen M, Li Y, Xu C, Ding Y, Zhang J, Yao X, Liu H. Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features. J Chem Inf Model 2024; 64:5646-5656. [PMID: 38976879 DOI: 10.1021/acs.jcim.4c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting drug-target interactions (DTIs) is one of the crucial tasks in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction due to its powerful performance. However, the models trained on limited known DTI data struggle to generalize effectively to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. Our approach presents a significant advancement in DTI prediction by combining domain-generic and domain-specific features, enhancing the generalization ability of the DTI prediction model.
Affiliation(s)
- Shuo Liu
- School of Pharmacy, Lanzhou University, Gansu 730000, China
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Jialiang Yu
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Ningxi Ni
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Zidong Wang
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Mengyun Chen
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Gansu 730000, China
| | - Chen Xu
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Yahao Ding
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Jun Zhang
- Changping Laboratory, Beijing 102200, China
| | - Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
| | - Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
| |
|
11
|
Piras A, Chenghao S, Sebek M, Ispirova G, Menichetti G. CPIExtract: A software package to collect and harmonize small molecule and protein interactions. bioRxiv 2024:2024.07.03.601957. [PMID: 39005430 PMCID: PMC11245042 DOI: 10.1101/2024.07.03.601957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
The binding interactions between small molecules and proteins are the basis of cellular functions. Yet, experimental data available regarding compound-protein interaction is not harmonized into a single entity but rather scattered across multiple institutions, each maintaining databases with different formats. Extracting information from these multiple sources remains challenging due to data heterogeneity. Here, we present CPIExtract (Compound-Protein Interaction Extract), a tool to interactively extract experimental binding interaction data from multiple databases, perform filtering, and harmonize the resulting information, thus providing a gain of compound-protein interaction data. When compared to a single source, DrugBank, we show that it can collect more than 10 times the amount of annotations. The end-user can apply custom filtering to the aggregated output data and save it in any generic tabular file suitable for further downstream tasks such as network medicine analyses for drug repurposing and cross-validation of deep learning models.
Affiliation(s)
- Andrea Piras
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133, Milan, Italy
| | - Shi Chenghao
- Network Science Institute, Northeastern University, 360 Huntington Ave, 02115, MA, USA
| | - Michael Sebek
- Network Science Institute, Northeastern University, 360 Huntington Ave, 02115, MA, USA
| | - Gordana Ispirova
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Ave, 02115, MA, USA
| | - Giulia Menichetti
- Network Science Institute, Northeastern University, 360 Huntington Ave, 02115, MA, USA
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Ave, 02115, MA, USA
- Harvard Data Science Initiative, Harvard University, 114 Western Avenue, 02134, MA, USA
| |
|
12
|
Turgutalp B, Kizil C. Multi-target drugs for Alzheimer's disease. Trends Pharmacol Sci 2024; 45:628-638. [PMID: 38853102 DOI: 10.1016/j.tips.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Revised: 04/28/2024] [Accepted: 05/09/2024] [Indexed: 06/11/2024]
Abstract
Alzheimer's disease (AD), a leading cause of dementia, increasingly challenges our healthcare systems and society. Traditional therapies aimed at single targets have fallen short owing to the complex, multifactorial nature of AD, which necessitates simultaneous targeting of various disease mechanisms for clinical success. Therefore, targeting multiple pathologies at the same time could provide a synergistic therapeutic effect. The identification of new disease targets beyond the classical hallmarks of AD offers fertile ground for the design of new multi-target drugs (MTDs), and building on existing compounds has the potential to yield successful disease-modifying therapies. This review discusses the evolving landscape of MTDs, focusing on their potential as AD therapeutics. Analysis of past and current trials of compounds with multi-target activity underscores the capacity of MTDs to offer synergistic therapeutic effects, and the flourishing genetic understanding of AD will inform and inspire the development of MTD-based AD therapies.
Affiliation(s)
- Bengisu Turgutalp
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, Columbia University, 650 West 168th Street, New York, NY 10032, USA; Department of Neurology, Columbia University Irving Medical Center, Columbia University, 710 West 168th Street, New York, NY 10032, USA.
| | - Caghan Kizil
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, Columbia University, 650 West 168th Street, New York, NY 10032, USA; Department of Neurology, Columbia University Irving Medical Center, Columbia University, 710 West 168th Street, New York, NY 10032, USA; Gertrude H. Sergievsky Center, College of Physicians and Surgeons, Columbia University Irving Medical Center, Columbia University, 630 West 168th Street, New York, NY, USA.
| |
|
13
|
Ravandi B, Mehler P, Ispirova G, Barabási AL, Menichetti G. GroceryDB: Prevalence of Processed Food in Grocery Stores. medRxiv 2024:2022.04.23.22274217. [PMID: 38883708 PMCID: PMC11177926 DOI: 10.1101/2022.04.23.22274217] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2024]
Abstract
The offering of grocery stores is a strong driver of consumer decisions, shaping their diet and long-term health. While highly processed foods such as packaged products, processed meat, and sweetened soft drinks have been increasingly associated with unhealthy diets, information on the degree of processing characterizing an item in a store is not straightforward to obtain, limiting the ability of individuals to make informed choices. Here we introduce GroceryDB, a database with over 50,000 food items sold by Walmart, Target, and Wholefoods, unveiling how big data can be harnessed to empower consumers and policymakers with systematic access to the degree of processing of the foods they select, and the potential alternatives in the surrounding food environment. The wealth of data collected on ingredient lists and nutrition facts allows a large-scale analysis of ingredient patterns and degree of processing stratified by store, food category, and price range. We find that the nutritional choices of the consumers, translated as the degree of food processing, strongly depend on the food categories and grocery stores. Moreover, the data allow us to quantify the individual contribution of over 1,000 ingredients to ultra-processing. GroceryDB and the associated http://TrueFood.Tech/ website make this information accessible, guiding consumers toward less processed food choices while assisting policymakers in reforming the food supply.
Affiliation(s)
- Babak Ravandi
- Network Science Institute and Department of Physics, Northeastern University, Boston, USA
| | - Peter Mehler
- Department of Computer Science, IT University of Copenhagen, Copenhagen, Denmark
| | - Gordana Ispirova
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
| | - Albert-László Barabási
- Network Science Institute and Department of Physics, Northeastern University, Boston, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Department of Network and Data Science, Central European University, Budapest, Hungary
| | - Giulia Menichetti
- Network Science Institute and Department of Physics, Northeastern University, Boston, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Harvard Data Science Initiative, Harvard University, Boston, USA
| |
|
14
|
Zhang Y, Li J, Lin S, Zhao J, Xiong Y, Wei DQ. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model. J Cheminform 2024; 16:67. [PMID: 38849874 PMCID: PMC11162000 DOI: 10.1186/s13321-024-00862-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 05/19/2024] [Indexed: 06/09/2024] Open
Abstract
Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized a simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, by separating the local neighborhood aggregation and nonlinear layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery.
In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes. Scientific contributions: The methodology presented in this work not only enables the comparatively accurate prediction of compound-protein interactions but also, for the first time, takes into consideration both sample imbalance, which is very common in the real world, and computational efficiency, accelerating the target identification and drug discovery process.
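The idea of separating neighborhood aggregation from the nonlinear step can be illustrated with a minimal sketch. It assumes the common simplified-graph-convolution form, in which features X are propagated K times through the normalized adjacency S = D^{-1/2}(A + I)D^{-1/2} before any classifier is applied; the abstract does not state the exact normalization used by SPVec-SGCN-CPI, so the function names, the toy graph and the normalization are assumptions for illustration only.

```python
import math

def normalized_adjacency(adj):
    """Return S = D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def propagate(adj, features, k):
    """Compute S^K X: K rounds of neighbor aggregation, no nonlinearity."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(k):
        out = matmul(s, out)
    return out

# Hypothetical toy graph: a path 0 - 1 - 2 with one-hot node features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

one_hop = propagate(adj, features, k=1)
two_hop = propagate(adj, features, k=2)
```

Because the propagation has no trainable parameters, S^K X can be computed once and fed to any downstream classifier. In the toy graph, node 0 only receives information from node 2 when K is at least 2.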
Affiliation(s)
- Yufang Zhang
- School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, 200240, China
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China
| | - Jiayi Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
| | - Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
| | - Jianwei Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.
| | - Dong-Qing Wei
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China.
- Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China.
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China.
| |
|
15
|
Stincone P, Naimi A, Saviola AJ, Reher R, Petras D. Decoding the molecular interplay in the central dogma: An overview of mass spectrometry-based methods to investigate protein-metabolite interactions. Proteomics 2024; 24:e2200533. [PMID: 37929699 DOI: 10.1002/pmic.202200533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 10/15/2023] [Accepted: 10/23/2023] [Indexed: 11/07/2023]
Abstract
With the emergence of next-generation nucleotide sequencing and mass spectrometry-based proteomics and metabolomics tools, we have comprehensive and scalable methods to analyze the genes, transcripts, proteins, and metabolites of a multitude of biological systems. Despite the fascinating new molecular insights at the genome, transcriptome, proteome and metabolome scale, we are still far from fully understanding cellular organization, cell cycles and biology at the molecular level. Significant advances in sensitivity and depth for both sequencing and mass spectrometry-based methods allow analysis at the single-cell and single-molecule level. At the same time, new tools are emerging that enable the investigation of molecular interactions throughout the central dogma of molecular biology. In this review, we provide an overview of established and recently developed mass spectrometry-based tools to probe metabolite-protein interactions, from individual interaction pairs to interactions at the proteome-metabolome scale.
Affiliation(s)
- Paolo Stincone
- University of Tuebingen, CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Infection Medicine, Tuebingen, Germany
- University of Tuebingen, Center for Plant Molecular Biology, Tuebingen, Germany
| | - Amira Naimi
- University of Marburg, Institute of Pharmaceutical Biology and Biotechnology, Marburg, Germany
| | | | - Raphael Reher
- University of Marburg, Institute of Pharmaceutical Biology and Biotechnology, Marburg, Germany
| | - Daniel Petras
- University of Tuebingen, CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Infection Medicine, Tuebingen, Germany
- University of California Riverside, Department of Biochemistry, Riverside, USA
| |
|
16
|
Jangwan NS, Khan M, Das R, Altwaijry N, Sultan AM, Khan R, Saleem S, Singh MF. From petals to healing: consolidated network pharmacology and molecular docking investigations of the mechanisms underpinning Rhododendron arboreum flower's anti-NAFLD effects. Front Pharmacol 2024; 15:1366279. [PMID: 38863975 PMCID: PMC11165132 DOI: 10.3389/fphar.2024.1366279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Accepted: 04/25/2024] [Indexed: 06/13/2024] Open
Abstract
Rhododendron arboreum Sm., also known as Burans, is traditionally used as an anti-inflammatory, anti-diabetic, hepatoprotective, adaptogenic, and anti-oxidative agent. It has been used since ancient times in Indian traditional medicine for various liver disorders. However, the exact mechanism behind its activity against NAFLD is not known. The aim of the present study is to investigate the molecular mechanism of Rhododendron arboreum flower (RAF) in the treatment of NAFLD using network pharmacology and molecular docking methods. The drug-likeness score, probable side effects and ADMET profile of the bioactives were also predicted. Protein-protein interaction (PPI) data were obtained using the STRING platform. For the visualisation of GO analysis, a bioinformatics server was employed. Through molecular docking, the binding affinity between potential targets and active compounds was assessed. A total of five active compounds of RAF and 30 target proteins were selected. The targets with higher degrees were identified through the PPI network. GO analysis indicated that the NAFLD treatment with RAF primarily entails a response to the fatty acid biosynthetic process, lipid metabolic process, regulation of cell death, regulation of stress response, and cellular response to a chemical stimulus. Molecular docking and molecular dynamics simulation showed that rutin has the best binding affinity for the selected targets among the active compounds, as indicated by the binding energy, RMSD, and RMSF data. The findings comprehensively elucidated toxicity data, potential targets of bioactives and molecular mechanisms of RAF against NAFLD, providing a promising novel strategy for future research on NAFLD treatment.
Affiliation(s)
- Nitish Singh Jangwan
- Department of Pharmacognosy and Phytochemistry, School of Pharmaceutical Sciences, Delhi Pharmaceutical Sciences and Research University, New Delhi, India
| | - Mausin Khan
- Department of Pharmaceutical Chemistry, School of Pharmaceutical Sciences and Technology, Sardar Bhagwan Singh University, Dehradun, Uttarakhand, India
| | - Richa Das
- Department of Biotechnology, Parul Institute of Applied Science, Parul University, Vadodara, Gujarat, India
| | - Najla Altwaijry
- Department of Pharmaceutical Sciences, College of Pharmacy, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Ahlam Mansour Sultan
- Department of Pharmaceutical Sciences, College of Pharmacy, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Ruqaiyah Khan
- Department of Basic Health Sciences, Deanship of Preparatory Year for the Health Colleges, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Shakir Saleem
- Department of Public Health, College of Health Sciences, Saudi Electronic University, Riyadh, Saudi Arabia
| | - Mamta F. Singh
- College of Pharmacy, COER University, Roorkee, Uttarakhand, India
| |
|
17
|
Hao B, Kovács IA. Proper network randomization is key to assessing social balance. Sci Adv 2024; 10:eadj0104. [PMID: 38701217 PMCID: PMC11068007 DOI: 10.1126/sciadv.adj0104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 04/01/2024] [Indexed: 05/05/2024]
Abstract
Social ties, either positive or negative, lead to signed network patterns, the subject of balance theory. For example, strong balance introduces cycles with even numbers of negative edges. The statistical significance of such patterns is routinely assessed by comparisons to null models. Yet, results in signed networks remain controversial. Here, we show that even if a network exhibits strong balance by construction, current null models can fail to identify it. Our results indicate that matching the signed degree preferences of the nodes is a critical step and so is the preservation of network topology in the null model. As a solution, we propose the STP null model, which integrates both constraints within a maximum entropy framework. STP randomization leads to qualitatively different results, with most social networks consistently demonstrating strong balance in three- and four-node patterns. On the basis of our results, we present a potential wiring mechanism behind the observed signed patterns and outline further applications of STP randomization.
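The strong-balance criterion in the abstract (a cycle is balanced when it contains an even number of negative edges) can be illustrated for three-node patterns with a minimal sketch. The function name and the toy edge list are hypothetical and are not taken from the paper; the sketch only counts balanced and unbalanced triangles and does not implement the STP null model or any significance test.

```python
from itertools import combinations

def triangle_balance(signed_edges):
    """Count (balanced, unbalanced) triangles in an undirected signed graph.

    signed_edges maps a node pair (u, v) to +1 or -1. A triangle is
    balanced when the product of its three edge signs is positive,
    which is the same as having an even number of negative edges.
    """
    sign = {}
    nodes = set()
    for (u, v), s in signed_edges.items():
        sign[frozenset((u, v))] = s
        nodes.update((u, v))

    balanced = unbalanced = 0
    for a, b, c in combinations(sorted(nodes), 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(p in sign for p in pairs):
            product = sign[pairs[0]] * sign[pairs[1]] * sign[pairs[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced

# Hypothetical toy network.
toy_edges = {
    ("A", "B"): +1, ("A", "C"): -1, ("B", "C"): -1,  # A-B-C: two negative ties
    ("C", "D"): +1, ("B", "D"): +1,                  # B-C-D: one negative tie
}
counts = triangle_balance(toy_edges)
print(counts)
```

In this toy network the triangle A-B-C is balanced and B-C-D is not. Whether such counts are statistically meaningful depends on the null model they are compared against, which is the question the paper addresses.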
Affiliation(s)
- Bingjie Hao
- Department of Physics and Astronomy, Northwestern University, Evanston, IL 60208, USA
| | - István A. Kovács
- Department of Physics and Astronomy, Northwestern University, Evanston, IL 60208, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL 60208, USA
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, USA
| |
|
18
|
Xia Y, Pan X, Shen HB. Heterogeneous sampled subgraph neural networks with knowledge distillation to enhance double-blind compound-protein interaction prediction. Structure 2024; 32:611-620.e4. [PMID: 38447575 DOI: 10.1016/j.str.2024.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/18/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]
Abstract
Identifying binding compounds against a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to apply to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI to incorporate local known interacting networks to predict CPIs. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive message of the subgraph. For unseen compounds and proteins, SgCPI-KD takes SgCPI as the teacher model and distills its knowledge by estimating the potential neighbors. Experimental results indicate: (1) the sampled subgraphs of the CPI network introduce efficient knowledge for unseen molecular prediction with the HGNNs, and (2) the knowledge distillation strategy is beneficial to the double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
|
19
|
Liu JX, Zhang X, Huang YQ, Hao GF, Yang GF. Multi-level bioinformatics resources support drug target discovery of protein-protein interactions. Drug Discov Today 2024; 29:103979. [PMID: 38608830 DOI: 10.1016/j.drudis.2024.103979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/14/2024] [Accepted: 04/05/2024] [Indexed: 04/14/2024]
Abstract
Drug discovery often begins with a new target. Protein-protein interactions (PPIs) are crucial to multitudinous cellular processes and offer a promising avenue for drug-target discovery. PPIs are characterized by multi-level complexity: at the protein level, interaction networks can be used to identify potential targets, whereas at the residue level, the details of the interactions of individual PPIs can be used to examine a target's druggability. Great progress has been made in target discovery through multi-level PPI-related computational approaches, but these resources have not been fully discussed. Here, we systematically survey bioinformatics tools for identifying and assessing potential drug targets, examining their characteristics, limitations and applications. This work will aid the integration of the broader protein-to-network context with the analysis of detailed binding mechanisms to support the discovery of drug targets.
Affiliation(s)
- Jia-Xin Liu
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
| | - Xiao Zhang
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
| | - Yuan-Qin Huang
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
| | - Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China; State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China.
| | - Guang-Fu Yang
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China.
| |
|
20
|
Wang M, Wang J, Rong Z, Wang L, Xu Z, Zhang L, He J, Li S, Cao L, Hou Y, Li K. A bidirectional interpretable compound-protein interaction prediction framework based on cross attention. Comput Biol Med 2024; 172:108239. [PMID: 38460309 DOI: 10.1016/j.compbiomed.2024.108239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 03/11/2024]
Abstract
The identification of compound-protein interactions (CPIs) plays a vital role in drug discovery. However, the huge cost and labor-intensive nature of in vitro and in vivo experiments make it urgent for researchers to develop novel CPI prediction methods. Although emerging deep learning methods have achieved promising performance in CPI prediction, they also face ongoing challenges: (i) providing bidirectional interpretability from both the chemical and biological perspective for the prediction results; (ii) comprehensively evaluating model generalization performance; (iii) demonstrating the practical applicability of these models. To overcome the challenges posed by current deep learning methods, we propose a cross multi-head attention oriented bidirectional interpretable CPI prediction model (CmhAttCPI). First, CmhAttCPI takes molecular graphs and protein sequences as inputs, utilizing the GCW module to learn atom features and the CNN module to learn residue features, respectively. Second, the model applies a cross multi-head attention module to compute attention weights for atoms and residues. Finally, CmhAttCPI employs a fully connected neural network to predict scores for CPIs. We evaluated the performance of CmhAttCPI on balanced datasets and imbalanced datasets. The results consistently show that CmhAttCPI outperforms multiple state-of-the-art methods. We constructed three scenarios based on compound and protein clustering and comprehensively evaluated the model generalization ability within these scenarios. The results demonstrate that the generalization ability of CmhAttCPI surpasses that of other models. In addition, the visualizations of attention weights reveal that CmhAttCPI provides chemical and biological interpretation for CPI prediction. Moreover, case studies confirm the practical applicability of CmhAttCPI in discovering anticancer candidates.
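The notion of bidirectional attention weights between atoms and residues can be illustrated with a minimal sketch. It uses a single head of plain scaled dot-product attention with no learned projections, which is a simplification: the model described in the abstract is multi-head, and its exact layers are not given here. The embeddings and function names below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_weights(queries, keys):
    """Scaled dot-product attention weights of each query over all keys."""
    dim = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights.append(softmax(scores))
    return weights

# Hypothetical 2-dimensional embeddings: 2 atoms and 3 residues.
atoms = [[1.0, 0.0], [0.0, 1.0]]
residues = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# Direction 1: how strongly each atom attends to each residue.
atom_to_residue = cross_attention_weights(atoms, residues)
# Direction 2: how strongly each residue attends to each atom.
residue_to_atom = cross_attention_weights(residues, atoms)
```

Each row of `atom_to_residue` sums to 1 and can be read as a per-atom distribution over residues; `residue_to_atom` gives the reverse view. Heat maps of weights of this kind are what attention-based interpretability typically visualizes.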
Affiliation(s)
- Meng Wang
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Jianmin Wang
- School of Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, 21983, Republic of Korea
| | - Zhiwei Rong
- School of Public Health, Peking University, Beijing, 100871, China
| | - Liuying Wang
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Zhenyi Xu
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Liuchao Zhang
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Jia He
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Shuang Li
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Lei Cao
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Yan Hou
- School of Public Health, Peking University, Beijing, 100871, China
| | - Kang Li
- School of Public Health, Harbin Medical University, Harbin, 150081, China.
| |
|
21
|
Wang Z, Wang S, Li Y, Guo J, Wei Y, Mu Y, Zheng L, Li W. A new paradigm for applying deep learning to protein-ligand interaction prediction. Brief Bioinform 2024; 25:bbae145. [PMID: 38581420 PMCID: PMC10998640 DOI: 10.1093/bib/bbae145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024] Open
Abstract
Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
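The two quantities this abstract names as prediction targets can be written down directly. The sketch below computes pKd from a dissociation constant and the RMSD between a docking pose and a reference pose. It assumes Kd is given in molar units and that both poses list the same atoms in the same order in a shared receptor frame (no superposition or symmetry correction); the function names are hypothetical and unrelated to the IGModel code.

```python
import math
import numpy as np

def pkd_from_kd(kd_molar):
    """Return pKd = -log10(Kd), with Kd in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)

def pose_rmsd(pose, reference):
    """Root mean square deviation between two (n_atoms, 3) coordinate sets.

    Atoms must be in matching order; no alignment is performed.
    """
    pose = np.asarray(pose, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if pose.shape != reference.shape:
        raise ValueError("pose and reference must have the same shape")
    squared_dist = np.sum((pose - reference) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_dist)))

# A 1 nM binder has pKd of about 9.
example_pkd = pkd_from_kd(1e-9)

# Two atoms, each displaced by the vector (3, 4, 0), i.e. 5 length units.
reference = np.zeros((2, 3))
pose = reference + np.array([3.0, 4.0, 0.0])
example_rmsd = pose_rmsd(pose, reference)
```

A lower RMSD indicates a pose closer to the reference geometry, and a higher pKd indicates tighter binding, which is why the two outputs are easy to interpret side by side.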
Affiliation(s)
- Zechen Wang
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Sheng Wang
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
| | - Yangyang Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, China
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Liangzhen Zheng
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Weifeng Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| |
|
22
|
Pogány D, Antal P. Towards explainable interaction prediction: Embedding biological hierarchies into hyperbolic interaction space. PLoS One 2024; 19:e0300906. [PMID: 38512848 PMCID: PMC10956837 DOI: 10.1371/journal.pone.0300906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 03/06/2024] [Indexed: 03/23/2024] Open
Abstract
Given the prolonged timelines and high costs associated with traditional approaches, accelerating drug development is crucial. Computational methods, particularly drug-target interaction prediction, have emerged as efficient tools, yet the explainability of machine learning models remains a challenge. Our work aims to provide more interpretable interaction prediction models using similarity-based prediction in a latent space aligned to biological hierarchies. We investigated integrating drug and protein hierarchies into a joint-embedding drug-target latent space via embedding regularization by conducting a comparative analysis between models employing traditional flat Euclidean vector spaces and those utilizing hyperbolic embeddings. Besides, we provided a latent space analysis as an example to show how we can gain visual insights into the trained model with the help of dimensionality reduction. Our results demonstrate that hierarchy regularization improves interpretability without compromising predictive performance. Furthermore, integrating hyperbolic embeddings, coupled with regularization, enhances the quality of the embedded hierarchy trees. Our approach enables a more informed and insightful application of interaction prediction models in drug discovery by constructing an interpretable hyperbolic latent space, simultaneously incorporating drug and target hierarchies and pairing them with available interaction information. Moreover, compatible with pairwise methods, the approach allows for additional transparency through existing explainable AI solutions.
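To make the contrast between flat Euclidean and hyperbolic embeddings concrete, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which hyperbolic model the authors use, so the choice of the Poincaré ball is an assumption here, as is the function name.

```python
import math
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_u = float(np.dot(u, u))
    sq_v = float(np.dot(v, v))
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = float(np.dot(u - v, u - v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return float(np.arccosh(arg))

origin = [0.0, 0.0]
d_mid = poincare_distance(origin, [0.5, 0.0])   # equals ln(3)
d_edge = poincare_distance(origin, [0.9, 0.0])  # grows quickly near the boundary
```

Distances grow rapidly toward the boundary of the ball, which is the property that lets tree-like hierarchies be embedded with low distortion: parents can sit near the origin and an exponentially growing number of descendants can spread out near the edge.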
Affiliation(s)
- Domonkos Pogány
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
| | - Péter Antal
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
| |
|
23
|
Ektefaie Y, Shen A, Bykova D, Marin M, Zitnik M, Farhat M. Evaluating generalizability of artificial intelligence models for molecular datasets. bioRxiv 2024:2024.02.25.581982. [PMID: 38464295 PMCID: PMC10925170 DOI: 10.1101/2024.02.25.581982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata-based (MB) or sequence-similarity-based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits. We introduce Spectra, a spectral framework for comprehensive model evaluation. For a given model and input data, Spectra plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We apply Spectra to 18 sequencing datasets with associated phenotypes ranging from antibiotic resistance in tuberculosis to protein-ligand binding to evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models, and convolutional neural networks. We show that SB and MB splits provide an incomplete assessment of model generalizability. With Spectra, we find that, as cross-split overlap decreases, deep learning models consistently exhibit a reduction in performance in a task- and model-dependent manner. Although no model consistently achieved the highest performance across all tasks, we show that deep learning models can generalize to previously unseen sequences on specific tasks. Spectra paves the way toward a better understanding of how foundation models generalize in biology.
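The summary statistic described in this abstract, the area under the curve of performance against cross-split overlap, can be sketched with a trapezoidal rule. How Spectra parameterizes or normalizes overlap is not stated here, so the 0-to-1 overlap scale, the example numbers, and the function name below are illustrative assumptions only.

```python
import numpy as np

def area_under_performance_curve(overlap, performance):
    """Trapezoidal area under performance as a function of cross-split overlap.

    Points may be given in any order; they are sorted by overlap first.
    """
    overlap = np.asarray(overlap, dtype=float)
    performance = np.asarray(performance, dtype=float)
    if overlap.shape != performance.shape or overlap.ndim != 1:
        raise ValueError("overlap and performance must be 1-D and equal length")
    order = np.argsort(overlap)
    x, y = overlap[order], performance[order]
    # Sum of trapezoid areas between consecutive overlap values.
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical evaluation: performance drops as train/test overlap decreases.
overlap_levels = [1.0, 0.5, 0.0]
model_scores = [0.9, 0.7, 0.5]
generalizability = area_under_performance_curve(overlap_levels, model_scores)
```

A model whose score holds up as overlap shrinks keeps a larger area, whereas a model that only performs well on near-duplicate test data loses most of it.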
Affiliation(s)
- Yasha Ektefaie
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Andrew Shen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Computer Science, Northwestern University, Evanston, IL, USA
| | - Daria Bykova
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Maximillian Marin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
| | - Maha Farhat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| |
|
24
|
Bernett J, Blumenthal DB, List M. Cracking the black box of deep sequence-based protein-protein interaction prediction. Brief Bioinform 2024; 25:bbae076. [PMID: 38446741 PMCID: PMC10939362 DOI: 10.1093/bib/bbae076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/09/2024] [Indexed: 03/08/2024] Open
Abstract
Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performances. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test set, performances become random. Moreover, baseline models directly leveraging sequence similarity and network topology show good performances at a fraction of the computational cost. Thus, we advocate that any improvements should be reported relative to baseline methods in the future. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting that further experimental research into the 'dark' protein interactome and better computational methods are needed.
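The leakage problem described in this abstract arises when proteins in the test pairs, or close homologs of them, also appear in training pairs. One simple way to reduce it, sketched below, is to assign whole sequence-similarity clusters to either the training or the test partition and to drop pairs that straddle both. This is an illustrative simplification rather than the procedure used in the paper; the cluster assignments are assumed to come from an external clustering step, and all names are hypothetical.

```python
def split_pairs_by_cluster(pairs, cluster_of, test_clusters):
    """Split interaction pairs so that no cluster is shared across partitions.

    pairs: iterable of (protein_a, protein_b) tuples.
    cluster_of: dict mapping protein ID to a cluster ID.
    test_clusters: set of cluster IDs reserved for the test partition.
    Pairs with one protein on each side are discarded.
    """
    train, test = [], []
    for a, b in pairs:
        a_in_test = cluster_of[a] in test_clusters
        b_in_test = cluster_of[b] in test_clusters
        if a_in_test and b_in_test:
            test.append((a, b))
        elif not a_in_test and not b_in_test:
            train.append((a, b))
        # Mixed pairs are dropped to keep the partitions disjoint.
    return train, test

# Hypothetical toy data: proteins A and B share cluster 0, C and D cluster 1.
pairs = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")]
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1}
train_pairs, test_pairs = split_pairs_by_cluster(pairs, cluster_of, {1})
```

Dropping mixed pairs costs data, but it guarantees that no test protein shares a cluster with any training protein, which a random split over pairs cannot guarantee.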
Affiliation(s)
- Judith Bernett
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
| | - David B Blumenthal
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Werner-von-Siemens-Str. 61, 91052, Erlangen, Germany
| | - Markus List
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
| |
|
25
|
Li Y, Lyu J, Wang Y, Ye M, Wang H. Ligand Modification-Free Methods for the Profiling of Protein-Environmental Chemical Interactions. Chem Res Toxicol 2024; 37:1-15. [PMID: 38146056 DOI: 10.1021/acs.chemrestox.3c00282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2023]
Abstract
Adverse health outcomes caused by environmental chemicals are often initiated via their interactions with proteins. Essentially, one environmental chemical may interact with a number of proteins and/or a protein may interact with a multitude of environmental chemicals, forming an intricate interaction network. Omics-wide protein-environmental chemical interaction profiling (PECI) is important for a comprehensive understanding of these interaction networks, including the toxicity mechanisms of action (MoA), and for providing systematic chemical safety assessment. However, such information remains unknown for most environmental chemicals, partly due to their vast chemical diversity. In recent years, continuous efforts, especially in mass spectrometry (MS)-based omics technologies, have produced several ligand modification-free methods, and systematic PECI profiling has gained new attention. In this Review, we provide a comprehensive overview of these methodologies for the identification of ligand-protein interactions, including affinity interaction-based methods: affinity-driven purification, covalent modification profiling, and activity-based protein profiling (ABPP) in a competitive mode; methods that assess physicochemical property changes: ligand-directed nuclear magnetic resonance (ligand-directed NMR), MS integrated with equilibrium dialysis for the discovery of allostery systematically (MIDAS), thermal proteome profiling (TPP), limited proteolysis-coupled mass spectrometry (LiP-MS), and stability of proteins from rates of oxidation (SPROX); and several intracellular downstream response characterization methods. We expect that the applications of these ligand modification-free technologies will drive a considerable increase in the number of PECI identified, facilitate unveiling the toxicological mechanisms, and ultimately contribute to systematic health risk assessment of environmental chemicals.
Affiliation(s)
- Yanan Li
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Science, Dalian 116023, China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- The State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Jiawen Lyu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Science, Dalian 116023, China
| | - Yan Wang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Science, Dalian 116023, China
| | - Mingliang Ye
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Science, Dalian 116023, China
- State Key Laboratory of Medical Proteomics, Beijing, 102206, China
| | - Hailin Wang
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- The State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| |
|
26
|
Xi C, Diao J, Moon TS. Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, a growing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
Affiliation(s)
- Chenggang Xi
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Jinjin Diao
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Tae Seok Moon
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA.
| |
|
27
|
Wang Y, Xia Y, Yan J, Yuan Y, Shen HB, Pan X. ZeroBind: a protein-specific zero-shot predictor with subgraph matching for drug-target interactions. Nat Commun 2023; 14:7861. [PMID: 38030641 PMCID: PMC10687269 DOI: 10.1038/s41467-023-43597-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023] Open
Abstract
Existing drug-target interaction (DTI) prediction methods generally fail to generalize well to novel (unseen) proteins and drugs. In this study, we propose a protein-specific meta-learning framework ZeroBind with subgraph matching for predicting protein-drug interactions from their structures. During the meta-training process, ZeroBind formulates the training of each protein-specific model as a learning task, and each task uses graph neural networks (GNNs) to learn the protein graph embedding and the molecular graph embedding. Inspired by the fact that molecules bind to a binding pocket in proteins instead of the whole protein, ZeroBind introduces a weakly supervised subgraph information bottleneck (SIB) module to recognize the maximally informative and compressive subgraphs in protein graphs as potential binding pockets. In addition, ZeroBind trains the models of individual proteins as multiple tasks, whose importance is automatically learned with a task adaptive self-attention module to make final predictions. The results show that ZeroBind achieves superior performance on DTI prediction over existing methods, especially for unseen proteins and drugs, and performs well after fine-tuning for proteins or drugs with a few known binding partners.
Affiliation(s)
- Yuxuan Wang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Junchi Yan
- Department of Computer Science and Engineering, and MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Ye Yuan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
| |
|
28
|
Shukla VK, Heller GT, Hansen DF. Biomolecular NMR spectroscopy in the era of artificial intelligence. Structure 2023; 31:1360-1374. [PMID: 37848030 DOI: 10.1016/j.str.2023.09.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/21/2023] [Revised: 09/15/2023] [Accepted: 09/21/2023] [Indexed: 10/19/2023]
Abstract
Biomolecular nuclear magnetic resonance (NMR) spectroscopy and artificial intelligence (AI) have a burgeoning synergy. Deep learning-based structural predictors have forever changed structural biology, yet these tools currently face limitations in accurately characterizing protein dynamics, allostery, and conformational heterogeneity. We begin by highlighting the unique abilities of biomolecular NMR spectroscopy to complement AI-based structural predictions toward addressing these knowledge gaps. We then highlight the direct integration of deep learning approaches into biomolecular NMR methods. AI-based tools can dramatically improve the acquisition and analysis of NMR spectra, enhancing the accuracy and reliability of NMR measurements, thus streamlining experimental processes. Additionally, deep learning enables the development of novel types of NMR experiments that were previously unattainable, expanding the scope and potential of biomolecular NMR spectroscopy. Ultimately, a combination of AI and NMR promises to further revolutionize structural biology on several levels, advance our understanding of complex biomolecular systems, and accelerate drug discovery efforts.
Affiliation(s)
- Vaibhav Kumar Shukla
- Department of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK
| | - Gabriella T Heller
- Department of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK.
| | - D Flemming Hansen
- Department of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK.
| |
|
29
|
Gu S, Liu H, Liu L, Hou T, Kang Y. Artificial intelligence methods in kinase target profiling: Advances and challenges. Drug Discov Today 2023; 28:103796. [PMID: 37805065 DOI: 10.1016/j.drudis.2023.103796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 09/29/2023] [Accepted: 10/03/2023] [Indexed: 10/09/2023]
Abstract
Kinases have a crucial role in regulating almost the full range of cellular processes, making them essential targets for therapeutic interventions against various diseases. Accurate kinase-profiling prediction is vital for addressing the selectivity/specificity challenges in kinase drug discovery, which is closely related to lead optimization, drug repurposing, and the understanding of potential drug side effects. In this review, we provide an overview of the latest advancements in machine learning (ML)-based and deep learning (DL)-based quantitative structure-activity relationship (QSAR) models for kinase profiling. We highlight current trends in this rapidly evolving field and discuss the existing challenges and future directions regarding experimental data set construction and model architecture design. Our aim is to offer practical insights and guidance for the development and utilization of these approaches.
Affiliation(s)
- Shukai Gu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078
| | - Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co. Ltd, Nanjing 210000, Jiangsu, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.
| |
|
30
|
Song N, Dong R, Pu Y, Wang E, Xu J, Guo F. PMF-CPI: assessing drug selectivity with a pretrained multi-functional model for compound-protein interactions. J Cheminform 2023; 15:97. [PMID: 37838703 PMCID: PMC10576287 DOI: 10.1186/s13321-023-00767-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/31/2023] [Accepted: 09/28/2023] [Indexed: 10/16/2023] Open
Abstract
Compound-protein interactions (CPIs) play significant roles in drug development. To avoid side effects, it is also crucial to evaluate drug selectivity when binding to different targets. However, most selectivity prediction models are constructed for specific targets with limited data. In this study, we present a pretrained multi-functional model for compound-protein interaction prediction (PMF-CPI) and fine-tune it to assess drug selectivity. This model uses recurrent neural networks to process the protein embedding based on the pretrained language model TAPE, extracts molecular information from a graph encoder, and produces the output from dense layers. PMF-CPI obtained the best performance compared to outstanding approaches on both the binding affinity regression and CPI classification tasks. Meanwhile, we apply the model to analyzing drug selectivity after fine-tuning it on three datasets related to specific targets, including human cytochrome P450s. The study shows that PMF-CPI can accurately predict different drug affinities or opposite interactions toward similar targets, recognizing selective drugs for precise therapeutics.
Affiliation(s)
- Nan Song
- School of New Media and Communication, Tianjin University, Tianjin, Tianjin, 300072, China
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China
| | - Ruihan Dong
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, Beijing, 100871, China
| | - Yuqian Pu
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China
| | - Ercheng Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Zhejiang Laboratory, Hangzhou, 311100, Zhejiang, China.
| | - Junhai Xu
- School of New Media and Communication, Tianjin University, Tianjin, Tianjin, 300072, China.
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China.
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China.
| |
|
31
|
Huang Y, Huang HY, Chen Y, Lin YCD, Yao L, Lin T, Leng J, Chang Y, Zhang Y, Zhu Z, Ma K, Cheng YN, Lee TY, Huang HD. A Robust Drug-Target Interaction Prediction Framework with Capsule Network and Transfer Learning. Int J Mol Sci 2023; 24:14061. [PMID: 37762364 PMCID: PMC10531393 DOI: 10.3390/ijms241814061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 08/27/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023] Open
Abstract
Drug-target interactions (DTIs) are considered a crucial component of drug design and drug discovery. To date, many computational methods have been developed for predicting drug-target interactions, but they are insufficiently informative for accurately predicting DTIs due to the lack of experimentally verified negative datasets, inaccurate molecular feature representation, and ineffective DTI classifiers. Therefore, we address the limitations of randomly selecting negative DTI data from unknown drug-target pairs by establishing two experimentally validated datasets. We also propose a capsule network-based framework called CapBM-DTI to capture hierarchical relationships of drugs and targets. It adopts pre-trained bidirectional encoder representations from transformers (BERT) for contextual sequence feature extraction from target proteins through transfer learning and the message-passing neural network (MPNN) for the 2-D graph feature extraction of compounds, in order to accurately and robustly identify drug-target interactions. We compared the performance of CapBM-DTI with state-of-the-art methods using four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Our results demonstrate that the proposed model achieved robust performance and powerful generalization ability in all experiments. The case study on treating COVID-19 demonstrates the applicability of the model in virtual screening.
Affiliation(s)
- Yixian Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Hsi-Yuan Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Yigang Chen
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Yang-Chi-Dung Lin
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Lantian Yao
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Tianxiu Lin
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Junlin Leng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Yuan Chang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Yuntian Zhang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Zihao Zhu
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Kun Ma
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
- Yeong-Nan Cheng
  - Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (Y.-N.C.)
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (Y.-N.C.)
- Hsien-Da Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
|