1
Xue Z, Sun C, Zheng W, Lv J, Liu X. TargetSA: adaptive simulated annealing for target-specific drug design. Bioinformatics 2024; 41:btae730. [PMID: 39656791 DOI: 10.1093/bioinformatics/btae730] [Received: 09/05/2024] [Revised: 10/28/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]
Abstract
MOTIVATION The burgeoning field of target-specific drug design, which focuses on identifying compounds with high binding affinity toward specific target pockets, has attracted considerable attention. Nevertheless, existing target-specific deep generative models face notable challenges. Some models rely heavily on elaborate datasets and complicated training methodologies, while others neglect the multi-constraint optimization problem inherent in drug design, generating molecules with irrational structures or chemical properties. RESULTS To address these issues, we propose a novel framework (TargetSA) that leverages adaptive simulated annealing (SA) for target-specific molecular generation and multi-constraint optimization. The SA process explores the discrete structural space of molecules, progressively converging toward the optimal solution that fulfills the predefined objective. To propose novel compounds, we first predict promising editing positions based on historical experience, and then iteratively edit molecular graphs through four operations (insertion, replacement, deletion, and cyclization). Together, these operations constitute a complete operation set, facilitating a thorough exploration of the drug-like space. Furthermore, we introduce a reversible sampling strategy that re-accepts currently suboptimal solutions, greatly enhancing generation quality. Empirical evaluations demonstrate that TargetSA achieves state-of-the-art performance in generating high-affinity molecules (average Vina docking score of -9.09) while maintaining desirable chemical properties. AVAILABILITY AND IMPLEMENTATION https://github.com/XueZhe-Zachary/TargetSA.
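The abstract names the ingredients of an SA search but not its acceptance rule. As a rough illustration only, the Python sketch below shows generic simulated annealing. It uses the standard Metropolis acceptance criterion and a geometric cooling schedule, applied to a token sequence edited by insertion, replacement and deletion. This is not TargetSA's implementation. The objective `toy_score`, the token alphabet, the step count and the cooling constant are all hypothetical placeholders. TargetSA's editing-position predictor, cyclization operation, multi-constraint objective and reversible sampling strategy are not modelled.

```python
import math
import random
from typing import Callable, List, Tuple


def acceptance_probability(delta: float, temperature: float) -> float:
    """Metropolis rule for maximisation: delta = score(candidate) - score(current)."""
    if delta >= 0:
        return 1.0  # improvements are always accepted
    if temperature <= 0:
        return 0.0  # a frozen system never accepts a worse candidate
    return math.exp(delta / temperature)  # worse candidates accepted with decaying probability


def propose_edit(seq: List[str], alphabet: str, rng: random.Random) -> List[str]:
    """Return a copy of seq with one random insertion, replacement or deletion."""
    new = list(seq)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not new:  # an empty sequence can only grow
        new.insert(rng.randrange(len(new) + 1), rng.choice(alphabet))
    elif op == "replace":
        new[rng.randrange(len(new))] = rng.choice(alphabet)
    else:
        del new[rng.randrange(len(new))]
    return new


def simulated_annealing(
    score: Callable[[List[str]], float],
    start: List[str],
    alphabet: str,
    steps: int = 2000,
    t0: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Generic SA loop; returns the best sequence seen and its score."""
    rng = random.Random(seed)
    current, current_score = list(start), score(start)
    best, best_score = list(current), current_score
    temperature = t0
    for _ in range(steps):
        candidate = propose_edit(current, alphabet, rng)
        candidate_score = score(candidate)
        if rng.random() < acceptance_probability(candidate_score - current_score, temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temperature *= cooling  # geometric cooling schedule
    return best, best_score


def toy_score(seq: List[str]) -> float:
    """Hypothetical objective: reward 'C' tokens, penalise lengths far from 6."""
    return seq.count("C") - abs(len(seq) - 6)


best, best_score = simulated_annealing(toy_score, list("CO"), alphabet="CNO")
print("".join(best), best_score)
```

A plain SA search escapes local optima by accepting worse candidates with probability exp(delta / T). The reversible sampling strategy in the abstract is described as a separate mechanism for re-accepting suboptimal solutions and is not reproduced here.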
Affiliation(s)
- Zhe Xue
- College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Chenwei Sun
- College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Wenhao Zheng
- College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Jiancheng Lv
- College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Xianggen Liu
- College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Laboratory of Anesthesia and Critical Care Medicine, Department of Anesthesiology, Translational Neuroscience Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
2
Wang J, Mao J, Li C, Xiang H, Wang X, Wang S, Wang Z, Chen Y, Li Y, No KT, Song T, Zeng X. Interface-aware molecular generative framework for protein-protein interaction modulators. J Cheminform 2024; 16:142. [PMID: 39707457 DOI: 10.1186/s13321-024-00930-0] [Received: 03/15/2024] [Accepted: 11/11/2024] [Indexed: 12/23/2024]
Abstract
Protein-protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit physicochemical properties distinct from those of traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs, particularly by considering PPI complexes or interface hotspot residues, remains a significant challenge. In this work, we constructed a comprehensive dataset of PPI interfaces with active and inactive compound pairs. Based on this dataset, we propose a novel molecular generative framework tailored to PPI interfaces, named GENiPPI. Our evaluation demonstrates that GENiPPI captures the implicit relationships between PPI interfaces and active molecules, and can generate novel compounds that target these interfaces. Moreover, GENiPPI can generate structurally diverse novel compounds from a limited set of known PPI interface modulators. To the best of our knowledge, this is the first exploration of a structure-based molecular generative model focused on PPI interfaces, which could facilitate the design of PPI modulators. The PPI interface-based molecular generative model enriches the existing landscape of structure-based (pocket/interface) molecular generative models. SCIENTIFIC CONTRIBUTION: This study introduces GENiPPI, a protein-protein interaction (PPI) interface-aware molecular generative framework. The framework first employs Graph Attention Networks to capture atomic-level interaction features at the protein complex interface. Convolutional Neural Networks then extract compound representations in voxel and electron density spaces. These features are integrated into a Conditional Wasserstein Generative Adversarial Network, which is trained to generate compound representations targeting PPI interfaces.
GENiPPI effectively captures the relationship between PPI interfaces and active/inactive compounds. Furthermore, in few-shot molecular generation, GENiPPI successfully generates compounds comparable to known disruptors. GENiPPI provides an efficient tool for the structure-based design of PPI modulators.
Affiliation(s)
- Jianmin Wang
- Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Jiashun Mao
- Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Chunyan Li
- School of Informatics, Yunnan Normal University, Kunming, China
- Hongxin Xiang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xun Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- High Performance Computer Research Center, University of Chinese Academy of Sciences, Beijing, 100190, China
- Shuang Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Yangyang Chen
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, China
- Kyoung Tai No
- Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Tao Song
- School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
3
Yadav MK, Dahiya V, Tripathi MK, Chaturvedi N, Rashmi M, Ghosh A, Raj VS. Unleashing the future: The revolutionary role of machine learning and artificial intelligence in drug discovery. Eur J Pharmacol 2024; 985:177103. [PMID: 39515559 DOI: 10.1016/j.ejphar.2024.177103] [Received: 08/01/2024] [Revised: 10/23/2024] [Accepted: 11/05/2024] [Indexed: 11/16/2024]
Abstract
Drug discovery is a complex and multifaceted process aimed at identifying new therapeutic compounds with the potential to treat various diseases. Traditional methods of drug discovery are often time-consuming, expensive, and characterized by low success rates. There is therefore an urgent need to improve the drug development process using new technologies. Integrating current state-of-the-art artificial intelligence (AI) and machine learning (ML) approaches with conventional methods will enhance the efficiency and effectiveness of pharmaceutical research. This review highlights the transformative impact of AI and ML in drug discovery, discussing current applications, challenges, and future directions in harnessing these technologies to accelerate the development of innovative therapeutics. We discuss the latest developments in AI and ML technologies that streamline several stages of drug discovery, from target identification and validation to lead optimization and preclinical studies.
Affiliation(s)
- Manoj Kumar Yadav
- Department of Biomedical Engineering, SRM University Delhi-NCR, Sonepat, Haryana, India
- Vandana Dahiya
- Department of Biomedical Engineering, SRM University Delhi-NCR, Sonepat, Haryana, India
- Navaneet Chaturvedi
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Mayank Rashmi
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Arabinda Ghosh
- Department of Molecular Biology and Bioinformatics, Tripura University, Suryamaninagar, Tripura, India
- V Samuel Raj
- Center for Drug Design Discovery and Development (C4D), SRM University Delhi-NCR, Sonepat, Haryana, India
4
Gu C, Jang WD, Oh KS, Ryu JY. AnoChem: Prediction of chemical structural abnormalities based on machine learning models. Comput Struct Biotechnol J 2024; 23:2116-2121. [PMID: 38808129 PMCID: PMC11130677 DOI: 10.1016/j.csbj.2024.05.017] [Received: 02/22/2024] [Revised: 05/08/2024] [Accepted: 05/08/2024] [Indexed: 05/30/2024]
Abstract
De novo drug design aims to rationally discover novel and potent compounds while reducing experimental costs during the drug development stage. Despite the numerous generative models that have been developed, few successful cases of drug design using generative models have been reported. One of the most common challenges is that designed compounds are not synthesizable or realistic. Therefore, methods capable of accurately assessing the chemical structures proposed by generative models for drug design are needed. In this study, we present AnoChem, a deep learning-based computational framework designed to assess the likelihood that a generated molecule is real. AnoChem achieves an area under the receiver operating characteristic curve (AUROC) of 0.900 for distinguishing between real and generated molecules. We used AnoChem to evaluate and compare the performance of several generative models alongside two other metrics, SAscore and Fréchet ChemNet distance (FCD). AnoChem correlates strongly with these metrics, validating its effectiveness as a reliable tool for assessing generative models. The source code for AnoChem is available at https://github.com/CSB-L/AnoChem.
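AnoChem reports an AUROC for separating real from generated molecules. AUROC can be read as the probability that a randomly chosen real molecule receives a higher score than a randomly chosen generated one. The sketch below computes it directly from that pairwise definition, with ties counting as one half. The scores are hypothetical classifier outputs, not AnoChem predictions. Treating real molecules as the positive class is an assumption made for illustration.

```python
from typing import Sequence


def auroc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts, over all (positive, negative) pairs, how often the positive
    example scores higher; ties contribute 0.5. O(n*m), fine for small sets.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical "realness" scores from some classifier.
real_molecules = [0.9, 0.8, 0.4]
generated_molecules = [0.3, 0.5]
print(auroc(real_molecules, generated_molecules))  # 5 of 6 pairs ranked correctly
```

For large evaluation sets, a rank-based formulation of the same statistic avoids the quadratic pair loop.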
Affiliation(s)
- Changdai Gu
- Artificial Intelligence Laboratory, Oncocross Co., Ltd., Saechang-ro, Mapo-gu, Seoul 04168, Republic of Korea
- Department of Artificial Intelligence, College of Computing, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea
- Woo Dae Jang
- Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, Republic of Korea
- Department of Medicinal and Pharmaceutical Chemistry, University of Science and Technology, Daejeon 34129, Republic of Korea
- Kwang-Seok Oh
- Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, Republic of Korea
- Department of Medicinal and Pharmaceutical Chemistry, University of Science and Technology, Daejeon 34129, Republic of Korea
- Jae Yong Ryu
- Artificial Intelligence Laboratory, Oncocross Co., Ltd., Saechang-ro, Mapo-gu, Seoul 04168, Republic of Korea
- Department of Biotechnology, Duksung Women’s University, 33 Samyang-Ro 144-Gil, Dobong-gu, Seoul 01369, Republic of Korea
5
Kint S, Dolfsma W, Robinson D. Strategic partnerships for AI-driven drug discovery: The role of relational dynamics. Drug Discov Today 2024; 29:104242. [PMID: 39547391 DOI: 10.1016/j.drudis.2024.104242] [Received: 07/11/2024] [Revised: 11/01/2024] [Accepted: 11/07/2024] [Indexed: 11/17/2024]
Abstract
Artificial intelligence-driven drug discovery (AIDD) companies hold significant promise for transforming pharmaceutical development, yet little is known about how they manage partnerships with established pharmaceutical firms. To address this research gap, our study explores how AIDD companies develop and leverage relational capabilities to enhance collaboration effectiveness. Using a case study approach, we focus on four key relational aspects: identifying complementary capabilities, establishing effective governance mechanisms, creating relationship-specific assets, and developing interfirm knowledge-sharing routines. Our findings show that effective governance of intellectual property, in particular, is essential for partnership success. We offer actionable recommendations for AIDD companies to strengthen collaborations, thereby helping to realize AI's potential in drug discovery.
Affiliation(s)
- Stefan Kint
- Wageningen University & Research, Wageningen, the Netherlands
- Wilfred Dolfsma
- Wageningen University & Research, Wageningen, the Netherlands
6
Cao PY, He Y, Cui MY, Zhang XM, Zhang Q, Zhang HY. Group graph: a molecular graph representation with enhanced performance, efficiency and interpretability. J Cheminform 2024; 16:133. [PMID: 39609909 PMCID: PMC11606038 DOI: 10.1186/s13321-024-00933-x] [Received: 06/07/2024] [Accepted: 11/15/2024] [Indexed: 11/30/2024]
Abstract
The exploration of chemical space holds promise for developing influential chemical entities. Molecular representations, which encode features of molecular structure in silico, help to navigate chemical space. Atom-level molecular representations, such as SMILES and atom graphs, can sometimes lead to confusing interpretations of chemical substructures. Substructure-level molecular representations instead encode important substructures into molecular features. They not only provide more information for predicting molecular properties and drug-drug interactions but also help to interpret the correlations between molecular properties and substructures. However, it remains challenging to represent the entire molecular structure both intactly and simply with substructure-level molecular representations. In this study, we developed a novel substructure-level molecular representation, which we named the group graph. The group graph offers three advantages:
(a) the substructures of the group graph reflect the diversity and consistency of different molecular datasets;
(b) the group graph retains molecular structural features with minimal information loss. This is shown by the graph isomorphism network (GIN) built on the group graph, which performs well in predicting molecular properties and drug-drug interactions, with higher accuracy and efficiency than models built on other molecular graphs, even without any pretraining; and
(c) the molecular property may change when a substructure in the group graph is substituted with another of differing importance, facilitating the detection of activity cliffs.
In addition, we successfully predicted structural modifications that improve blood-brain barrier permeability (BBBP) using the GIN of the group graph. Therefore, the group graph can simultaneously represent local molecular characteristics and global features.
SCIENTIFIC CONTRIBUTION: The group graph, as a substructure-level molecular representation, retains molecular structural features with minimal information loss. As a result, it shows superior performance in predicting molecular properties and drug-drug interactions, with enhanced efficiency and interpretability.
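In a substructure-level representation, each node stands for a group of atoms and each edge records a bond between groups. The Python sketch below shows only that contraction step and assumes the atom-to-group assignment is already known. How the group graph paper actually selects its substructures is not reproduced. The toluene-like example uses a hypothetical hand-made assignment: group 0 is the aromatic ring and group 1 is the methyl.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def contract_to_group_graph(
    atom_edges: Iterable[Tuple[int, int]],
    atom_to_group: Dict[int, int],
) -> Tuple[List[Tuple[int, int]], Dict[int, List[int]]]:
    """Collapse an atom-level graph into a group-level graph.

    atom_edges:    bonds as (atom_i, atom_j) index pairs.
    atom_to_group: assignment of every atom index to a group id.
    Returns the sorted, de-duplicated inter-group edges and the atoms in each group.
    """
    members: Dict[int, List[int]] = defaultdict(list)
    for atom, group in atom_to_group.items():
        members[group].append(atom)

    group_edges = set()
    for i, j in atom_edges:
        gi, gj = atom_to_group[i], atom_to_group[j]
        if gi != gj:  # bonds inside a group are absorbed into the group node
            group_edges.add((min(gi, gj), max(gi, gj)))

    return sorted(group_edges), {g: sorted(atoms) for g, atoms in members.items()}


# Toluene-like heavy-atom graph: atoms 0-5 form the ring, atom 6 is the methyl carbon.
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
bonds = ring_bonds + [(0, 6)]
assignment = {a: 0 for a in range(6)}  # group 0: aromatic ring
assignment[6] = 1                      # group 1: methyl
edges, groups = contract_to_group_graph(bonds, assignment)
print(edges)   # [(0, 1)]
print(groups)  # {0: [0, 1, 2, 3, 4, 5], 1: [6]}
```

A GIN or other message-passing network would then operate on this smaller graph, with node features describing each group rather than each atom.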
Affiliation(s)
- Piao-Yang Cao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Yang He
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Ming-Yang Cui
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Xiao-Min Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Qingye Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
7
Bernatavicius A, Šícho M, Janssen APA, Hassen AK, Preuss M, van Westen GJP. AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models. J Chem Inf Model 2024; 64:8113-8122. [PMID: 39475544 PMCID: PMC11558674 DOI: 10.1021/acs.jcim.4c00309] [Received: 02/23/2024] [Revised: 10/15/2024] [Accepted: 10/15/2024] [Indexed: 11/12/2024]
Abstract
Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that uses latent protein embeddings derived from AlphaFold2 to condition a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture the structural relationships between proteins. This enables interpolation within the chemical space of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to demonstrate the validity of using AlphaFold protein representations instead of raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately by target family and that model performance declines when these representations are intentionally corrupted. We also show that PCMol generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. The software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
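Target-side generalization "based on embedding similarities" presupposes a similarity measure between protein embeddings. The sketch below ranks known targets by cosine similarity to a query embedding. Cosine similarity is a common choice, but the abstract does not say which measure PCMol uses. The three-dimensional vectors and target names are hypothetical stand-ins for the much higher-dimensional AlphaFold2-derived embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def most_similar_targets(
    query: Sequence[float], embeddings: Dict[str, Sequence[float]], k: int = 1
) -> List[str]:
    """Names of the k stored targets whose embeddings are closest to the query."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embeddings; real protein embeddings are far larger.
known_targets = {
    "kinase_A": [1.0, 0.0, 0.0],
    "kinase_B": [0.9, 0.1, 0.0],
    "gpcr_C": [0.0, 1.0, 0.0],
}
new_protein = [1.0, 0.0, 0.0]
print(most_similar_targets(new_protein, known_targets, k=2))  # ['kinase_A', 'kinase_B']
```

Cosine similarity ignores vector magnitude. This is often desirable for learned embeddings, but it is a design choice rather than something the abstract specifies.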
Affiliation(s)
- Andrius Bernatavicius
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- Martin Šícho
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- Antonius P. A. Janssen
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- Alan Kai Hassen
- Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- Mike Preuss
- Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- Gerard J. P. van Westen
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
8
Xu C, Zheng L, Fan Q, Liu Y, Zeng C, Ning X, Liu H, Du K, Lu T, Chen Y, Zhang Y. Progress in the application of artificial intelligence in molecular generation models based on protein structure. Eur J Med Chem 2024; 277:116735. [PMID: 39098131 DOI: 10.1016/j.ejmech.2024.116735] [Received: 05/20/2024] [Revised: 07/12/2024] [Accepted: 07/30/2024] [Indexed: 08/06/2024]
Abstract
Molecular generation models based on protein structures represent a cutting-edge research direction in artificial intelligence-assisted drug discovery. This article comprehensively summarizes the research methods and developments in this area by analyzing a series of novel molecular generation models based on protein structures. First, we categorize these models and highlight the architectural frameworks they use. Next, we detail the design and implementation of protein structure-based molecular generation models through specific examples. Finally, we outline the current opportunities and challenges in this field, with the aim of offering guidance and a reference framework for developing and studying new models in related fields.
Affiliation(s)
- Chengcheng Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Lidan Zheng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Qing Fan
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yingxu Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Chen Zeng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Xiangzhen Ning
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Ke Du
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
9
Li P, Zhang K, Liu T, Lu R, Chen Y, Yao X, Gao L, Zeng X. A deep learning approach for rational ligand generation with toxicity control via reactive building blocks. Nat Comput Sci 2024; 4:851-864. [PMID: 39516375 DOI: 10.1038/s43588-024-00718-0] [Received: 05/09/2024] [Accepted: 10/07/2024] [Indexed: 11/16/2024]
Abstract
Deep generative models are gaining attention in the field of de novo drug design. However, the rational design of ligand molecules for novel targets remains challenging, particularly in controlling the properties of the generated molecules. Here, inspired by the DNA-encoded compound library technique, we introduce DeepBlock, a deep learning approach for block-based ligand generation tailored to target protein sequences that enables precise property control. DeepBlock divides the generation process into two steps, building-block generation and molecule reconstruction. These are performed by a neural network and by a rule-based reconstruction algorithm that we propose, respectively. Furthermore, DeepBlock combines an optimization algorithm with deep learning to regulate the properties of the generated molecules. Experiments show that DeepBlock outperforms existing methods in the affinity, synthetic accessibility and drug-likeness of the generated ligands. Moreover, when integrated with simulated annealing or Bayesian optimization using toxicity as the optimization objective, DeepBlock successfully generates ligands with low toxicity while preserving affinity for the target.
Affiliation(s)
- Pengyong Li
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Kaihao Zhang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Tianxiao Liu
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Ruiqiang Lu
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Xiaojun Yao
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
10
Maziarz K, Tripp A, Liu G, Stanley M, Xie S, Gaiński P, Seidl P, Segler MHS. Re-evaluating retrosynthesis algorithms with Syntheseus. Faraday Discuss 2024. [PMID: 39485491 DOI: 10.1039/d4fd00093e] [Indexed: 11/03/2024]
Abstract
Automated synthesis planning has recently re-emerged as a research area at the intersection of chemistry and machine learning. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques and unnecessarily hamper progress. To remedy this, we present Syntheseus, a synthesis planning library with an extensive benchmarking framework. Syntheseus promotes best practice by default, enabling consistent, meaningful evaluation of single-step and multi-step synthesis planning algorithms. We demonstrate the capabilities of Syntheseus by re-evaluating several previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes in controlled evaluation experiments. We end with guidance for future work in this area and call on the community to engage in the discussion on how to improve benchmarks for synthesis planning.
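Top-k accuracy is a standard headline metric for single-step retrosynthesis. A prediction counts as correct if the recorded reactants appear among the model's k highest-ranked suggestions. The sketch below computes it, comparing reactant sets as unordered frozensets so that reactant order does not matter. It is not the Syntheseus API. The SMILES strings are hypothetical and are assumed to have been canonicalized beforehand, for example with a cheminformatics toolkit. Comparing non-canonical SMILES as plain strings would undercount hits.

```python
from typing import FrozenSet, List, Sequence

ReactantSet = FrozenSet[str]  # one set of reactant SMILES (assumed canonical)


def top_k_accuracy(
    predictions: Sequence[Sequence[ReactantSet]],
    ground_truth: Sequence[ReactantSet],
    k: int,
) -> float:
    """Fraction of products whose recorded reactants appear in the top-k predictions.

    predictions[i] is the ranked list of reactant sets proposed for product i;
    ground_truth[i] is the reactant set recorded for product i.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("one ranked prediction list is needed per product")
    if not ground_truth:
        raise ValueError("no products to evaluate")
    hits = sum(1 for ranked, truth in zip(predictions, ground_truth) if truth in ranked[:k])
    return hits / len(ground_truth)


# Hypothetical model output for two products.
predictions: List[List[ReactantSet]] = [
    [frozenset({"CCO", "CC(=O)O"}), frozenset({"CCBr"})],
    [frozenset({"c1ccccc1"}), frozenset({"CCN"})],
]
ground_truth = [frozenset({"CC(=O)O", "CCO"}), frozenset({"CCN"})]
print(top_k_accuracy(predictions, ground_truth, k=1))  # 0.5
print(top_k_accuracy(predictions, ground_truth, k=2))  # 1.0
```

Because exact-match top-k accuracy depends on how reactants are normalized and deduplicated, small differences in such details can shift model rankings.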
11
Durant G, Boyles F, Birchall K, Deane CM. The future of machine learning for small-molecule drug discovery will be driven by data. Nat Comput Sci 2024; 4:735-743. [PMID: 39407003 DOI: 10.1038/s43588-024-00699-0] [Received: 03/01/2024] [Accepted: 09/03/2024] [Indexed: 10/25/2024]
Abstract
Many studies have prophesied that the integration of machine learning techniques into small-molecule therapeutics development will help to deliver a true leap forward in drug discovery. However, increasingly advanced algorithms and novel architectures have not always yielded substantial improvements in results. In this Perspective, we propose that a greater focus on the data for training and benchmarking these models is more likely to drive future improvement, and explore avenues for future research and strategies to address these data challenges.
Affiliation(s)
- Guy Durant
- Department of Statistics, University of Oxford, Oxford, UK
- Fergus Boyles
- Department of Statistics, University of Oxford, Oxford, UK
12
Roucairol M, Georgiou A, Cazenave T, Prischi F, Pardo OE. DrugSynthMC: An Atom-Based Generation of Drug-like Molecules with Monte Carlo Search. J Chem Inf Model 2024; 64:7097-7107. [PMID: 39249497 PMCID: PMC11423341 DOI: 10.1021/acs.jcim.4c01451] [Indexed: 09/10/2024]
Abstract
A growing number of deep learning (DL) methodologies have recently been developed to design novel compounds and expand the chemical space within virtual libraries. Most of these neural network approaches design molecules to bind a specific target based on its structural information and/or knowledge of previously identified binders. Fewer attempts have been made to develop approaches for the de novo design of virtual libraries, as the synthesizability of generated molecules remains a challenge. In this work, we developed a new Monte Carlo Search (MCS) algorithm, DrugSynthMC (Drug Synthesis using Monte Carlo). Working in conjunction with DL and statistics-based priors, it generates thousands of interpretable chemical structures and novel drug-like molecules per second. DrugSynthMC produces drug-like compounds using an atom-based search model that builds molecules as SMILES strings, character by character. The designed molecules follow Lipinski's "rule of 5" and include a high proportion of compounds predicted to be highly water-soluble, nontoxic and synthesizable. They also efficiently expand the chemical space within the libraries. This is achieved without relying on training data sets or synthesizability metrics, and without enforcing synthesizability during SMILES generation. Our approach can function with or without an underlying neural network and is thus easily explainable and versatile. This ease of drug-like molecule generation allows for future integration of score functions aimed at different target- or job-oriented goals. Thus, DrugSynthMC is expected to enable the functional assessment of large compound libraries covering an extensive novel chemical space, overcoming the limitations of existing drug collections. The software is available at https://github.com/RoucairolMilo/DrugSynthMC.
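Lipinski's rule of 5 flags poor absorption or permeation as more likely when a compound exceeds any of four thresholds. These are more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a molecular weight above 500, or a calculated logP above 5. A common reading, following Lipinski's original alert, tolerates a single violation. The sketch below applies that filter to precomputed descriptors, which are assumed to come from elsewhere (for example a cheminformatics toolkit). It is not DrugSynthMC's code. The abstract does not say whether DrugSynthMC allows zero violations or one, so `max_violations` is exposed as a parameter. The descriptor values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (computed elsewhere)."""
    molecular_weight: float  # Da
    logp: float              # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 thresholds are exceeded."""
    return sum([
        d.molecular_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """True if the molecule exceeds at most max_violations thresholds."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values for two candidate molecules.
small_polar = Descriptors(molecular_weight=350.0, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
large_greasy = Descriptors(molecular_weight=620.0, logp=6.3, h_bond_donors=3, h_bond_acceptors=8)
print(lipinski_violations(small_polar), passes_rule_of_five(small_polar))    # 0 True
print(lipinski_violations(large_greasy), passes_rule_of_five(large_greasy))  # 2 False
```

In a generator such as the one described, a filter of this kind could act either as a hard rejection step or as one term in a score function. The abstract does not specify which.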
Affiliation(s)
- Milo Roucairol
- LAMSADE, Université Paris-Dauphine, Pl. du Maréchal de Lattre de Tassigny, 75016 Paris, France
- Alexios Georgiou
- LAMSADE, Université Paris-Dauphine, Pl. du Maréchal de Lattre de Tassigny, 75016 Paris, France
- Tristan Cazenave
- LAMSADE, Université Paris-Dauphine, Pl. du Maréchal de Lattre de Tassigny, 75016 Paris, France
- Filippo Prischi
- Randall Centre for Cell and Molecular Biophysics, School of Basic and Medical Biosciences, King's College London, London SE1 1UL, United Kingdom
- Olivier E Pardo
- Division of Cancer, Department of Surgery and Cancer, Imperial College, Du Cane Road, London W12 0NN, United Kingdom
13
Salvadó O, Pérez-Ruíz J, Mesas A, Díaz-Requejo MM, Pérez PJ, Fernández E. Rare Gold-Catalyzed 4-exo-dig Cyclization for Ring Expansion of Propargylic Aziridines toward Stereoselective (Z)-Alkylidene Azetidines, via Diborylalkyl Homopropargyl Amines. Org Lett 2024; 26:7535-7540. [PMID: 39219538 PMCID: PMC11406573 DOI: 10.1021/acs.orglett.4c02415] [Indexed: 09/04/2024]
Abstract
We report an uncommon 4-exo-dig cyclization of N-tosyl homopropargyl amines, catalyzed by [AuCl(PEt3)]/AgOTf, that stereoselectively affords (Z)-2-alkylidene-1-tosylazetidines. This outcome contrasts with the gold-catalyzed cyclization of N-tosyl homopropargyl amines bearing a methyl group at the propargylic position, which provides substituted 2,3-dihydropyrroles via a 5-endo-dig mechanism. The N-tosyl homopropargyl amines are accessible through regioselective nucleophilic attack of α-diboryl alkylidene lithium salts on propargylic aziridines.
Affiliation(s)
- Oriol Salvadó
- Faculty of Chemistry, University Rovira i Virgili, 43007 Tarragona, Spain
- Jorge Pérez-Ruíz
- Laboratorio de Catálisis Homogénea, Unidad Asociada al CSIC, Centro de Investigación en Química Sostenible (CIQSO) and Departamento de Química, Universidad de Huelva, 21007 Huelva, Spain
- Alba Mesas
- Faculty of Chemistry, University Rovira i Virgili, 43007 Tarragona, Spain
- M Mar Díaz-Requejo
- Laboratorio de Catálisis Homogénea, Unidad Asociada al CSIC, Centro de Investigación en Química Sostenible (CIQSO) and Departamento de Química, Universidad de Huelva, 21007 Huelva, Spain
- Pedro J Pérez
- Laboratorio de Catálisis Homogénea, Unidad Asociada al CSIC, Centro de Investigación en Química Sostenible (CIQSO) and Departamento de Química, Universidad de Huelva, 21007 Huelva, Spain
- Elena Fernández
- Faculty of Chemistry, University Rovira i Virgili, 43007 Tarragona, Spain
14
Kneiding H, Balcells D. Augmenting genetic algorithms with machine learning for inverse molecular design. Chem Sci 2024:d4sc02934h. [PMID: 39296997 PMCID: PMC11404003 DOI: 10.1039/d4sc02934h] [Received: 05/03/2024] [Accepted: 09/09/2024] [Indexed: 09/21/2024]
Abstract
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
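One of the combinations discussed, a machine learning surrogate that speeds up fitness evaluation, can be sketched as follows. This is an illustrative toy, not code from the perspective: molecules are replaced by hypothetical 16-bit strings, `CountingOracle` stands in for an expensive property calculation, and a k-nearest-neighbour average over already evaluated individuals plays the role of the surrogate. Each generation breeds many offspring but sends only the few the surrogate ranks highest to the true fitness function.

```python
import random

N_BITS = 16
TARGET = tuple([1, 0] * 8)  # hidden optimum of the toy fitness (hypothetical)


class CountingOracle:
    """Stand-in for an expensive fitness function; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: tuple) -> float:
        self.calls += 1
        return float(sum(a == b for a, b in zip(x, TARGET)))


def hamming(a: tuple, b: tuple) -> int:
    return sum(x != y for x, y in zip(a, b))


def knn_surrogate(x: tuple, archive: dict, k: int = 3) -> float:
    """Cheap estimate: mean true fitness of the k nearest evaluated individuals."""
    nearest = sorted(archive, key=lambda y: hamming(x, y))[:k]
    return sum(archive[y] for y in nearest) / len(nearest)


def crossover(a: tuple, b: tuple, rng: random.Random) -> tuple:
    cut = rng.randrange(1, N_BITS)  # one-point crossover
    return a[:cut] + b[cut:]


def mutate(x: tuple, rng: random.Random, rate: float = 1 / N_BITS) -> tuple:
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in x)


def surrogate_ga(generations=15, pop_size=8, n_offspring=40, n_evaluated=4, seed=0):
    rng = random.Random(seed)
    oracle = CountingOracle()
    population = set()
    while len(population) < pop_size:
        population.add(tuple(rng.randint(0, 1) for _ in range(N_BITS)))
    archive = {x: oracle(x) for x in population}  # every true-fitness evaluation
    for _ in range(generations):
        parents = sorted(population, key=archive.get, reverse=True)[: pop_size // 2]
        offspring = {mutate(crossover(*rng.sample(parents, 2), rng), rng) for _ in range(n_offspring)}
        # Surrogate pre-screen: only the top-ranked new offspring reach the expensive oracle.
        candidates = sorted((c for c in offspring if c not in archive),
                            key=lambda c: knn_surrogate(c, archive), reverse=True)[:n_evaluated]
        for c in candidates:
            archive[c] = oracle(c)
        population = set(sorted(population | set(candidates), key=archive.get, reverse=True)[:pop_size])
    best = max(archive, key=archive.get)
    return best, archive[best], oracle.calls
```

With the defaults, the oracle is called at most 8 + 15 × 4 = 68 times even though 600 offspring are generated; the surrogate absorbs the rest of the screening cost.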
Affiliation(s)
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
- David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
15
Bhattacharya D, Cassady HJ, Hickner MA, Reinhart WF. Large Language Models as Molecular Design Engines. J Chem Inf Model 2024. [PMID: 39231030 DOI: 10.1021/acs.jcim.4c01396] [Indexed: 09/06/2024]
Abstract
The design of small molecules is crucial for technological applications ranging from drug discovery to energy storage. Due to the vast design space available to modern synthetic chemistry, the community has increasingly sought to use data-driven and machine learning approaches to navigate this space. Although generative machine learning methods have recently shown potential for computational molecular design, their use is hindered by complex training procedures, and they often fail to generate valid and unique molecules. In this context, pretrained large language models (LLMs) have emerged as potential tools for molecular design, as they appear capable of creating and modifying molecules based on simple instructions provided through natural language prompts. In this work, we show that the Claude 3 Opus LLM can read, write, and modify molecules according to prompts, with 97% of the generated molecules being valid and unique. By quantifying these modifications in a low-dimensional latent space, we systematically evaluate the model's behavior under different prompting conditions. Notably, the model can perform guided molecular generation when asked to manipulate the electronic structure of molecules using simple natural-language prompts. Our findings highlight the potential of LLMs as powerful and versatile molecular design engines.
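The 97% figure combines two standard generation metrics, validity and uniqueness. A minimal sketch of one common way to compute such a combined rate follows, assuming "valid and unique" means the number of distinct valid molecules divided by all outputs (the paper's exact definition may differ). The `canonicalize` callback is a hypothetical hook: it should return a canonical string for a valid molecule and `None` otherwise, so that different spellings of the same molecule count as duplicates.

```python
from typing import Callable, Iterable, Optional


def valid_and_unique_fraction(
    outputs: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> float:
    """Share of generated strings that are valid and not duplicates of an earlier output."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    distinct_valid = set()
    for text in outputs:
        canonical = canonicalize(text)  # None signals an invalid molecule
        if canonical is not None:
            distinct_valid.add(canonical)
    return len(distinct_valid) / len(outputs)
```

With RDKit installed, `canonicalize` could parse the string with `Chem.MolFromSmiles` (which returns `None` for unparsable input) and return `Chem.MolToSmiles` of the resulting molecule.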
Affiliation(s)
- Debjyoti Bhattacharya
- Materials Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Harrison J Cassady
- Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, Michigan 48824, United States
- Michael A Hickner
- Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, Michigan 48824, United States
- Wesley F Reinhart
- Materials Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States
16
Pala D, Clark DE. Caught between a ROCK and a hard place: current challenges in structure-based drug design. Drug Discov Today 2024; 29:104106. [PMID: 39029868 DOI: 10.1016/j.drudis.2024.104106] [Received: 04/11/2024] [Revised: 06/27/2024] [Accepted: 07/13/2024] [Indexed: 07/21/2024]
Abstract
The discipline of structure-based drug design (SBDD) is several decades old and it is tempting to think that the proliferation of experimental structures for many drug targets might make computer-aided drug design (CADD) straightforward. However, this is far from true. In this review, we illustrate some of the challenges that CADD scientists face every day in their work, even now. We use Rho-associated protein kinase (ROCK), and public domain structures and data, as an example to illustrate some of the challenges we have experienced during our project targeting this protein. We hope that this will help to prevent unrealistic expectations of what CADD can accomplish and to educate non-CADD scientists regarding the challenges still facing their CADD colleagues.
Affiliation(s)
- Daniele Pala
- Medicinal Chemistry and Drug Design Technologies Department, Chiesi Farmaceutici S.p.A, Research Center, Largo Belloli 11/a, 43122 Parma, Italy
- David E Clark
- Charles River, 6-9 Spire Green Centre, Flex Meadow, Harlow CM19 5TR, UK
17
Sultan A, Sieg J, Mathea M, Volkamer A. Transformers for Molecular Property Prediction: Lessons Learned from the Past Five Years. J Chem Inf Model 2024; 64:6259-6280. [PMID: 39136669 DOI: 10.1021/acs.jcim.4c00747] [Indexed: 08/27/2024]
Abstract
Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
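The call for standardized data splitting usually means splits that keep structurally related molecules on one side of the train/test boundary, such as scaffold splits. A minimal sketch of a seeded group split follows; it is a generic illustration, not the review's protocol. `group_of` is a hypothetical callback: in practice it would typically compute a Bemis-Murcko scaffold with a cheminformatics toolkit (for example, RDKit's `MurckoScaffold` module), while the usage below uses a toy string prefix.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Sequence


def group_split(
    items: Sequence[str],
    group_of: Callable[[str], Hashable],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> tuple[list[str], list[str]]:
    """Split items so that no group (e.g., a scaffold) appears in both train and test."""
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    keys = sorted(groups, key=repr)  # fixed order, so the seeded shuffle is reproducible
    random.Random(seed).shuffle(keys)
    target = test_fraction * len(items)
    train, test = [], []
    for key in keys:
        # Whole groups go to test until it reaches the target size, then to train.
        (test if len(test) < target else train).extend(groups[key])
    return train, test
```

Because whole groups move together, the realized test fraction only approximates `test_fraction`; fixing the seed and the key order makes the split reproducible across runs.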
Affiliation(s)
- Afnan Sultan
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Andrea Volkamer
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
18
Menke J, Nahal Y, Bjerrum EJ, Kabeshov M, Kaski S, Engkvist O. Metis: a python-based user interface to collect expert feedback for generative chemistry models. J Cheminform 2024; 16:100. [PMID: 39143631 PMCID: PMC11323385 DOI: 10.1186/s13321-024-00892-3] [Received: 05/15/2024] [Accepted: 08/02/2024] [Indexed: 08/16/2024]
Abstract
One challenge that current de novo drug design models face is a disparity between the user's expectations and the actual output of the model in practical applications. Tailoring models to better align with chemists' implicit knowledge, expectations, and preferences is key to overcoming this obstacle. While interest in preference-based and human-in-the-loop machine learning in chemistry is continuously increasing, no tool currently exists that enables the collection of standardized and chemistry-specific feedback. Metis is a Python-based open-source graphical user interface (GUI) designed to fill this gap by enabling the collection of chemists' detailed feedback on molecular structures. The GUI enables chemists to explore and evaluate molecules, offering a user-friendly interface for annotating preferences and specifying desired or undesired structural features. Giving chemists the opportunity to provide detailed feedback allows researchers to capture the chemist's implicit knowledge and preferences more efficiently. This knowledge is crucial for aligning de novo design agents with the chemist's ideas. The GUI aims to enhance this collaboration between the human and the "machine" by providing an intuitive platform where chemists can interactively provide feedback on molecular structures, aiding in preference learning and refining de novo design strategies. Metis integrates with the existing de novo framework REINVENT, creating a closed-loop system where human expertise can continuously inform and refine the generative models.
Scientific contribution
We introduce a novel graphical user interface that allows chemists and researchers to give detailed feedback on substructures and properties of small molecules. This tool can be used to learn the preferences of chemists in order to align de novo drug design models with the chemist's ideas. The GUI can be customized to fit different needs and projects and enables direct integration into de novo REINVENT runs. We believe that Metis can facilitate the discussion and development of novel ways to integrate human feedback that goes beyond binary decisions of liking or disliking a molecule.
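Pairwise "A over B" judgements of the kind such an interface collects are often turned into a learned reward with a Bradley-Terry (logistic) preference model. The sketch below is an assumption for illustration, not Metis's or REINVENT's code: each molecule is a hypothetical feature vector (here, flags for a nitro group and an amide), and gradient ascent on the pairwise log-likelihood yields weights indicating which features the chemist favours.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_preferences(pairs, n_features, lr=0.5, epochs=200):
    """Fit w so that P(a preferred over b) = sigmoid(w . (a - b)) (Bradley-Terry model)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for preferred, rejected in pairs:
            diff = [p - r for p, r in zip(preferred, rejected)]
            # d/dw log sigmoid(w . diff) = diff * sigmoid(-(w . diff))
            weight = sigmoid(-sum(wi * di for wi, di in zip(w, diff)))
            for i, d in enumerate(diff):
                grad[i] += weight * d
        w = [wi + lr * gi / len(pairs) for wi, gi in zip(w, grad)]  # gradient ascent step
    return w


def prob_preferred(w, a, b):
    """Model probability that molecule a is preferred over molecule b."""
    return sigmoid(sum(wi * (x - y) for wi, x, y in zip(w, a, b)))


# Hypothetical features: (has_nitro, has_amide); each pair is (preferred, rejected).
pairs = [((0, 1), (1, 0)), ((0, 1), (1, 1)), ((0, 0), (1, 0))]
weights = fit_preferences(pairs, n_features=2)
```

In this toy data the chemist consistently rejects nitro groups and favours amides, so the fitted nitro weight comes out negative and the amide weight positive; the weights could then serve as a reward term for a generative model.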
Affiliation(s)
- Janosch Menke
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 41296, Sweden
- Yasmine Nahal
- Department of Computer Science, Aalto University, Espoo, 02150, Finland
- Mikhail Kabeshov
- Molecular AI, Discovery Sciences, AstraZeneca R&D, Mölndal, 43183, Sweden
- Samuel Kaski
- Department of Computer Science, Aalto University, Espoo, 02150, Finland
- Department of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- Ola Engkvist
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 41296, Sweden
- Molecular AI, Discovery Sciences, AstraZeneca R&D, Mölndal, 43183, Sweden
19
Chen S, Xie J, Ye R, Xu DD, Yang Y. Structure-aware dual-target drug design through collaborative learning of pharmacophore combination and molecular simulation. Chem Sci 2024; 15:10366-10380. [PMID: 38994407 PMCID: PMC11234869 DOI: 10.1039/d4sc00094c] [Received: 01/05/2024] [Accepted: 06/09/2024] [Indexed: 07/13/2024]
Abstract
Dual-target drug design has gained significant attention in the treatment of complex diseases such as cancers and autoimmune disorders. A widely employed design strategy is combining pharmacophores to leverage the knowledge of structure-activity relationships of both targets. Unfortunately, pharmacophore combination often involves long and expensive trial and error, because the protein pockets of the two targets impose complex structural constraints. In this study, we propose AIxFuse, a structure-aware dual-target drug design method that learns pharmacophore fusion patterns to satisfy the dual-target structural constraints simulated by molecular docking. AIxFuse employs two self-play reinforcement learning (RL) agents to learn pharmacophore selection and fusion from comprehensive feedback, including dual-target molecular docking scores. Collaboratively, the molecular docking scores are learned through active learning (AL). Through collaborative RL and AL, AIxFuse learns to generate molecules with multiple desired properties. AIxFuse is shown to outperform state-of-the-art methods in generating dual-target drugs against glycogen synthase kinase-3 beta (GSK3β) and c-Jun N-terminal kinase 3 (JNK3). When applied to another task against retinoic acid receptor-related orphan receptor γt (RORγt) and dihydroorotate dehydrogenase (DHODH), AIxFuse maintains consistent performance whereas the compared methods suffer performance drops, resulting in a 5-fold higher success rate. Docking studies demonstrate that AIxFuse can generate molecules that simultaneously satisfy the binding modes required by both targets. Further free energy perturbation calculations indicate that the generated candidates have promising binding free energies against both targets.
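The abstract pairs RL with active learning of docking scores. The following is a generic sketch of an active-learning loop for an expensive dual-target score, not AIxFuse's algorithm. Assumptions, all hypothetical: candidates are reduced to a single descriptor value, `dock_both` stands in for two docking runs, the dual-target reward takes the weaker of the two scores, and a k-nearest-neighbour mean plus a distance bonus serves as an optimistic surrogate for choosing which candidate to "dock" next.

```python
import random


def dock_both(x: float) -> tuple[float, float]:
    """Hypothetical stand-ins for docking against two targets (higher is better here)."""
    return -(x - 0.6) ** 2, -(x - 0.8) ** 2


def dual_target_reward(x: float) -> float:
    # Assumption: a candidate is only as good as its weaker target score.
    return min(dock_both(x))


def predict(x, labeled, k=2):
    """k-NN surrogate: mean reward of the k closest labeled points and distance to the closest."""
    nearest = sorted(labeled, key=lambda item: abs(item[0] - x))[:k]
    mean = sum(y for _, y in nearest) / len(nearest)
    return mean, abs(nearest[0][0] - x)


def acquisition(x, labeled, beta):
    mean, distance = predict(x, labeled)
    return mean + beta * distance  # optimistic: favour promising or unexplored candidates


def active_learning(pool, n_init=3, budget=5, beta=0.5, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labeled = [(x, dual_target_reward(x)) for x in pool[:n_init]]
    unlabeled = pool[n_init:]
    for _ in range(min(budget, len(unlabeled))):
        pick = max(unlabeled, key=lambda x: acquisition(x, labeled, beta))
        unlabeled.remove(pick)
        labeled.append((pick, dual_target_reward(pick)))  # one expensive "docking" call
    return labeled
```

For instance, `active_learning([i / 20 for i in range(21)])` spends exactly `n_init + budget` = 8 expensive evaluations on a pool of 21 candidates; `beta` trades exploitation of the surrogate against exploration of unlabeled regions.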
Affiliation(s)
- Sheng Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- AixplorerBio Inc., Jiaxing 314031, China
- Junjie Xie
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- AixplorerBio Inc., Jiaxing 314031, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
20
An Y, Lim J, Glavatskikh M, Wang X, Norris-Drouin J, Hardy PB, Leisner TM, Pearce KH, Kireev D. In silico fragment-based discovery of CIB1-directed anti-tumor agents by FRASE-bot. Nat Commun 2024; 15:5564. [PMID: 38956119 PMCID: PMC11219766 DOI: 10.1038/s41467-024-49892-9] [Received: 07/23/2023] [Accepted: 06/19/2024] [Indexed: 07/04/2024]
Abstract
Chemical probes are an indispensable tool for translating biological discoveries into new therapies, but they are increasingly difficult to identify because novel therapeutic targets are often hard-to-drug proteins. We introduce the FRASE-based hit-finding robot (FRASE-bot) to expedite drug discovery for unconventional therapeutic targets. FRASE-bot mines available 3D structures of ligand-protein complexes to create a database of FRAgments in Structural Environments (FRASE). The FRASE database can be screened to identify structural environments similar to those in the target protein and to seed the target structure with relevant ligand fragments. A neural network model is used to retain the fragments most likely to be native binders. The seeded fragments then inform ultra-large-scale virtual screening of commercially available compounds. We apply FRASE-bot to identify ligands for Calcium and Integrin Binding protein 1 (CIB1), a promising drug target implicated in triple-negative breast cancer. FRASE-based virtual screening identifies a small-molecule CIB1 ligand (with binding confirmed in a TR-FRET assay) that shows specific cell-killing activity in CIB1-dependent cancer cells but not in CIB1-depletion-insensitive cells.
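The screening step, finding stored structural environments that resemble a pocket in the target and taking their fragments as seeds, can be sketched as a nearest-neighbour search. This is a generic illustration, not FRASE-bot's method: environments are hypothetical numeric descriptor vectors (for example, counts of residue types around a fragment), cosine similarity is an assumed metric, and the fragment names in the usage note are made up. The neural-network filter described in the abstract would be applied afterwards to the returned fragments.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def seed_fragments(target_env, database, min_similarity=0.8, top_k=3):
    """Return fragments whose stored environment most resembles the target environment.

    `database` is a list of (environment_descriptor, fragment) pairs.
    """
    scored = [(cosine(target_env, env), fragment) for env, fragment in database]
    hits = [pair for pair in scored if pair[0] >= min_similarity]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [fragment for _, fragment in hits[:top_k]]
```

For example, with a toy database `[((1, 0, 2), "benzamide"), ((0, 3, 0), "carboxylate"), ((1, 0, 1.9), "phenol")]`, the target environment `(1, 0, 2)` is seeded with `["benzamide", "phenol"]`, while the dissimilar carboxylate environment is filtered out by the threshold.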
Affiliation(s)
- Yi An
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Jiwoong Lim
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Marta Glavatskikh
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Xiaowen Wang
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Chemistry Department, University of Missouri, Columbia, Columbia, MO, 65211, USA
- Jacqueline Norris-Drouin
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- P Brian Hardy
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Tina M Leisner
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Kenneth H Pearce
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Dmitri Kireev
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Chemistry Department, University of Missouri, Columbia, Columbia, MO, 65211, USA
21
Saunders A, Harrington PDB. Advances in Activity/Property Prediction from Chemical Structures. Crit Rev Anal Chem 2024; 54:135-147. [PMID: 35482792 DOI: 10.1080/10408347.2022.2066461] [Indexed: 10/18/2022]
Abstract
Recent technological advances in AI modeling of molecular property databases have significantly expanded the opportunities for drug design and development. Quantitative structure-activity relationships (QSARs) are shown to provide more accurate predictions with regard to biological activity as well as toxicological assessment. By using a combination of in silico models or by combining disparate structure-activity databases, researchers have been able to improve accuracy for a variety of drug discovery and analysis methods, generating viable compounds that, in certain cases, can be synthesized and further studied in vitro to find candidates for potential development. Additionally, development of compounds found to be toxic can be discontinued earlier, allowing alternative routes to be evaluated and preventing wasted time and resources. Although the progress made has been tremendous, expert review is still necessary for most in silico predictions. Regardless, the scientific community continues to move ever closer to completely automated drug discovery and evaluation.
Affiliation(s)
- Arianne Saunders
- Department of Chemistry and Biochemistry, Ohio University, Athens, Ohio, USA
22
Ren X, Wei J, Luo X, Liu Y, Li K, Zhang Q, Gao X, Yan S, Wu X, Jiang X, Liu M, Cao D, Wei L, Zeng X, Shi J. HydrogelFinder: A Foundation Model for Efficient Self-Assembling Peptide Discovery Guided by Non-Peptidal Small Molecules. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2400829. [PMID: 38704695 PMCID: PMC11234452 DOI: 10.1002/advs.202400829] [Received: 01/23/2024] [Revised: 03/10/2024] [Indexed: 05/07/2024]
Abstract
Self-assembling peptides have numerous applications in medicine, food chemistry, and nanotechnology. However, their discovery has traditionally been serendipitous rather than driven by rational design. Here, HydrogelFinder, a foundation model for the rational design of self-assembling peptides from scratch, is developed. The model explores self-assembly properties based on molecular structure, leveraging 1,377 self-assembling non-peptidal small molecules to navigate chemical space and improve structural diversity. Using HydrogelFinder, 111 peptide candidates are generated and 17 peptides are synthesized, and the self-assembly and biophysical characteristics of nine peptides ranging from 1 to 10 amino acids are experimentally validated, all within a 19-day workflow. Notably, the two de novo-designed self-assembling peptides demonstrate low cytotoxicity and good biocompatibility, as confirmed by live/dead assays. This work highlights the capacity of HydrogelFinder to diversify the design of self-assembling peptides through non-peptidal small molecules, offering a powerful toolkit and paradigm for future peptide discovery endeavors.
Affiliation(s)
- Xuanbai Ren
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Jiaying Wei
- State Key Laboratory of Chemo/Bio-Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Xiaoli Luo
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Yuansheng Liu
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Kenli Li
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Qiang Zhang
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311200, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310013, China
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Sizhe Yan
- State Key Laboratory of Chemo/Bio-Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Xia Wu
- State Key Laboratory of Chemo/Bio-Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Xingyue Jiang
- State Key Laboratory of Chemo/Bio-Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Mingquan Liu
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, China
- Leyi Wei
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Junfeng Shi
- State Key Laboratory of Chemo/Bio-Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
23
Jiang X, Lu L, Li J, Jiang J, Zhang J, Zhou S, Wen H, Cai H, Luo X, Li Z, Wang J, Ju B, Bai R. Synthetically Feasible De Novo Molecular Design of Leads Based on a Reinforcement Learning Model: AI-Assisted Discovery of an Anti-IBD Lead Targeting CXCR4. J Med Chem 2024; 67:10057-10075. [PMID: 38863440 DOI: 10.1021/acs.jmedchem.4c00184] [Indexed: 06/13/2024]
Abstract
Artificial intelligence (AI)-driven de novo molecular generation provides leads with novel structures for drug discovery. However, the target affinity and synthesizability of the generated molecules remain critical challenges for the successful application of AI technology. We therefore developed an advanced reinforcement learning model to bridge the gap between the theory of de novo molecular generation and the practical aspects of drug discovery. The model starts from chemical reaction templates and commercially available building blocks and uses forward reaction prediction to generate molecules, which supports synthesizability, while real-time docking and drug-likeness predictions steer generation toward target affinity and drug-likeness. We applied this model to design active molecules targeting the inflammation-related receptor CXCR4 and successfully prepared them according to the AI-proposed synthetic routes. Several molecules exhibited potent anti-CXCR4 and anti-inflammatory activity in subsequent in vitro and in vivo assays. The top-performing compound, XVI, alleviated symptoms related to inflammatory bowel disease and showed reasonable pharmacokinetic properties.
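Starting generation from reaction templates and purchasable building blocks means every candidate comes with a one-step route by construction. A minimal sketch of the enumeration step follows, as a generic illustration rather than the authors' model: building blocks are hypothetical entries tagged with reactive functional groups, templates only check that the required groups are present, and the forward-reaction predictor, docking, and RL policy from the abstract are omitted.

```python
from itertools import product

# Hypothetical building blocks annotated with their reactive functional groups.
BUILDING_BLOCKS = {
    "BB-acid-1": {"carboxylic_acid"},
    "BB-acid-2": {"carboxylic_acid"},
    "BB-amine-1": {"primary_amine"},
    "BB-halide-1": {"aryl_halide"},
}

# Hypothetical templates: name -> functional groups required on reactants A and B.
TEMPLATES = {
    "amide_coupling": ("carboxylic_acid", "primary_amine"),
    "buchwald_hartwig": ("aryl_halide", "primary_amine"),
}


def enumerate_routes(blocks, templates):
    """List (template, reactant_a, reactant_b) combinations whose groups match a template."""
    routes = []
    for name, (group_a, group_b) in templates.items():
        for a, b in product(blocks, repeat=2):
            if a != b and group_a in blocks[a] and group_b in blocks[b]:
                routes.append((name, a, b))
    return routes
```

Here `enumerate_routes(BUILDING_BLOCKS, TEMPLATES)` yields two amide couplings and one Buchwald-Hartwig amination. In a full pipeline, a forward-reaction model would turn each route into a product structure, and the RL policy would learn which routes to sample based on docking and drug-likeness rewards.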
Affiliation(s)
- Xiaoying Jiang
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Liuxin Lu
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Junjie Li
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Jing Jiang
- SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Jiapeng Zhang
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, PR China
- Shengbin Zhou
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, PR China
- Hao Wen
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Hong Cai
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Xinyu Luo
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Zhen Li
- SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Jiahui Wang
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Bin Ju
- SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Renren Bai
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
24
Guo J, Schwaller P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. JACS AU 2024; 4:2160-2172. [PMID: 38938817 PMCID: PMC11200228 DOI: 10.1021/jacsau.4c00066] [Received: 01/23/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/29/2024]
Abstract
Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal calls to oracles (computational property predictors). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose significant computational cost. Consequently, designing molecules that are optimized for such oracles cannot be achieved under a practical computational budget. Molecular generative models based on the simplified molecular-input line-entry system (SMILES) have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the practical molecular optimization (PMO) benchmark. Here, we first show that experience replay drastically improves the performance of multiple previously proposed algorithms. Next, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task, a drug discovery case study requiring both exploration and exploitation, and a materials design case study optimizing explicitly for quantum-mechanical properties. Our method achieves a new state of the art in sample-efficient de novo molecular design, outperforming all previously reported methods. The code is available at https://github.com/schwallergroup/augmented_memory.
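The key idea, reusing each oracle score for several model updates, can be sketched with a small top-k replay buffer. This is a generic illustration of experience replay plus augmentation, not the code in the linked repository. `augment` is a hypothetical callback: with real molecules it would return a randomized (non-canonical) SMILES of the same molecule, which RDKit can produce, whereas the usage below passes a trivial placeholder.

```python
import heapq
import random


class ReplayBuffer:
    """Keeps the top-k (score, SMILES) pairs seen so far."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: list[tuple[float, str]] = []  # min-heap: weakest kept entry at index 0

    def add(self, smiles: str, score: float) -> None:
        if any(s == smiles for _, s in self._heap):
            return  # already stored; do not duplicate
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (score, smiles))
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, smiles))  # evict the weakest entry

    def items(self) -> list[tuple[float, str]]:
        return sorted(self._heap, reverse=True)


def augmented_updates(buffer, augment, n_rounds, rng):
    """Yield (augmented string, cached score) training pairs; no oracle is called here."""
    for _ in range(n_rounds):
        for score, smiles in buffer.items():
            yield augment(smiles, rng), score
```

Each oracle score is paid for once when a molecule enters the buffer, then reused `n_rounds` times through different augmented strings, which is where the sample-efficiency gain comes from.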
Affiliation(s)
- Jeff Guo
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
25
Xu X, Xu C, He W, Wei L, Li H, Zhou J, Zhang R, Wang Y, Xiong Y, Gao X. HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer. Bioinformatics 2024; 40:btae364. [PMID: 38867692 PMCID: PMC11256930 DOI: 10.1093/bioinformatics/btae364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 05/08/2024] [Accepted: 06/10/2024] [Indexed: 06/14/2024]
Abstract
MOTIVATION Macrocyclic peptides hold great promise as therapeutics targeting intracellular proteins. This stems from their remarkable ability to bind flat protein surfaces with high affinity and specificity while potentially traversing the cell membrane. Research has already explored their use in developing inhibitors for intracellular proteins, such as KRAS, a well-known driver in various cancers. However, computational approaches for de novo macrocyclic peptide design remain largely unexplored. RESULTS Here, we introduce HELM-GPT, a novel method that combines the strength of the hierarchical editing language for macromolecules (HELM) representation and generative pre-trained transformer (GPT) for de novo macrocyclic peptide design. Through reinforcement learning (RL), our experiments demonstrate that HELM-GPT has the ability to generate valid macrocyclic peptides and optimize their properties. Furthermore, we introduce a contrastive preference loss during the RL process, which further enhances the optimization performance. Finally, to co-optimize peptide permeability and KRAS binding affinity, we propose a step-by-step optimization strategy, demonstrating its effectiveness in generating molecules fulfilling both criteria. In conclusion, the HELM-GPT method can be used to identify novel macrocyclic peptides to target intracellular proteins. AVAILABILITY AND IMPLEMENTATION The code and data of HELM-GPT are freely available on GitHub (https://github.com/charlesxu90/helm-gpt).
Affiliation(s)
- Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Chencheng Xu
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Wenjia He
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Lesong Wei
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Haoyang Li
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Yu Wang
- Syneron Technology, Guangzhou 510000, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
26
Alberga D, Lamanna G, Graziano G, Delre P, Lomuscio MC, Corriero N, Ligresti A, Siliqi D, Saviano M, Contino M, Stefanachi A, Mangiatordi GF. DeLA-DrugSelf: Empowering multi-objective de novo design through SELFIES molecular representation. Comput Biol Med 2024; 175:108486. [PMID: 38653065 DOI: 10.1016/j.compbiomed.2024.108486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/25/2024]
Abstract
In this paper, we introduce DeLA-DrugSelf, an upgraded version of DeLA-Drug [J. Chem. Inf. Model. 62 (2022) 1411-1424], which incorporates essential advancements for automated multi-objective de novo design. Unlike its predecessor, which relies on SMILES notation for molecular representation, DeLA-DrugSelf employs a novel and robust molecular representation string named SELFIES (SELF-referencing Embedded String). The generation process in DeLA-DrugSelf not only involves substitutions to the initial string representing the starting query molecule but also incorporates insertions and deletions. This enhancement makes DeLA-DrugSelf significantly more adept at executing data-driven scaffold decoration and lead optimization strategies. Remarkably, DeLA-DrugSelf explicitly addresses the SELFIES-related collapse issue, considering only collapse-free compounds during generation. These compounds undergo a rigorous quality metrics evaluation, highlighting substantial advancements in terms of drug-likeness, uniqueness, and novelty compared to the molecules generated by the previous version of the algorithm. To evaluate the potential of DeLA-DrugSelf as a mutational operator within a genetic algorithm framework for multi-objective optimization, we employed a fitness function based on Pareto dominance. Our objectives focused on target-oriented properties aimed at optimizing known cannabinoid receptor 2 (CB2R) ligands. The results obtained indicate that DeLA-DrugSelf, available as a user-friendly web platform (https://www.ba.ic.cnr.it/softwareic/delaself/), can effectively contribute to the data-driven optimization of starting bioactive molecules based on user-defined parameters.
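The abstract does not detail its Pareto-dominance fitness, so the Python sketch below shows one common way such a fitness can be computed (non-dominated sorting), under our own assumptions: every objective is maximized, the function names are hypothetical, and the objective values in the example are invented rather than taken from the CB2R study.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(points: List[Sequence[float]]) -> List[int]:
    """Assign each point a front index: 0 for the non-dominated set,
    1 for the set that becomes non-dominated once front 0 is removed, etc.
    A lower rank can then serve as a better fitness in a genetic algorithm."""
    ranks = [-1] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Invented (objective_1, objective_2) values for four candidate molecules.
candidates = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
print(pareto_ranks(candidates))  # [0, 0, 1, 0]
```

Only the third candidate is dominated (by the second), so it falls into front 1 while the trade-off solutions share front 0.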
Affiliation(s)
- Domenico Alberga
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Giuseppe Lamanna
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Giovanni Graziano
- Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
- Pietro Delre
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Nicola Corriero
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Alessia Ligresti
- CNR - Institute of Biomolecular Chemistry, Via Campi Flegrei 34, 80078, Pozzuoli, Italy
- Dritan Siliqi
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Michele Saviano
- CNR - Institute of Crystallography, Via Vivaldi 43, 81100, Caserta, Italy
- Marialessandra Contino
- Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
- Angela Stefanachi
- Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
27
Li Y, Liu B, Deng J, Guo Y, Du H. Image-based molecular representation learning for drug development: a survey. Brief Bioinform 2024; 25:bbae294. [PMID: 38920347 PMCID: PMC11200195 DOI: 10.1093/bib/bbae294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/19/2024] [Accepted: 06/08/2024] [Indexed: 06/27/2024]
Abstract
Artificial intelligence (AI) powered drug development has received remarkable attention in recent years. It addresses the limitations of traditional experimental methods, which are costly and time-consuming. While many surveys have attempted to summarize related research, they focus only on general AI or on specific aspects such as natural language processing and graph neural networks. Given the rapid advances in computer vision, using molecular images to enable AI appears to be a more intuitive and effective approach, since each chemical substance has a unique visual representation. In this paper, we provide the first survey on image-based molecular representation for drug development. The survey proposes a taxonomy based on the learning paradigms in computer vision and reviews a large number of corresponding papers, highlighting the contributions of molecular visual representation in drug development. In addition, we discuss the applications, limitations and future directions in the field. We hope this survey offers valuable insight into the use of image-based molecular representation learning in the context of drug development.
Affiliation(s)
- Yue Li
- Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Bingyan Liu
- School of Computer Science, Beijing University of Posts and Telecommunications, No.10 Xituchen Street, 100876, Beijing, China
- Jinyan Deng
- Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Yi Guo
- Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Hongbo Du
- Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Institute of Liver Disease, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
28
Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw 2024; 173:106207. [PMID: 38442651 DOI: 10.1016/j.neunet.2024.106207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 01/23/2024] [Accepted: 02/21/2024] [Indexed: 03/07/2024]
Abstract
Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, which is a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes in the graph should remain relatively close, thereby preserving the structural information between the nodes in the graph. However, this is sub-optimal because: (i) traditional methods have limited model capacity, which limits learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; (iii) representation learning and downstream tasks are interdependent and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we conduct a comprehensive review of current deep graph representation learning algorithms by proposing a new taxonomy of the existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey also presents practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions that deserve further investigation.
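To make the idea of neighbourhood-based encoding slightly more concrete, the Python sketch below performs one untrained propagation step in which each node's vector becomes the mean of its own vector and its neighbours' vectors, which pulls connected nodes toward each other. This is our simplified assumption for illustration, not a method from the survey: real graph neural network layers add learnable weight matrices and nonlinearities, and the name `mean_aggregate` is hypothetical.

```python
from typing import Dict, List


def mean_aggregate(
    adj: Dict[int, List[int]], feats: Dict[int, List[float]]
) -> Dict[int, List[float]]:
    """One parameter-free propagation step on an undirected graph.

    Each node's new vector is the element-wise mean of its own vector and
    the vectors of the neighbours listed in `adj`.
    """
    out: Dict[int, List[float]] = {}
    for node, vec in feats.items():
        group = [vec] + [feats[nbr] for nbr in adj.get(node, [])]
        out[node] = [sum(column) / len(group) for column in zip(*group)]
    return out


# Path graph 0 - 1 - 2 with two-dimensional node features.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
print(mean_aggregate(adjacency, features))
```

Stacking several such steps lets information flow beyond direct neighbours, which is the intuition behind the multi-layer architectures the survey categorizes.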
Affiliation(s)
- Wei Ju
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zheng Fang
- School of Intelligence Science and Technology, Peking University, Beijing, 100871, China
- Yiyang Gu
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zequn Liu
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Qingqing Long
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100086, China
- Ziyue Qiao
- Artificial Intelligence Thrust, The Hong Kong University of Science and Technology, Guangzhou, 511453, China
- Yifang Qin
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jianhao Shen
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Fang Sun
- Department of Computer Science, University of California, Los Angeles, 90095, USA
- Zhiping Xiao
- Department of Computer Science, University of California, Los Angeles, 90095, USA
- Junwei Yang
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jingyang Yuan
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yusheng Zhao
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yifan Wang
- School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China
- Xiao Luo
- Department of Computer Science, University of California, Los Angeles, 90095, USA.
- Ming Zhang
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China.
29
Shen X, Zeng T, Chen N, Li J, Wu R. NIMO: A Natural Product-Inspired Molecular Generative Model Based on Conditional Transformer. Molecules 2024; 29:1867. [PMID: 38675687 PMCID: PMC11053988 DOI: 10.3390/molecules29081867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 04/11/2024] [Accepted: 04/13/2024] [Indexed: 04/28/2024]
Abstract
Natural products (NPs) have diverse biological activity and significant medicinal value. The structural diversity of NPs is the mainstay of drug discovery. Expanding the chemical space of NPs is an urgent need. Inspired by the concept of fragment-assembled pseudo-natural products, we developed a computational tool called NIMO, which is based on the transformer neural network model. NIMO employs two tailor-made motif extraction methods to map a molecular graph into a semantic motif sequence. All these generated motif sequences are used to train our molecular generative models. Various NIMO models were trained under different task scenarios by recognizing syntactic patterns and structure-property relationships. We further explored the performance of NIMO in structure-guided, activity-oriented, and pocket-based molecule generation tasks. Our results show that NIMO had excellent performance for molecule generation from scratch and structure optimization from a scaffold.
Affiliation(s)
- Xiaojuan Shen
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
- Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
- Nianhang Chen
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
- Jiabo Li
- ChemXAI Inc., 53 Barry Lane, Syosset, NY 11791, USA
- Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
30
Liu D, Song T, Na K, Wang S. PED: a novel predictor-encoder-decoder model for Alzheimer drug molecular generation. Front Artif Intell 2024; 7:1374148. [PMID: 38690194 PMCID: PMC11058643 DOI: 10.3389/frai.2024.1374148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Alzheimer's disease (AD) is a gradually advancing neurodegenerative disorder characterized by a concealed onset. Acetylcholinesterase (AChE) is an efficient hydrolase that catalyzes the hydrolysis of acetylcholine (ACh), thereby regulating the concentration of ACh at synapses and terminating ACh-mediated neurotransmission. AChE inhibitors are currently available, but their side effects are unavoidable. In various application fields where AI has gained prominence, neural network-based models for molecular design have recently emerged and demonstrated encouraging outcomes. However, in the conditional molecular generation task, most current generation models need additional optimization algorithms to generate molecules with intended properties, which makes molecular generation inefficient. Consequently, we introduce a cognitive-conditional molecular design model, termed PED, which leverages the variational auto-encoder (VAE). Its primary function is to produce a molecular library tailored for specific properties. From this library, we can then identify molecules that inhibit AChE activity without adverse effects. These molecules serve as lead compounds, hastening AD treatment and concurrently enhancing the AI's cognitive abilities. In this study, we aim to fine-tune a VAE model pre-trained on the ZINC database using active compounds of AChE collected from BindingDB. Unlike other molecular generation models, PED can simultaneously perform property prediction and molecule generation; consequently, it can generate molecules with intended properties without an additional optimization process. Evaluation experiments show that the proposed model performs better than other methods benchmarked on the same data sets. The results indicate that the model learns a good representation of the potential chemical space and can generate molecules with the intended properties.
Extensive experiments on benchmark datasets confirmed PED's efficiency and efficacy. Furthermore, we also verified the binding ability of molecules to AChE through molecular docking. The results showed that our molecular generation system for AD shows excellent cognitive capacities, the molecules within the molecular library could bind well to AChE and inhibit its activity, thus preventing the hydrolysis of ACh.
Affiliation(s)
- Dayan Liu
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Tao Song
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Kang Na
- The Ninth Department of Health Care Administration, The Second Medical Center, Chinese PLA General Hospital, Beijing, China
- Shudong Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
31
Mauri A, Bertola M. AlvaBuilder: A Software for De Novo Molecular Design. J Chem Inf Model 2024; 64:2136-2142. [PMID: 37399048 PMCID: PMC11005826 DOI: 10.1021/acs.jcim.3c00610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Indexed: 07/04/2023]
Abstract
AlvaBuilder is a software tool for de novo molecular design and can be used to generate novel molecules having desirable characteristics. Such characteristics can be defined through a simple step-by-step graphical interface and can be based on molecular descriptors, on predictions of QSAR/QSPR models, or on matching molecular fragments; the tool can also be used to design compounds similar to a given one. The generated molecules are always syntactically valid, since they are built by combining fragments of molecules taken from a training data set chosen by the user. In this paper, we demonstrate how the software can be used to design new compounds for a defined case study. AlvaBuilder is available at https://www.alvascience.com/alvabuilder/.
Affiliation(s)
- Andrea Mauri
- Alvascience Srl, Via Giuseppe Parini, 35, 23900 Lecco, Italy
- Matteo Bertola
- Alvascience Srl, Via Giuseppe Parini, 35, 23900 Lecco, Italy
32
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs not only requires methods that scale with size but also requires smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal, since validation remains difficult as long as the covered CSs remain mainly virtual without experimental verification.
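As a minimal illustration of the similarity-based neighbourhood search this abstract mentions, the Python sketch below computes the Tanimoto (Jaccard) coefficient between fingerprints represented as sets of "on" bit indices and returns library members above a similarity threshold. The bit sets, identifiers and threshold are invented for illustration, and the function names are hypothetical; real fingerprints would come from a cheminformatics toolkit, and an exhaustive scan like this one is precisely what stops scaling at billions of molecules.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Intersection size over union size for two sets of on-bit indices.

    Returns 0.0 when both sets are empty (a convention chosen here)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def neighbours(
    query: Set[int], library: Dict[str, Set[int]], threshold: float
) -> List[Tuple[str, float]]:
    """Library members whose similarity to `query` is at least `threshold`,
    most similar first (exhaustive linear scan)."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, sim) for name, sim in hits if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


library = {"m1": {1, 2, 3, 4}, "m2": {2, 3, 4, 5}, "m3": {7, 8}}
print(neighbours({1, 2, 3, 4}, library, threshold=0.5))  # [('m1', 1.0), ('m2', 0.6)]
```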
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
33
Ghiandoni GM, Evertsson E, Riley DJ, Tyrchan C, Rathi PC. Augmenting DMTA using predictive AI modelling at AstraZeneca. Drug Discov Today 2024; 29:103945. [PMID: 38460568 DOI: 10.1016/j.drudis.2024.103945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 03/11/2024]
Abstract
Design-Make-Test-Analyse (DMTA) is the discovery cycle through which molecules are designed, synthesised, and assayed to produce data that in turn are analysed to inform the next iteration. The process is repeated until viable drug candidates are identified, often requiring many cycles before reaching a sweet spot. The advent of artificial intelligence (AI) and cloud computing presents an opportunity to innovate drug discovery to reduce the number of cycles needed to yield a candidate. Here, we present the Predictive Insight Platform (PIP), a cloud-native modelling platform developed at AstraZeneca. The impact of PIP in each step of DMTA, as well as its architecture, integration, and usage, are discussed and used to provide insights into the future of drug discovery.
Affiliation(s)
- Gian Marco Ghiandoni
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK.
- Emma Evertsson
- Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- David J Riley
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
- Christian Tyrchan
- Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- Prakash Chandra Rathi
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
34
Ghandikota SK, Jegga AG. Application of artificial intelligence and machine learning in drug repurposing. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2024; 205:171-211. [PMID: 38789178 DOI: 10.1016/bs.pmbts.2024.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
The purpose of drug repurposing is to leverage previously approved drugs for a particular disease indication and apply them to another disease. It can be seen as a faster and more cost-effective approach to drug discovery and a powerful tool for achieving precision medicine. In addition, drug repurposing can be used to identify therapeutic candidates for rare diseases and phenotypic conditions with limited information on disease biology. Machine learning and artificial intelligence (AI) methodologies have enabled the construction of effective, data-driven repurposing pipelines by integrating and analyzing large-scale biomedical data. Recent technological advances, especially in heterogeneous network mining and natural language processing, have opened up exciting new opportunities and analytical strategies for drug repurposing. In this review, we first introduce the challenges in repurposing approaches and highlight some success stories, including those during the COVID-19 pandemic. Next, we review some existing computational frameworks in the literature, organized on the basis of the type of biomedical input data analyzed and the computational algorithms involved. In conclusion, we outline some exciting new directions that drug repurposing research may take, as pioneered by the generative AI revolution.
Affiliation(s)
- Sudhir K Ghandikota
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Anil G Jegga
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States.
35
Zhang K, Tang Y, Yu H, Yang J, Tao L, Xiang P. Discovery of lupus nephritis targeted inhibitors based on De novo molecular design: comprehensive application of vinardo scoring, ADMET analysis, and molecular dynamics simulation. J Biomol Struct Dyn 2024:1-14. [PMID: 38501728 DOI: 10.1080/07391102.2024.2329293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 03/06/2024] [Indexed: 03/20/2024]
Abstract
Lupus Nephritis (LN) is an autoimmune disease affecting the kidneys, and conventional drug studies have limitations due to its imprecise and complex pathogenesis. Therefore, the aim of this study was to design, by a computer-assisted approach, a novel Lupus Nephritis-targeted drug with good clinical potential, high potency and selectivity. NIK belongs to the serine/threonine protein kinase family and is gaining attention as a drug target for Lupus Nephritis. We used bioinformatics, homology modelling and sequence comparison analysis, small molecule ab initio design, ADMET analysis, molecular docking, molecular dynamics simulation, and MM/PBSA analysis to design and explore the selectivity and efficiency of a novel Lupus Nephritis-targeting drug, ClImYnib, and of a classical NIK inhibitor, NIK SMI1. We used bioinformatics techniques to determine the correlation between lupus nephritis and the NF-κB signaling pathway. De novo drug design was used to create a NIK-targeted inhibitor, ClImYnib, with lower toxicity, after which we used molecular dynamics to simulate NIK SMI1 against ClImYnib; the simulation results showed that ClImYnib had better selectivity and efficiency. Our research delves into the molecular mechanism of protein-ligand interactions, and we have designed and validated an excellent NIK inhibitor using multiple computational simulation methods. More importantly, it provides an approach to target-oriented small-molecule design. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Kaiyuan Zhang
- School of Clinical Medicine, Bengbu Medical College, China
- Yingkai Tang
- Department of Anatomy, School of basic Medicine, Bengbu Medical College, China
- Haiyue Yu
- School of Clinical Medicine, Bengbu Medical College, China
- Jingtao Yang
- School of Clinical Medicine, Bengbu Medical College, China
- Lu Tao
- Central Laboratory, The First Affiliated Hospital of Bengbu Medical College, Bengbu, Anhui, China
- Ping Xiang
- Central Laboratory, The First Affiliated Hospital of Bengbu Medical College, Bengbu, Anhui, China
| |
Collapse
|
36
|
Chang J, Ye JC. Bidirectional generation of structure and properties through a single molecular foundation model. Nat Commun 2024; 15:2323. [PMID: 38485914 PMCID: PMC10940637 DOI: 10.1038/s41467-024-46440-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/12/2023] [Accepted: 02/27/2024] [Indexed: 03/18/2024]
Abstract
Recent successes of foundation models in artificial intelligence have prompted the emergence of large-scale chemical pre-trained models. There is growing interest in large molecular pre-trained models that provide informative representations for downstream tasks. Even so, attempts at multimodal pre-training in the molecular domain have been limited. To address this, we present a multimodal molecular pre-trained model that incorporates the modalities of structure and biochemical properties, drawing inspiration from recent advances in multimodal learning techniques. Our proposed pipeline of data handling and training objectives aligns structure and property features in a common embedding space. This enables the model to exploit bidirectional information between a molecule's structure and its properties. These contributions yield synergistic knowledge, allowing us to tackle both multimodal and unimodal downstream tasks with a single model. Through extensive experiments, we demonstrate that our model can solve various meaningful chemical challenges, including conditional molecule generation, property prediction, molecule classification, and reaction prediction.
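The abstract says structure and property features are aligned in a common embedding space. One common way to do this, borrowed from CLIP-style multimodal learning, is a symmetric contrastive (InfoNCE) loss over paired embeddings. The sketch below assumes that objective, cosine similarity and a temperature of 0.1. The paper's actual training objectives are not described here and may differ or include additional terms.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(
    struct_emb: List[List[float]], prop_emb: List[List[float]], temperature: float = 0.1
) -> float:
    """Symmetric InfoNCE: row i of struct_emb and row i of prop_emb form a positive pair."""
    # similarity logits between every structure embedding and every property embedding
    logits = [[cosine(s, p) / temperature for p in prop_emb] for s in struct_emb]

    def cross_entropy_rows(mat: List[List[float]]) -> float:
        total = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # log-sum-exp with max subtraction for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]  # negative log-probability of the matching pair
        return total / len(mat)

    transposed = [list(col) for col in zip(*logits)]
    # average the structure->property and property->structure directions
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))
```

A low loss means each structure embedding is most similar to its own property embedding. In training, this quantity would be minimized with respect to the parameters of both encoders.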
Affiliation(s)
- Jinho Chang
- Graduate School of AI, KAIST, Daejeon, South Korea
- Jong Chul Ye
- Graduate School of AI, KAIST, Daejeon, South Korea.
37
Moon SW, Min SK. Gaussian Process Regression-Based Near-Infrared d-Luciferin Analogue Design Using Mutation-Controlled Graph-Based Genetic Algorithm. J Chem Inf Model 2024; 64:1522-1532. [PMID: 38365605 DOI: 10.1021/acs.jcim.3c00870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2024]
Abstract
Molecular discovery is central to the field of chemical informatics. Optimization approaches that target specific molecular properties have been developed in combination with machine learning techniques. However, optimization using databases of limited size remains challenging for efficient molecular design. We present a molecular design method that combines a Gaussian process regression model with a graph-based genetic algorithm (GB-GA) and works from a data set of only a small number of compounds. It introduces mutation probability control in the genetic algorithm to enhance optimization capability and speed up convergence to the optimal solution. In addition, we propose reducing the number of parameters in the conventional GB-GA, with a focus on efficient molecular design from a small database. We generated a target-specific database by combining active learning and iterative design in the evolutionary methodology, and chose Gaussian process regression as the prediction model for molecular properties. On goal-directed benchmarks with several drug-like molecules, we show that the proposed scheme optimizes toward the target properties more efficiently than the conventional GB-GA method. Finally, working from a small data set, we demonstrate the approach by designing D-luciferin analogues with near-infrared fluorescence for bioimaging, which are desirable as effective in vivo light sources.
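Gaussian process regression serves as the property predictor in this scheme. The sketch below covers only the GP part. It assumes an RBF kernel with unit signal variance and generic numeric descriptor vectors; in practice these would be molecular fingerprints or descriptors. It computes the posterior mean and variance at new points. The mutation-probability control in the GB-GA is not reproduced here.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel (unit signal variance) between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-6):
    """Return the GP posterior mean and variance at x_test."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_test, length_scale)
    alpha = np.linalg.solve(k, y_train)     # K^{-1} y
    mean = k_star.T @ alpha
    v = np.linalg.solve(k, k_star)          # K^{-1} k_*
    var = 1.0 - np.sum(k_star * v, axis=0)  # prior variance is 1 for this kernel
    return mean, np.maximum(var, 0.0)
```

In a GA loop, such predictions could score offspring cheaply, with the variance optionally used to favour exploration. That coupling is an assumption about how the pieces fit together, not a description of the authors' code.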
Affiliation(s)
- Sung Wook Moon
- Department of Chemistry, School of Natural Science, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, South Korea
- Seung Kyu Min
- Department of Chemistry, School of Natural Science, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, South Korea
38
Loeffler HH, He J, Tibo A, Janet JP, Voronov A, Mervin LH, Engkvist O. Reinvent 4: Modern AI-driven generative molecule design. J Cheminform 2024; 16:20. [PMID: 38383444 PMCID: PMC10882833 DOI: 10.1186/s13321-024-00812-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/23/2024]
Abstract
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software uses recurrent neural networks and transformer architectures to drive molecule generation. These generators are embedded within general machine learning optimization algorithms: transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command-line tool that reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI-based molecule generation. An additional goal of the release is to create a framework for education and future innovation in AI-based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution: the software provides an open-source reference implementation for generative molecular design, and it is also used in production to support in-house drug discovery projects. Publishing the most common machine learning algorithms in one code base, with full documentation, will increase the transparency of AI and foster innovation, collaboration and education.
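The original REINVENT formulation of reinforcement learning trains the agent so that its log-likelihood for a sampled molecule approaches an "augmented" log-likelihood. That target is the prior's log-likelihood plus a scaled score. The sketch below shows that arithmetic on plain floats. It is not REINVENT 4's code, which offers its own configurable learning strategies. The `sigma` values in the test are purely illustrative.

```python
from typing import Sequence


def augmented_log_likelihood(prior_logp: float, score: float, sigma: float) -> float:
    """Prior log-likelihood of a molecule plus the scaled score (score assumed in [0, 1])."""
    return prior_logp + sigma * score


def augmented_likelihood_loss(
    agent_logps: Sequence[float],
    prior_logps: Sequence[float],
    scores: Sequence[float],
    sigma: float,
) -> float:
    """Mean squared gap between augmented and agent log-likelihoods over a batch."""
    if not scores or not (len(agent_logps) == len(prior_logps) == len(scores)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for agent, prior, score in zip(agent_logps, prior_logps, scores):
        target = augmented_log_likelihood(prior, score, sigma)
        total += (target - agent) ** 2
    return total / len(scores)
```

In a real training loop the agent log-likelihoods are differentiable tensors. Minimizing this loss raises the probability of high-scoring molecules while keeping the agent anchored to the prior.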
Affiliation(s)
- Hannes H Loeffler
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
- Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
39
Garg V. Generative AI for graph-based drug design: Recent advances and the way forward. Curr Opin Struct Biol 2024; 84:102769. [PMID: 38199072 DOI: 10.1016/j.sbi.2023.102769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/17/2023] [Accepted: 12/19/2023] [Indexed: 01/12/2024]
Abstract
Discovering promising new molecule candidates that could translate into effective drugs is a key scientific pursuit. However, the vastness and discreteness of the molecular search space pose a formidable technical challenge in this quest. AI-driven generative models can learn effectively from data and offer hope for streamlining drug design. In this article, we review the state of the art in generative models that operate on molecular graphs. We also shed light on some limitations of the existing methodology and sketch directions for harnessing the potential of AI for drug design tasks going forward.
Affiliation(s)
- Vikas Garg
- Aalto University and YaiYai Ltd, Finland.
40
Zheng L, Shi F, Peng C, Xu M, Fan F, Li Y, Zhang L, Du J, Wang Z, Lin Z, Sun Y, Deng C, Duan X, Wei L, Zhao C, Fang L, Zhang P, Ma S, Lai L, Yang M. Application scenario-oriented molecule generation platform developed for drug discovery. Methods 2024; 222:112-121. [PMID: 38215898 DOI: 10.1016/j.ymeth.2023.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 11/22/2023] [Accepted: 12/23/2023] [Indexed: 01/14/2024]
Abstract
Designing molecules for candidate compound selection is one of the central challenges in drug discovery, because chemical space is complex and multi-parameter optimization is required. Here we present an application scenario-oriented platform (ID4Idea) for molecule generation in different drug discovery scenarios. The platform uses both library- or rule-based and generative algorithms (VAE, RNN, GAN, etc.) to enable customized solutions for a given molecular design scenario. These are combined with various AI learning types (pre-training, transfer learning, reinforcement learning, active learning, etc.) and input representations (1D SMILES, 2D graph, 3D shape, binding site, pharmacophore, etc.). Besides the usual protocol of generation followed by screening, goal-directed molecule generation can also be conducted toward predefined goals, enhancing the efficiency of hit identification, lead finding, and lead optimization. We demonstrate the effectiveness of the ID4Idea platform through case studies. These showcase customized solutions for different design tasks using various input information, such as binding pockets, pharmacophores, and compound representations. In addition, we discuss the challenges that remain in unlocking the full potential of AI models in drug discovery and paving the way for the development of novel therapeutics.
Affiliation(s)
- Lianjun Zheng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Fangjun Shi
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Chunwang Peng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Min Xu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Fangda Fan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Yuanpeng Li
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Lin Zhang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Jiewen Du
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Zonghu Wang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Zhixiong Lin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Yina Sun
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Chenglong Deng
- Jingtai Zhiyao Technology (Shanghai) Co., Ltd. (XtalPi), No. 207 Huanqiao Road, Pudong New Area, Shanghai 201315, China
- Xinli Duan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Lin Wei
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Lei Fang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Peiyu Zhang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Songling Ma
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
- Lipeng Lai
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
- Mingjun Yang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China.
41
Rusinko A, Rezaei M, Friedrich L, Buchstaller HP, Kuhn D, Ghogare A. AIDDISON: Empowering Drug Discovery with AI/ML and CADD Tools in a Secure, Web-Based SaaS Platform. J Chem Inf Model 2024; 64:3-8. [PMID: 38134123 PMCID: PMC10777390 DOI: 10.1021/acs.jcim.3c01016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 12/01/2023] [Accepted: 12/01/2023] [Indexed: 12/24/2023]
Abstract
The widespread proliferation of artificial intelligence (AI) and machine learning (ML) methods has had a profound effect on the drug discovery process. However, many scientists are reluctant to use these powerful tools because of the steep learning curve typically associated with them. AIDDISON addresses this reluctance by offering a convenient, secure, web-based platform for drug discovery. By seamlessly integrating generative models, ADMET property predictions, searches in vast chemical spaces, and molecular docking, AIDDISON provides a sophisticated platform for modern drug discovery. It enables less computer-savvy scientists to use these tools in their daily activities, as demonstrated by an example of identifying a valuable set of molecules for lead optimization. With AIDDISON, the benefits of AI/ML in drug discovery are accessible to all.
Affiliation(s)
- Andrew Rusinko
- MilliporeSigma, 400 Summit Drive, Burlington, Massachusetts 01803, United States
- Mohammad Rezaei
- MilliporeSigma, 400 Summit Drive, Burlington, Massachusetts 01803, United States
- Lukas Friedrich
- Merck Healthcare KGaA, Medicinal Chemistry and Drug Design, Darmstadt 64293, Germany
- Daniel Kuhn
- Merck Healthcare KGaA, Medicinal Chemistry and Drug Design, Darmstadt 64293, Germany
- Ashwini Ghogare
- MilliporeSigma, 400 Summit Drive, Burlington, Massachusetts 01803, United States
42
Habiballah S, Heath LS, Reisfeld B. A deep-learning approach for identifying prospective chemical hazards. Toxicology 2024; 501:153708. [PMID: 38104655 DOI: 10.1016/j.tox.2023.153708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/11/2023] [Accepted: 12/13/2023] [Indexed: 12/19/2023]
Abstract
Various techniques have been implemented to conduct risk assessments for chemicals and other environmental stressors, with the aim of helping to set safe exposure limits for the general population. However, none of these tools facilitates the identification of completely new chemicals that are likely to be hazardous and to elicit an adverse biological effect. Here, we detail a novel in silico, deep-learning framework designed to systematically generate structures for new chemical compounds that are predicted to be chemical hazards. To assess the utility of the framework, we applied the tool to four endpoints related to environmental toxicants and their impacts on human and animal health: (i) toxicity to honeybees, (ii) immunotoxicity, (iii) endocrine disruption via ER-α antagonism, and (iv) mutagenicity. In addition, we characterized the predicted potency of these compounds and examined their structural relationship to existing chemicals of concern. Such a framework belongs to the array of emerging new approach methodologies (NAMs). We anticipate that it will be a significant asset to risk assessors and other environmental scientists when planning and forecasting. Though not in the scope of the present study, we expect that the methodology detailed here could also be useful in the de novo design of more environmentally friendly industrial chemicals.
Affiliation(s)
- Sohaib Habiballah
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523-1370, USA
- Lenwood S Heath
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061-0106, USA
- Brad Reisfeld
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523-1370, USA; Colorado School of Public Health, Colorado State University, Fort Collins, CO 80523-1612, USA.
43
Olmedo DA, Durant-Archibold AA, López-Pérez JL, Medina-Franco JL. Design and Diversity Analysis of Chemical Libraries in Drug Discovery. Comb Chem High Throughput Screen 2024; 27:502-515. [PMID: 37409545 DOI: 10.2174/1386207326666230705150110] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/05/2023] [Revised: 05/30/2023] [Accepted: 05/30/2023] [Indexed: 07/07/2023]
Abstract
Chemical libraries and compound data sets are among the main inputs for starting the drug discovery process at universities, research institutes, and the pharmaceutical industry. Three factors play a fundamental role in the studies that generate computational hits for the further optimization of drug candidates: the approach used to design compound libraries, the chemical information they contain, and the representation of their structures. Those studies span chemoinformatics, food informatics, in silico pharmacokinetics, computational toxicology, bioinformatics, and molecular modeling. Prospects for growth in the drug discovery and development processes of chemical, biotechnological, and pharmaceutical companies emerged a few years ago with the integration of computational tools and artificial intelligence methodologies. This integration is anticipated to increase the number of drugs approved by regulatory agencies in the near future.
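Library diversity is often quantified from molecular fingerprints. The sketch below computes the mean pairwise Tanimoto distance of a library. It assumes each compound is already encoded as the set of its "on" fingerprint bits; such bits could come from a toolkit such as RDKit, and the bit sets in the example are hypothetical. The review itself may use or discuss other diversity metrics.

```python
from itertools import combinations
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # treat two empty fingerprints as identical
    return len(a & b) / len(a | b)


def mean_pairwise_diversity(fps: List[FrozenSet[int]]) -> float:
    """Average Tanimoto distance (1 - similarity) over all unordered pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)
```

A value near 0 indicates a library of near-duplicates, and a value near 1 indicates compounds that share few fingerprint features.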
Affiliation(s)
- Dionisio A Olmedo
- Centro de Investigaciones Farmacognósticas de la Flora Panameña (CIFLORPAN), Facultad de Farmacia, Universidad de Panamá, Ciudad de Panamá, Apartado, 0824-00178, Panamá
- Sistema Nacional de Investigación (SNI), Secretaria Nacional de Ciencia, Tecnología e Innovación (SENACYT), Ciudad del Saber, Clayton, Panamá
- Armando A Durant-Archibold
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Apartado, 0843-01103, Panamá
- Departamento de Bioquímica, Facultad de Ciencias Naturales, Exactas y Tecnología, Universidad de Panamá, Ciudad de Panamá, Panamá
- José Luis López-Pérez
- CESIFAR, Departamento de Farmacología, Facultad de Medicina, Universidad de Panamá, Ciudad de Panamá, Panamá
- Departamento de Ciencias Farmacéuticas, Facultad de Farmacia, Universidad de Salamanca, Avda. Campo Charro s/n, 37071 Salamanca, España
- José Luis Medina-Franco
- DIFACQUIM Grupo de Investigación, Departamento de Farmacia, Escuela de Química, Universidad Nacional Autónoma de México, Ciudad de México, Apartado, 04510, México
44
Iwata H, Nakai T, Koyama T, Matsumoto S, Kojima R, Okuno Y. VGAE-MCTS: A New Molecular Generative Model Combining the Variational Graph Auto-Encoder and Monte Carlo Tree Search. J Chem Inf Model 2023; 63:7392-7400. [PMID: 37993764 PMCID: PMC10716893 DOI: 10.1021/acs.jcim.3c01220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 11/03/2023] [Accepted: 11/03/2023] [Indexed: 11/24/2023]
Abstract
Molecular generation is crucial for advancing drug discovery, materials science, and chemical exploration. It expedites the search for new drug candidates, facilitates tailored material creation, and enhances our understanding of molecular diversity. By employing artificial intelligence techniques such as molecular generative models based on molecular graphs, researchers have tackled the challenge of identifying effective molecules with desired properties. Here, we propose a new molecular generative model that combines a graph-based deep neural network (a variational graph auto-encoder) with a reinforcement learning technique (Monte Carlo tree search). We evaluated the validity, novelty, and optimized physicochemical properties of the generated molecules. Importantly, the model explored uncharted regions of chemical space, allowing for the efficient discovery and design of new molecules. By exploring previously unexplored chemical space, this approach has considerable potential to accelerate innovation in drug discovery, materials science, and chemical research.
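Monte Carlo tree search balances exploiting high-reward branches with exploring rarely visited ones. The sketch below shows the generic UCT selection rule (UCB1 applied to trees) and reward backpropagation. In a molecular setting, a node would plausibly correspond to a partial molecule and the reward to a property score. That mapping, the exploration constant c = √2, and all names here are assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2.0)) -> float:
    """Mean reward plus an exploration bonus that shrinks as the child is visited."""
    if child.visits == 0:
        return math.inf  # try unvisited children before revisiting others
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = math.sqrt(2.0)) -> Node:
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: List[Node], reward: float) -> None:
    """Add one visit and the rollout reward to every node on the root-to-leaf path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```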
Affiliation(s)
- Hiroaki Iwata
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Taichi Nakai
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Takuto Koyama
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Shigeyuki Matsumoto
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Ryosuke Kojima
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Yasushi Okuno
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, Kobe-shi, Hyogo 650-0047, Japan
45
Yoo P, Bhowmik D, Mehta K, Zhang P, Liu F, Lupo Pasini M, Irle S. Deep learning workflow for the inverse design of molecules with specific optoelectronic properties. Sci Rep 2023; 13:20031. [PMID: 37973879 PMCID: PMC10654498 DOI: 10.1038/s41598-023-45385-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 10/19/2023] [Indexed: 11/19/2023]
Abstract
The inverse design of novel molecules with a desired optoelectronic property requires consideration of the vast chemical spaces associated with varying chemical composition and molecular size. First-principles property predictions have become increasingly helpful for selecting promising candidate chemical species for subsequent experimental validation. However, brute-force computational screening of the entire chemical space is infeasible. To alleviate the computational burden and accelerate rational molecular design, we present an iterative deep learning workflow that combines three components:
(i) the density-functional tight-binding (DFTB) method, for dynamic generation of property training data;
(ii) a graph convolutional neural network surrogate model, for rapid and reliable predictions of chemical and physical properties;
(iii) a masked language model.
As proof of principle, we use our workflow to iteratively generate novel molecules with a target energy gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO).
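The three-part workflow amounts to a loop of the form propose → rank with a cheap surrogate → label the best few with an expensive calculation → repeat. The sketch below shows only that loop structure, and every component is a hypothetical stand-in:
- "Molecules" are integer tuples.
- A random single-position edit replaces the masked language model.
- A 1-nearest-neighbour regressor replaces the graph convolutional surrogate.
- A user-supplied `oracle` function replaces DFTB.

None of these names come from the paper.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a molecule: a short tuple of integers.
Molecule = Tuple[int, ...]


def iterative_design(
    seeds: List[Molecule],
    oracle: Callable[[Molecule], float],  # stands in for an expensive DFTB calculation
    target: float,
    rounds: int = 5,
    proposals_per_round: int = 50,
    label_budget: int = 5,
    seed: int = 0,
) -> Tuple[Molecule, float]:
    rng = random.Random(seed)
    labeled: Dict[Molecule, float] = {m: oracle(m) for m in seeds}

    def surrogate(m: Molecule) -> float:
        # 1-nearest-neighbour regressor standing in for the GCNN; it is "retrained"
        # implicitly because `labeled` grows every round
        nearest = min(labeled, key=lambda x: sum(abs(a - b) for a, b in zip(x, m)))
        return labeled[nearest]

    def propose(m: Molecule) -> Molecule:
        # random single-position edit standing in for masked-language-model infilling
        i = rng.randrange(len(m))
        return m[:i] + (m[i] + rng.choice((-1, 1)),) + m[i + 1:]

    for _ in range(rounds):
        # generate around the labeled molecules closest to the target
        parents = sorted(labeled, key=lambda m: abs(labeled[m] - target))[:label_budget]
        candidates = (propose(rng.choice(parents)) for _ in range(proposals_per_round))
        fresh = {c for c in candidates if c not in labeled}
        # rank cheaply with the surrogate, then spend oracle calls on the top few
        ranked = sorted(fresh, key=lambda m: abs(surrogate(m) - target))
        for m in ranked[:label_budget]:
            labeled[m] = oracle(m)

    best = min(labeled, key=lambda m: abs(labeled[m] - target))
    return best, labeled[best]
```

The design point the sketch illustrates is budgeting: the surrogate is called many times per round, while the expensive oracle is called at most `label_budget` times.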
Affiliation(s)
- Pilsun Yoo
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA.
- Debsindhu Bhowmik
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA
- Kshitij Mehta
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA
- Pei Zhang
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA
- Frank Liu
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA
- Massimiliano Lupo Pasini
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA
- Stephan Irle
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA.
46
Liu N, Jin H, Zhang L, Liu Z. Plug-in Models: A Promising Direction for Molecular Generation. HEALTH DATA SCIENCE 2023; 3:0092. [PMID: 38487202 PMCID: PMC10880158 DOI: 10.34133/hds.0092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 09/26/2023] [Indexed: 03/17/2024]
Affiliation(s)
- Ningfeng Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Hongwei Jin
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
47
Stanley M, Segler M. Fake it until you make it? Generative de novo design and virtual screening of synthesizable molecules. Curr Opin Struct Biol 2023; 82:102658. [PMID: 37473637 DOI: 10.1016/j.sbi.2023.102658] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023]
Abstract
Computational techniques, including virtual screening, de novo design, and generative models, play an increasing role in expediting design-make-test-analyze (DMTA) cycles for modern molecular discovery. However, computationally proposed molecules must be synthetically feasible before they can be tested in the laboratory. In this perspective, we offer a succinct introduction to the subject and showcase typical workflows that integrate synthesis planning, synthesizability scoring, and molecule generation. Finally, we address limitations and opportunities for future research.
Affiliation(s)
- Megan Stanley
- Microsoft Research AI4Science, UK. https://twitter.com/@megjanestanley
48
Kerstjens A, De Winter H. A molecule perturbation software library and its application to study the effects of molecular design constraints. J Cheminform 2023; 15:89. [PMID: 37752561 PMCID: PMC10523775 DOI: 10.1186/s13321-023-00761-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2023] [Accepted: 09/15/2023] [Indexed: 09/28/2023]
Abstract
Computational molecular design can yield chemically unreasonable compounds when performed carelessly. A popular strategy to mitigate this risk is to mimic reference chemistry, commonly by restricting the way in which molecules are constructed or modified. It is well established that such an approach helps in designing chemically appealing molecules, but concerns linger that these restrictions negatively impact chemical space exploration. In this work we present a software library for constrained graph-based molecule manipulation and showcase its functionality by developing a molecule generator. This generator designs molecules that mimic reference chemical features of differing granularity. We find that lightly restricting molecular construction improves the drug-likeness and synthesizability of designed molecules, as expected. Beyond these usual benefits, it also provides guidance to optimization algorithms navigating chemical space. Nonetheless, restricting molecular construction excessively can indeed hinder effective chemical space exploration.
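The idea of "mimicking reference chemistry" can be illustrated with one simple granularity. An edit is accepted only if every atom environment in the result also occurs in a reference set; here an environment is an element plus the sorted elements of its neighbours. The sketch below uses a hypothetical graph representation, ignores bond orders and hydrogens, and is not the API of the authors' library. Finer environments would reject more edits, which is the trade-off the abstract describes.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: atom index -> (element symbol, neighbour indices)
Graph = Dict[int, Tuple[str, Set[int]]]
# An "atom environment": element plus the sorted elements of its neighbours
Env = Tuple[str, Tuple[str, ...]]


def environments(g: Graph) -> Set[Env]:
    return {(elem, tuple(sorted(g[n][0] for n in nbrs))) for elem, nbrs in g.values()}


def add_atom(g: Graph, anchor: int, element: str) -> Graph:
    """Return a copy of g with a new atom bonded to `anchor`; g itself is unchanged."""
    new = {i: (e, set(nbrs)) for i, (e, nbrs) in g.items()}
    idx = max(new) + 1
    new[idx] = (element, {anchor})
    new[anchor][1].add(idx)
    return new


def constrained_add_atom(
    g: Graph, anchor: int, element: str, allowed: Set[Env]
) -> Optional[Graph]:
    """Apply the edit only if every resulting atom environment is in the reference set."""
    candidate = add_atom(g, anchor, element)
    return candidate if environments(candidate) <= allowed else None
```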
Affiliation(s)
- Alan Kerstjens
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium
- Hans De Winter
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium.
49
Wei L, Fu N, Song Y, Wang Q, Hu J. Probabilistic generative transformer language models for generative design of molecules. J Cheminform 2023; 15:88. [PMID: 37749655 PMCID: PMC10518939 DOI: 10.1186/s13321-023-00759-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 09/10/2023] [Indexed: 09/27/2023]
Abstract
Self-supervised neural language models have recently found wide application in the generative design of organic molecules and protein sequences. They are also used for representation learning for downstream structure classification and functional prediction. However, most existing deep learning models for molecule design require a large dataset and have a black-box architecture, which makes their design logic difficult to interpret. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank-filling language model originally developed for text processing. This model has demonstrated unique advantages in learning "molecular grammars", with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf (scaffold similarity) scores compared with other baselines. The probabilistic generation steps can recommend, with explanation, how to modify existing molecules, guided by the learned implicit molecular chemistry. This gives them potential for tinkering with molecule design. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer.
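Blank-filling generation can be pictured as repeatedly rewriting a canvas that contains blanks. In the sketch below, each action picks a blank, writes one SMILES character, and optionally opens new blanks on either side. This is an assumption about the mechanics, following the general blank language model formulation rather than GMTransformer's code. In the real model a trained network scores these choices; here they are scripted by hand. Every step is an explicit edit that can be inspected, which is where the interpretability comes from.

```python
from typing import List, Tuple

BLANK = "_"
# (index of the blank to fill, token to write, open a blank on the left?, on the right?)
Action = Tuple[int, str, bool, bool]


def apply_action(canvas: List[str], action: Action) -> List[str]:
    pos, token, left, right = action
    if canvas[pos] != BLANK:
        raise ValueError(f"position {pos} is not a blank")
    filled = ([BLANK] if left else []) + [token] + ([BLANK] if right else [])
    return canvas[:pos] + filled + canvas[pos + 1:]


def generate(actions: List[Action]) -> str:
    canvas = [BLANK]  # generation starts from a single blank
    for action in actions:
        canvas = apply_action(canvas, action)
    if BLANK in canvas:
        raise ValueError("canvas still contains blanks")
    return "".join(canvas)
```

For example, the actions `(0, "C", False, True)`, `(1, "O", True, False)` and `(1, "C", False, False)` build `C_`, then `C_O`, then `CCO` (ethanol).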
Affiliation(s)
- Lai Wei
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Nihang Fu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Yuqi Song
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Qian Wang
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC, 29201, USA
- Jianjun Hu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA.
50
Chan WT, Garcillán-Barcia MP, Yeo CC, Espinosa M. Type II bacterial toxin-antitoxins: hypotheses, facts, and the newfound plethora of the PezAT system. FEMS Microbiol Rev 2023; 47:fuad052. [PMID: 37715317 PMCID: PMC10532202 DOI: 10.1093/femsre/fuad052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 08/24/2023] [Accepted: 09/07/2023] [Indexed: 09/17/2023]
Abstract
Toxin-antitoxin (TA) systems are entities found in prokaryotic genomes, with eight types reported so far. Type II, the best characterized, consists of two genes organized as an operon. Whereas the toxin impairs growth, its cognate antitoxin neutralizes the toxin's activity. TAs appear to be involved in plasmid maintenance, persistence, virulence, and defence against bacteriophages. Most Type II toxins target the bacterial translational machinery. They seem to be ancestors of Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) RNases, minimal nucleotidyltransferase domains, or CRISPR-Cas systems. Four TAs encoded by Streptococcus pneumoniae (RelBE, YefM-YoeB, Phd-Doc, and HicAB) belong to the HEPN-RNases. The fifth is represented by PezAT/Epsilon-Zeta. PezT/Zeta toxins phosphorylate peptidoglycan precursors, thereby blocking cell wall synthesis. We explore the body of knowledge (facts) and hypotheses gathered for Type II TAs and analyse the data accumulated on the PezAT family. Bioinformatics analyses showed that homologues of the PezT/Zeta toxin are abundant across 14 bacterial phyla, mostly Proteobacteria (48%), Firmicutes (27%), and Actinobacteria (18%), reflecting the widespread distribution of this TA. The pezAT locus was found to be mainly chromosomally encoded, whereas its homologue, the tripartite omega-epsilon-zeta locus, was found mostly on plasmids. We also found several orphan pezT/zeta toxins that are unaccompanied by a cognate antitoxin.
Affiliation(s)
- Wai Ting Chan
  - Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas, Ramiro de Maeztu, 9, 28040 Madrid, Spain
- Maria Pilar Garcillán-Barcia
  - Instituto de Biomedicina y Biotecnología de Cantabria (IBBTEC), Universidad de Cantabria-Consejo Superior de Investigaciones Científicas, C/Albert Einstein 22, PCTCAN, 39011 Santander, Spain
- Chew Chieng Yeo
  - Centre for Research in Infectious Diseases and Biotechnology (CeRIDB), Faculty of Medicine, Universiti Sultan Zainal Abidin, Jalan Sultan Mahmud, 20400 Kuala Terengganu, Malaysia
- Manuel Espinosa
  - Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas, Ramiro de Maeztu, 9, 28040 Madrid, Spain