1
Alavi SF, Chen Y, Hou YF, Ge F, Zheng P, Dral PO. ANI-1ccx-gelu Universal Interatomic Potential and Its Fine-Tuning: Toward Accurate and Efficient Anharmonic Vibrational Frequencies. J Phys Chem Lett 2025:483-493. [PMID: 39748511 DOI: 10.1021/acs.jpclett.4c03031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2025]
Abstract
Calculating anharmonic vibrational modes of molecules for interpreting experimental spectra is one of the most interesting challenges of contemporary computational chemistry. However, the traditional QM methods are costly for this application. Machine learning techniques have emerged as a powerful tool for substituting the traditional QM methods. Universal interatomic potentials (UIPs) hold a particular promise to deliver accurate results at a fraction of the cost of the traditional QM methods, but the performance of UIPs for calculating anharmonic vibrational frequencies remains hitherto unknown. Here we show that despite a known excellent performance of the representative UIP ANI-1ccx for thermochemical properties, it fails for the anharmonic frequencies due to the original unfortunate choice of the activation function. Hence, we recommend evaluating new UIPs on anharmonic frequencies as an additional important quality test. To remedy the shortcomings of ANI-1ccx, we introduce its reformulation ANI-1ccx-gelu with the GELU activation function, which is capable of calculating IR anharmonic frequencies with reasonable accuracy (close to B3LYP/6-31G*). We also show that our new UIP can be fine-tuned to obtain very accurate anharmonic frequencies for some specific molecules but more effort is needed to improve the overall quality of UIP and its capability for fine-tuning. The new UIP will be included as part of our universal and updatable AI-enhanced QM methods (UAIQM) platform and is available together with usage and fine-tuning tutorials in open-source MLatom at https://github.com/dralgroup/mlatom. The calculations can also be performed via a web browser at https://XACScloud.com.
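Why the activation function matters here can be illustrated with a small numerical check. Anharmonic corrections depend on higher-order derivatives of the potential energy surface, so a kink in a derivative of the activation can plausibly leak into the predicted frequencies. The sketch below compares the second derivative of GELU with that of a piecewise-defined activation on either side of zero. The abstract does not name the original activation; CELU with `alpha = 0.1` is used here only as an assumed stand-in, and the helper names are illustrative.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def celu(x: float, alpha: float = 0.1) -> float:
    """CELU, an assumed example of a piecewise-defined activation."""
    return max(0.0, x) + min(0.0, alpha * (math.exp(x / alpha) - 1.0))


def second_derivative(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Evaluate f'' just left and just right of zero; the stencil never crosses 0.
for name, f in (("gelu", gelu), ("celu", celu)):
    left = second_derivative(f, -1e-3)
    right = second_derivative(f, 1e-3)
    print(f"{name}: f''(0-) = {left:.3f}, f''(0+) = {right:.3f}")
```

For GELU the two values agree, while for the assumed CELU the second derivative jumps across zero. This is only a one-dimensional illustration of smoothness, not a reproduction of the paper's analysis.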
Affiliation(s)
- Seyedeh Fatemeh Alavi
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yi-Fan Hou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Torun, ul. Grudziądzka 5, 87-100 Torun, Poland
2
Kneiding H, Balcells D. Augmenting genetic algorithms with machine learning for inverse molecular design. Chem Sci 2024:d4sc02934h. [PMID: 39296997 PMCID: PMC11404003 DOI: 10.1039/d4sc02934h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Accepted: 09/09/2024] [Indexed: 09/21/2024] Open
Abstract
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
Affiliation(s)
- Hannes Kneiding
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
3
Gangwal A, Lavecchia A. Unleashing the power of generative AI in drug discovery. Drug Discov Today 2024; 29:103992. [PMID: 38663579 DOI: 10.1016/j.drudis.2024.103992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 03/22/2024] [Accepted: 04/18/2024] [Indexed: 05/04/2024]
Abstract
Artificial intelligence (AI) is revolutionizing drug discovery by enhancing precision, reducing timelines and costs, and enabling AI-driven computer-aided drug design. This review focuses on recent advancements in deep generative models (DGMs) for de novo drug design, exploring diverse algorithms and their profound impact. It critically analyses the challenges that are intricately interwoven into these technologies, proposing strategies to unlock their full potential. It features case studies of both successes and failures in advancing drugs to clinical trials with AI assistance. Last, it outlines a forward-looking plan for optimizing DGMs in de novo drug design, thereby fostering faster and more cost-effective drug development.
Affiliation(s)
- Amit Gangwal
  - Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule 424001, Maharashtra, India
- Antonio Lavecchia
  - "Drug Discovery" Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy
4
Jiang Y, Chen Z, Sui N, Zhu Z. Data-Driven Evolutionary Design of Multienzyme-like Nanozymes. J Am Chem Soc 2024; 146:7565-7574. [PMID: 38445842 DOI: 10.1021/jacs.3c13588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Multienzyme-like nanozymes are nanomaterials with multiple enzyme-like activities and are the focus of nanozyme research owing to their ability to facilitate cascaded reactions, leverage synergistic effects, and exhibit environmentally responsive selectivity. However, multienzyme-like nanozymes exhibit varying enzyme-like activities under different conditions, making them difficult to precisely regulate according to the design requirements. Moreover, individual enzyme-like activity in a multienzyme-like activity may accelerate, compete, or antagonize each other, rendering the overall activity a complex interplay of these factors rather than a simple sum of single enzyme-like activity. A theoretically guided strategy is highly desired to accelerate the design of multienzyme-like nanozymes. Herein, nanozyme information was collected from 4159 publications to build a nanozyme database covering element type, element ratio, chemical valence, shape, pH, etc. Based on the clustering correlation coefficients of the nanozyme information, the material features in distinct nanozyme classifications were reorganized to generate compositional factors for multienzyme-like nanozymes. Moreover, advanced methods were developed, including the quantum mechanics/molecular mechanics method for analyzing the surface adsorption and binding energies of substrates, transition states, and products in the reaction pathways, along with machine learning algorithms to identify the optimal reaction pathway, to aid the evolutionary design of multienzyme-like nanozymes. This approach culminated in creating CuMnCo7O12, a highly active multienzyme-like nanozyme. This process is named the genetic-like evolutionary design of nanozymes because it resembles biological genetic evolution in nature and offers a feasible protocol and theoretical foundation for constructing multienzyme-like nanozymes.
Affiliation(s)
- Yujie Jiang
  - College of Materials Science and Engineering, Qingdao University of Science and Technology, 53 Zhengzhou Road, Qingdao 266042, Shandong, China
- Zibei Chen
  - College of Materials Science and Engineering, Qingdao University of Science and Technology, 53 Zhengzhou Road, Qingdao 266042, Shandong, China
- Ning Sui
  - College of Materials Science and Engineering, Qingdao University of Science and Technology, 53 Zhengzhou Road, Qingdao 266042, Shandong, China
- Zhiling Zhu
  - College of Materials Science and Engineering, Qingdao University of Science and Technology, 53 Zhengzhou Road, Qingdao 266042, Shandong, China
5
Moon SW, Min SK. Gaussian Process Regression-Based Near-Infrared d-Luciferin Analogue Design Using Mutation-Controlled Graph-Based Genetic Algorithm. J Chem Inf Model 2024; 64:1522-1532. [PMID: 38365605 DOI: 10.1021/acs.jcim.3c00870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2024]
Abstract
Molecular discovery is central to the field of chemical informatics. Although optimization approaches have been developed that target specific molecular properties in combination with machine learning techniques, optimization using databases of limited size is challenging for efficient molecular design. We present a molecular design method with a Gaussian process regression model and a graph-based genetic algorithm (GB-GA) from a data set comprising a small number of compounds by introducing mutation probability control in the genetic algorithm to enhance the optimization capability and speed up the convergence to the optimal solution. In addition, we propose reducing the number of parameters in the conventional GB-GA focusing on efficient molecular design from a small database. We generated a target-specific database by combining active learning and iterative design in the evolutionary methodologies and chose Gaussian process regression as the prediction model for molecular properties. We show that the proposed scheme is more efficient for optimization toward the target properties from goal-directed benchmarks with several drug-like molecules compared to the conventional GB-GA method. Finally, we provide a demonstration whereby we designed D-luciferin analogues with near-infrared fluorescence for bioimaging, which is desirable for effective in vivo light sources, from a small-size data set.
Affiliation(s)
- Sung Wook Moon
  - Department of Chemistry, School of Natural Science, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, South Korea
- Seung Kyu Min
  - Department of Chemistry, School of Natural Science, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, South Korea
6
Koscher BA, Canty RB, McDonald MA, Greenman KP, McGill CJ, Bilodeau CL, Jin W, Wu H, Vermeire FH, Jin B, Hart T, Kulesza T, Li SC, Jaakkola TS, Barzilay R, Gómez-Bombarelli R, Green WH, Jensen KF. Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back. Science 2023; 382:eadi1407. [PMID: 38127734 DOI: 10.1126/science.adi1407] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/05/2023] [Accepted: 11/09/2023] [Indexed: 12/23/2023]
Abstract
A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.
Affiliation(s)
- Brent A Koscher
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Richard B Canty
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Matthew A McDonald
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kevin P Greenman
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Charles J McGill
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Camille L Bilodeau
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Wengong Jin
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Haoyang Wu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Florence H Vermeire
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Brooke Jin
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Travis Hart
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Timothy Kulesza
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shih-Cheng Li
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Tommi S Jaakkola
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Regina Barzilay
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Rafael Gómez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Klavs F Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
7
Wei L, Fu N, Song Y, Wang Q, Hu J. Probabilistic generative transformer language models for generative design of molecules. J Cheminform 2023; 15:88. [PMID: 37749655 PMCID: PMC10518939 DOI: 10.1186/s13321-023-00759-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 09/10/2023] [Indexed: 09/27/2023] Open
Abstract
Self-supervised neural language models have recently found wide applications in the generative design of organic molecules and protein sequences as well as representation learning for downstream structure classification and functional prediction. However, most of the existing deep learning models for molecule design usually require a big dataset and have a black-box architecture, which makes it difficult to interpret their design logic. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for generative design of molecules. Our model is built on the blank filling language model originally developed for text processing, which has demonstrated unique advantages in learning the "molecules grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf compared to other baselines. The probabilistic generation steps have the potential in tinkering with molecule design due to their capability of recommending how to modify existing molecules with explanation, guided by the learned implicit molecule chemistry. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer.
Affiliation(s)
- Lai Wei
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Nihang Fu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Yuqi Song
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Qian Wang
  - Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC, 29201, USA
- Jianjun Hu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
8
Greenstein BL, Elsey DC, Hutchison GR. Determining best practices for using genetic algorithms in molecular discovery. J Chem Phys 2023; 159:091501. [PMID: 37655763 DOI: 10.1063/5.0158053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 08/09/2023] [Indexed: 09/02/2023] Open
Abstract
Genetic algorithms (GAs) are a powerful tool to search large chemical spaces for inverse molecular design. However, GAs have multiple hyperparameters that have not been thoroughly investigated for chemical space searches. In this tutorial, we examine the general effects of a number of hyperparameters, such as population size, elitism rate, selection method, mutation rate, and convergence criteria, on key GA performance metrics. We show that using a self-termination method with a minimum Spearman's rank correlation coefficient of 0.8 between generations maintained for 50 consecutive generations along with a population size of 32, a 50% elitism rate, three-way tournament selection, and a 40% mutation rate provides the best balance of finding the overall champion, maintaining good coverage of elite targets, and improving relative speedup for general use in molecular design GAs.
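The recommended settings can be wired into a toy genetic algorithm to show how they fit together. This is a minimal sketch under stated assumptions, not the authors' implementation: the genome (a tuple of building-block indices), the additive fitness, the single-point crossover, and all function names are hypothetical. The abstract also does not say which quantities the Spearman coefficient compares, so the sketch assumes it is computed between the building-block usage counts of consecutive generations.

```python
import random
from collections import Counter

POP_SIZE = 32          # population size from the abstract
ELITE_FRACTION = 0.5   # 50% elitism rate
TOURNAMENT_K = 3       # three-way tournament selection
MUTATION_RATE = 0.4    # 40% mutation rate
RHO_MIN = 0.8          # minimum Spearman coefficient between generations
PATIENCE = 50          # consecutive generations the criterion must hold
N_BLOCKS = 20          # hypothetical pool of building blocks
GENOME_LEN = 4         # hypothetical number of blocks per candidate


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # undefined for a constant vector; treated as no agreement
    return cov / (var_a * var_b) ** 0.5


def block_counts(population):
    """How often each building block occurs in the population (assumed metric)."""
    counts = Counter(block for genome in population for block in genome)
    return [counts[b] for b in range(N_BLOCKS)]


def run_ga(fitness, rng, max_generations=2000):
    population = [tuple(rng.randrange(N_BLOCKS) for _ in range(GENOME_LEN))
                  for _ in range(POP_SIZE)]
    previous, stable, best_history = block_counts(population), 0, []
    for _ in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        elites = ranked[:int(ELITE_FRACTION * POP_SIZE)]  # carried over unchanged
        children = []
        while len(elites) + len(children) < POP_SIZE:
            mother = max(rng.sample(population, TOURNAMENT_K), key=fitness)
            father = max(rng.sample(population, TOURNAMENT_K), key=fitness)
            cut = rng.randrange(1, GENOME_LEN)  # single-point crossover
            child = list(mother[:cut] + father[cut:])
            if rng.random() < MUTATION_RATE:
                child[rng.randrange(GENOME_LEN)] = rng.randrange(N_BLOCKS)
            children.append(tuple(child))
        population = elites + children
        current = block_counts(population)
        # Self-termination: count consecutive generations with rho >= RHO_MIN.
        stable = stable + 1 if spearman(previous, current) >= RHO_MIN else 0
        previous = current
        if stable >= PATIENCE:
            break
    return population, best_history


rng = random.Random(0)
scores = [rng.random() for _ in range(N_BLOCKS)]  # hypothetical block scores
population, history = run_ga(lambda g: sum(scores[b] for b in g), rng)
print(len(history), "generations; best fitness", round(history[-1], 3))
```

Because the elites are copied unchanged, the best fitness never decreases from one generation to the next. The `max_generations` cap is an added safeguard in case the correlation criterion is never met.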
Affiliation(s)
- Brianna L Greenstein
  - Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
- Danielle C Elsey
  - Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
- Geoffrey R Hutchison
  - Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
9
Ucak UV, Ashyrmamatov I, Lee J. Reconstruction of lossless molecular representations from fingerprints. J Cheminform 2023; 15:26. [PMID: 36823647 PMCID: PMC9948316 DOI: 10.1186/s13321-023-00693-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/03/2022] [Accepted: 02/04/2023] [Indexed: 02/25/2023] Open
Abstract
The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring the connectivity information lost during fingerprint transformation with high accuracy. Notably, the results reveal that seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints, extended connectivity, topological torsion, atom pairs, and atomic environments can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.
Affiliation(s)
- Umit V. Ucak
  - Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- Islambek Ashyrmamatov
  - Department of Chemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Juyong Lee
  - Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
  - Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
10
Green JD, Fuemmeler EG, Hele TJH. Inverse molecular design from first principles: tailoring organic chromophore spectra for optoelectronic applications. J Chem Phys 2022; 156:180901. [DOI: 10.1063/5.0082311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
The discovery of molecules with tailored optoelectronic properties such as specific frequency and intensity of absorption or emission is a major challenge in creating next-generation organic light-emitting diodes (OLEDs) and photovoltaics. This raises the question: how can we predict a potential chemical structure from these properties? Approaches that attempt to tackle this inverse design problem include virtual screening, active machine learning and genetic algorithms. However, these approaches rely on a molecular database or many electronic structure calculations, and significant computational savings could be achieved if there was prior knowledge of (i) whether the optoelectronic properties of a parent molecule could easily be improved and (ii) what morphing operations on a parent molecule could improve these properties. In this perspective we address both of these challenges from first principles. We firstly adapt the Thomas-Reiche-Kuhn sum rule to organic chromophores and show how this indicates how easily the absorption and emission of a molecule can be improved. We then show how by combining electronic structure theory and intensity borrowing perturbation theory we can predict whether or not the proposed morphing operations will achieve the desired spectral alteration, and thereby derive widely-applicable design rules. We go on to provide proof-of-concept illustrations of this approach to optimizing the visible absorption of acenes and the emission of radical OLEDs. We believe this approach can be integrated into genetic algorithms by biasing morphing operations in favour of those which are likely to be successful, leading to faster molecular discovery and greener chemistry.
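For orientation, the Thomas-Reiche-Kuhn sum rule mentioned above can be written in its general textbook form; this is not the adapted version the authors derive for organic chromophores. Here $f_{0n}$ is the orientation-averaged oscillator strength of the transition from the ground state $|0\rangle$ to state $|n\rangle$, $E_n$ are the state energies, $\mathbf{r}_i$ are the electron positions, and $N$ is the number of electrons.

```latex
f_{0n} = \frac{2 m_e}{3 \hbar^2}\,(E_n - E_0)\,
         \Bigl| \bigl\langle 0 \bigl| \textstyle\sum_{i=1}^{N} \mathbf{r}_i \bigr| n \bigr\rangle \Bigr|^2 ,
\qquad
\sum_{n} f_{0n} = N
```

Since the total oscillator strength is fixed, intensity gained by one transition has to come from others, which is consistent with the intensity-borrowing picture the abstract describes.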
11
Zhao J, Song W, Tang Z, Chen X. Macromolecular Effects in Medicinal Chemistry. ACTA CHIMICA SINICA 2022. [DOI: 10.6023/a21120602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]