1
Greenstein BL, Elsey DC, Hutchison GR. Determining best practices for using genetic algorithms in molecular discovery. J Chem Phys 2023; 159:091501. [PMID: 37655763 DOI: 10.1063/5.0158053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 08/09/2023] [Indexed: 09/02/2023]
Abstract
Genetic algorithms (GAs) are a powerful tool to search large chemical spaces for inverse molecular design. However, GAs have multiple hyperparameters that have not been thoroughly investigated for chemical space searches. In this tutorial, we examine the general effects of a number of hyperparameters, such as population size, elitism rate, selection method, mutation rate, and convergence criteria, on key GA performance metrics. We show that using a self-termination method with a minimum Spearman's rank correlation coefficient of 0.8 between generations maintained for 50 consecutive generations along with a population size of 32, a 50% elitism rate, three-way tournament selection, and a 40% mutation rate provides the best balance of finding the overall champion, maintaining good coverage of elite targets, and improving relative speedup for general use in molecular design GAs.
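The recommended settings can be sketched as a toy GA. This is a minimal illustration, not the authors' implementation: the genome (a tuple of building-block indices), the `fitness` function, `N_BLOCKS`, `GENOME_LEN`, the one-point crossover, and the choice to compute Spearman's correlation on building-block usage counts between consecutive generations are all assumptions made here, since the abstract does not specify them. The 40% mutation rate is assumed to apply per child.

```python
import random
from collections import Counter

POP_SIZE = 32          # population size from the abstract
ELITE_FRAC = 0.5       # 50% elitism rate
TOURNAMENT_K = 3       # three-way tournament selection
MUTATION_RATE = 0.4    # 40% mutation rate (assumed to be per child)
RHO_MIN = 0.8          # minimum Spearman correlation between generations
PATIENCE = 50          # consecutive generations the threshold must hold

N_BLOCKS = 20          # hypothetical building-block library size
GENOME_LEN = 4         # hypothetical genome length


def fitness(genome):
    # Hypothetical toy score: higher block indices are "better".
    return sum(genome)


def ranks(values):
    # Average ranks (ties share the mean rank), as Spearman's rho requires.
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    # Pearson correlation of the rank vectors; 0.0 if either is constant.
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def block_usage(pop):
    # How often each building block appears in the population (assumed metric).
    counts = Counter(b for g in pop for b in g)
    return [counts.get(b, 0) for b in range(N_BLOCKS)]


def tournament(pop, rng):
    # Pick the fittest of TOURNAMENT_K randomly drawn individuals.
    return max(rng.sample(pop, TOURNAMENT_K), key=fitness)


def run_ga(seed=0, max_gens=2000):
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(N_BLOCKS) for _ in range(GENOME_LEN))
           for _ in range(POP_SIZE)]
    streak, prev = 0, block_usage(pop)
    for gen in range(1, max_gens + 1):
        pop.sort(key=fitness, reverse=True)
        elites = pop[:int(ELITE_FRAC * POP_SIZE)]      # carried over unchanged
        children = []
        while len(elites) + len(children) < POP_SIZE:
            a, b = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randrange(1, GENOME_LEN)
            child = list(a[:cut] + b[cut:])            # one-point crossover
            if rng.random() < MUTATION_RATE:           # mutate one position
                child[rng.randrange(GENOME_LEN)] = rng.randrange(N_BLOCKS)
            children.append(tuple(child))
        pop = elites + children
        usage = block_usage(pop)
        # Self-termination: rho must stay >= RHO_MIN for PATIENCE generations.
        streak = streak + 1 if spearman(prev, usage) >= RHO_MIN else 0
        prev = usage
        if streak >= PATIENCE:
            break
    return max(pop, key=fitness), gen
```

Calling `run_ga()` returns the best genome found and the generation at which the run stopped, either through the correlation criterion or the `max_gens` cap.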
Affiliation(s)
- Brianna L Greenstein
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
- Danielle C Elsey
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
- Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
2
Jiang Y, Salley D, Sharma A, Keenan G, Mullin M, Cronin L. An artificial intelligence enabled chemical synthesis robot for exploration and optimization of nanomaterials. Sci Adv 2022; 8:eabo2626. [PMID: 36206340 PMCID: PMC9544322 DOI: 10.1126/sciadv.abo2626] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 01/23/2022] [Accepted: 08/23/2022] [Indexed: 05/19/2023]
Abstract
We present an autonomous chemical synthesis robot for the exploration, discovery, and optimization of nanostructures driven by real-time spectroscopic feedback, theory, and machine learning algorithms that control the reaction conditions and allow the selective templating of reactions. This approach allows the transfer of materials as seeds between cycles of exploration, opening the search space like gene transfer in biology. The open-ended exploration of the seed-mediated multistep synthesis of gold nanoparticles (AuNPs) via in-line ultraviolet-visible characterization led to the discovery of five categories of nanoparticles by performing only ca. 1000 experiments in three hierarchically linked chemical spaces. The platform optimized nanostructures with desired optical properties by combining experiments and extinction spectrum simulations to achieve a yield of up to 95%. The synthetic procedure is output in a universal format using the chemical description language (χDL) with analytical data to produce a unique digital signature to enable the reproducibility of the synthesis.
Affiliation(s)
- Yibin Jiang
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Daniel Salley
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Abhishek Sharma
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Graham Keenan
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Margaret Mullin
- Glasgow Imaging Facility, Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Leroy Cronin
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Corresponding author.
3
Verhellen J. Graph-based molecular Pareto optimisation. Chem Sci 2022; 13:7526-7535. [PMID: 35872811 PMCID: PMC9241971 DOI: 10.1039/d2sc00821a] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/08/2022] [Accepted: 06/02/2022] [Indexed: 12/02/2022]
Abstract
Computer-assisted design of small molecules has experienced a resurgence in academic and industrial interest due to the widespread use of data-driven techniques such as deep generative models. While the ability to generate molecules that fulfil required chemical properties is encouraging, the use of deep learning models requires significant, if not prohibitive, amounts of data and computational power. At the same time, open-sourcing of more traditional techniques such as graph-based genetic algorithms for molecular optimisation [Jensen, Chem. Sci., 2019, 10, 3567-3572] has shown that simple and training-free algorithms can be efficient and robust alternatives. Further research alleviated the common genetic algorithm issue of evolutionary stagnation by enforcing molecular diversity during optimisation [Van den Abeele, Chem. Sci., 2020, 11, 11485-11491]. The crucial lesson distilled from the simultaneous development of deep generative models and advanced genetic algorithms has been the importance of chemical space exploration [Aspuru-Guzik, Chem. Sci., 2021, 12, 7079-7090]. For single-objective optimisation problems, chemical space exploration had to be discovered as a useable resource, but in multi-objective optimisation problems an exploration of trade-offs between conflicting objectives is inherently present. In this paper we provide state-of-the-art and open-source implementations of two generations of graph-based non-dominated sorting genetic algorithms (NSGA-II, NSGA-III) for molecular multi-objective optimisation. We provide the results of a series of benchmarks for the inverse design of small molecule drugs for both the NSGA-II and NSGA-III algorithms. In addition, we introduce the dominated hypervolume and extended fingerprint based internal similarity as novel metrics for these benchmarks. By design, NSGA-II and NSGA-III outperform a single optimisation method baseline in terms of dominated hypervolume, but remarkably our results show they do so without relying on a greater internal chemical diversity.
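Two ideas named in this abstract, non-dominated sorting and the dominated hypervolume, can be illustrated with a short sketch. It is not taken from the paper's code: it assumes every objective is maximised, uses a naive sort rather than the fast non-dominated sort of NSGA-II, omits crowding distance and reference directions entirely, and computes the hypervolume only for two objectives against an assumed reference point `ref`.

```python
def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (all objectives are maximised here).
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated_sort(points):
    # Returns a list of fronts, each a sorted list of indices into `points`.
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def hypervolume_2d(points, ref=(0.0, 0.0)):
    # Area dominated by `points` and bounded below by `ref` (two objectives).
    if not points:
        return 0.0
    front = {points[i] for i in non_dominated_sort(points)[0]}
    area, prev_y = 0.0, ref[1]
    # Sweep from the largest first objective; each point adds a new strip.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if x > ref[0] and y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


scores = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
fronts = non_dominated_sort(scores)   # [[0, 1, 2], [4], [3]]
volume = hypervolume_2d(scores)       # 6.0
```

A larger dominated hypervolume means the first front pushes further out in both objectives at once, which is why it can be used to compare optimisers on the same benchmark.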
Affiliation(s)
- Jonas Verhellen
- Centre for Integrative Neuroplasticity, University of Oslo, N-0316 Oslo, Norway
4
Hiener DC, Hutchison GR. Pareto Optimization of Oligomer Polarizability and Dipole Moment Using a Genetic Algorithm. J Phys Chem A 2022; 126:2750-2760. [PMID: 35471827 DOI: 10.1021/acs.jpca.2c01266] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
Abstract
High-performance electronic components are highly sought after in order to produce increasingly smaller and cheaper electronic devices. Drawing inspiration from inorganic dielectric materials, in which both polarizability and polarization contribute, organic materials can also maximize both. For a large set of small molecules drawn from PubChem, a Pareto-like front appears between the polarizability and dipole moment, indicating the presence of an apparent trade-off between these two properties. We tested this balance in π-conjugated materials by searching for novel conjugated hexamers with simultaneously large polarizabilities and dipole moments with potential use for dielectric materials. Using a genetic algorithm (GA) screening technique in conjunction with an approximate density functional tight-binding method for property calculations, we were able to efficiently search chemical space for optimal hexamers. Given the scope of chemical space, using the GA technique saves considerable time and resources by speeding up molecular searches compared to a systematic search. We also explored the underlying structure-function relationships, including sequence and monomer properties, that characterize large polarizability and dipole moment regimes.
Affiliation(s)
- Danielle C Hiener
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
5
Gupta A, Chakraborty S, Ghosh D, Ramakrishnan R. Data-driven modeling of S₀ → S₁ excitation energy in the BODIPY chemical space: High-throughput computation, quantum machine learning, and inverse design. J Chem Phys 2021; 155:244102. [PMID: 34972385 DOI: 10.1063/5.0076787] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
Derivatives of BODIPY are popular fluorophores due to their synthetic feasibility, structural rigidity, high quantum yield, and tunable spectroscopic properties. While the characteristic absorption maximum of BODIPY is at 2.5 eV, combinations of functional groups and substitution sites can shift the peak position by ±1 eV. Time-dependent long-range corrected hybrid density functional methods can model the lowest excitation energies offering a semi-quantitative precision of ±0.3 eV. Alas, the chemical space of BODIPYs stemming from the combinatorial introduction of even a few dozen substituents is too large for brute-force high-throughput modeling. To navigate this vast space, we select 77 412 molecules and train a kernel-based quantum machine learning model providing <2% hold-out error. Further reuse of the results presented here to navigate the entire BODIPY universe comprising over 253 giga (253 × 10⁹) molecules is demonstrated by inverse-designing candidates with desired target excitation energies.
Affiliation(s)
- Amit Gupta
- Centre for Interdisciplinary Sciences, Tata Institute of Fundamental Research, Hyderabad 500107, India
- Sabyasachi Chakraborty
- Centre for Interdisciplinary Sciences, Tata Institute of Fundamental Research, Hyderabad 500107, India
- Debashree Ghosh
- Indian Association for the Cultivation of Science, Kolkata 700032, India
- Raghunathan Ramakrishnan
- Centre for Interdisciplinary Sciences, Tata Institute of Fundamental Research, Hyderabad 500107, India
6
Cazenille L, Baccouche A, Aubert-Kato N. Automated exploration of DNA-based structure self-assembly networks. R Soc Open Sci 2021; 8:210848. [PMID: 34754499 PMCID: PMC8493194 DOI: 10.1098/rsos.210848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2021] [Accepted: 09/15/2021] [Indexed: 06/13/2023]
Abstract
Finding DNA sequences capable of folding into specific nanostructures is a hard problem, as it involves very large search spaces and complex nonlinear dynamics. Typical methods to solve it aim to reduce the search space by minimizing unwanted interactions through restrictions on the design (e.g. staples in DNA origami or voxel-based designs in DNA Bricks). Here, we present a novel methodology that aims to reduce this search space by identifying the properties of a given assembly system that are relevant to the emergence of various families of structures (e.g. simple structures, polymers, branched structures). For a given set of DNA strands, our approach automatically finds chemical reaction networks (CRNs) that generate sets of structures exhibiting ranges of specific user-specified properties, such as length and type of structures or their frequency of occurrence. For each set, we enumerate the possible DNA structures that can be generated through domain-level interactions, identify the most prevalent structures, find the best-performing sequence sets for the emergence of target structures, and assess CRNs' robustness to the removal of reaction pathways. Our results suggest a connection between the characteristics of DNA strands and the distribution of generated structure families.
Affiliation(s)
- L. Cazenille
- Department of Information Sciences, Ochanomizu University, Tokyo, Japan
- N. Aubert-Kato
- Department of Information Sciences, Ochanomizu University, Tokyo, Japan
7
Meyers J, Fabian B, Brown N. De novo molecular design and generative models. Drug Discov Today 2021; 26:2707-2715. [PMID: 34082136 DOI: 10.1016/j.drudis.2021.05.019] [Citation(s) in RCA: 77] [Impact Index Per Article: 25.7] [Received: 01/15/2021] [Revised: 04/21/2021] [Accepted: 05/26/2021] [Indexed: 02/09/2023]
Abstract
Molecular design strategies are integral to therapeutic progress in drug discovery. Computational approaches for de novo molecular design have been developed over the past three decades and, recently, thanks in part to advances in machine learning (ML) and artificial intelligence (AI), the drug discovery field has gained practical experience. Here, we review these learnings and present de novo approaches according to the coarseness of their molecular representation: that is, whether molecular design is modeled on an atom-based, fragment-based, or reaction-based paradigm. Furthermore, we emphasize the value of strong benchmarks, describe the main challenges to using these methods in practice, and provide a viewpoint on further opportunities for exploration and challenges to be tackled in the upcoming years.
Affiliation(s)
- Nathan Brown
- BenevolentAI, 4-8 Maple Street, London W1T 5HD, UK
8
Nigam A, Pollice R, Krenn M, Gomes GDP, Aspuru-Guzik A. Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES. Chem Sci 2021; 12:7079-7090. [PMID: 34123336 PMCID: PMC8153210 DOI: 10.1039/d1sc00231g] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 01/12/2021] [Accepted: 04/12/2021] [Indexed: 11/23/2022]
Abstract
Inverse design allows the generation of molecules with desirable physical quantities using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED, a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. First, we achieve non-trivial performance on typical benchmarks for generative models without any training. Additionally, we demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. Overall, we anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wider adoption.
Affiliation(s)
- AkshatKumar Nigam
- Department of Computer Science, University of Toronto, Canada
- Department of Chemistry, University of Toronto, Canada
- Robert Pollice
- Department of Computer Science, University of Toronto, Canada
- Department of Chemistry, University of Toronto, Canada
- Mario Krenn
- Department of Computer Science, University of Toronto, Canada
- Department of Chemistry, University of Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
- Gabriel Dos Passos Gomes
- Department of Computer Science, University of Toronto, Canada
- Department of Chemistry, University of Toronto, Canada
- Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Canada
- Department of Chemistry, University of Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G, Canada