1
Wang J, Zhu F. Multi-objective molecular generation via clustered Pareto-based reinforcement learning. Neural Netw 2024; 179:106596. [PMID: 39163823 DOI: 10.1016/j.neunet.2024.106596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 06/16/2024] [Accepted: 08/01/2024] [Indexed: 08/22/2024]
Abstract
De novo molecular design is the process of learning from existing data to propose new chemical structures that satisfy desired properties. By using de novo design to generate compounds in a directed manner, better solutions can be obtained from large chemical libraries at a lower comparison cost. However, drug design needs to take multiple factors into consideration. For example, in polypharmacology, molecules that activate or inhibit multiple target proteins produce multiple pharmacological activities and are less susceptible to drug resistance. Most existing molecular generation methods either focus only on affinity for a single target or fail to effectively balance the relationship between multiple targets, resulting in insufficient validity and desirability of the generated molecules. To address these problems, an approach called clustered Pareto-based reinforcement learning (CPRL) is proposed. In CPRL, a pre-trained model is constructed to capture existing molecular knowledge in a supervised learning manner. In addition, a clustered Pareto optimization algorithm is presented to find the best trade-off between different objectives. The algorithm first extracts an update set from the sampled molecules through the designed aggregation-based molecular clustering. Then, the final reward is computed by constructing the Pareto frontier ranking of the molecules in the update set. To explore the vast chemical space, a reinforcement learning agent is designed in CPRL that is updated under the guidance of the final reward to balance multiple properties. Furthermore, to increase the internal diversity of the molecules, a fixed-parameter exploration model is used for sampling in conjunction with the agent. The experimental results demonstrate that CPRL is capable of balancing multiple properties of the molecule and achieves higher desirability and validity, reaching 0.9551 and 0.9923, respectively.
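The step of turning a Pareto frontier ranking into a reward can be illustrated with a small sketch. This is not the CPRL implementation: the function names, the maximisation convention, and the linear mapping from front index to reward are all assumptions made for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.
    Assumes every objective is to be maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores):
    """Return the front index of each item: 0 for the non-dominated front,
    1 for the front that appears once front 0 is removed, and so on."""
    remaining = set(range(len(scores)))
    ranks = [None] * len(scores)
    front = 0
    while remaining:
        current = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks


def rank_rewards(ranks):
    """Hypothetical mapping: front 0 gets reward 1.0, later fronts get linearly less."""
    if not ranks:
        return []
    n_fronts = max(ranks) + 1
    return [1.0 - r / n_fronts for r in ranks]


# Toy property scores (two objectives per molecule, both maximised).
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
ranks = pareto_ranks(scores)
print(ranks)  # [0, 0, 0, 1, 2]
print(rank_rewards(ranks))
```

The first three molecules trade one property against the other, so none dominates another and they share the top reward; the remaining two are dominated and fall into later fronts.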
Affiliation(s)
- Jing Wang
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
- Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
2
Wang F, Cheng X, Xia X, Zheng C, Su Y. Adaptive Space Search-based Molecular Evolution Optimization Algorithm. Bioinformatics 2024; 40:btae446. [PMID: 39041594 DOI: 10.1093/bioinformatics/btae446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 03/30/2024] [Accepted: 07/22/2024] [Indexed: 07/24/2024]
Abstract
MOTIVATION In the drug development process, a significant portion of the budget and research time is dedicated to lead compound optimization in order to identify potential drugs. This procedure focuses on enhancing the pharmacological and bioactive properties of compounds by optimizing their local substructures. However, because the chemical structure space is vast and discrete and the element combinations within it are unpredictable, the optimization process is inherently complex. Various structure enumeration-based combinatorial optimization methods have shown certain advantages, but they still have limitations: they fail to consider the differences between molecules and struggle to explore the unknown outer search space. RESULTS In this study, we propose an adaptive space search-based molecular evolution optimization algorithm (ASSMOEA). It consists of three key modules: construction of a molecule-specific search space, molecular evolutionary optimization, and adaptive expansion of the molecule-specific search space. Specifically, we design a fragment similarity tree in the molecule-specific search space and apply a dynamic mutation strategy in this space to guide molecular optimization. Then we utilize an encoder-encoder structure to adaptively expand the space. These three modules are cycled iteratively to optimize molecules. Our experiments demonstrate that ASSMOEA outperforms existing methods in terms of molecular optimization. It not only enhances the efficiency of the molecular optimization process, but also exhibits a robust ability to search for correct solutions. AVAILABILITY AND IMPLEMENTATION The code is freely available on the web at https://github.com/bbbbb-b/MEOAFST. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fei Wang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- Xianglong Cheng
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- Xin Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- Chunhou Zheng
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- Yansen Su
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
3
Xia X, Liu Y, Zheng C, Zhang X, Wu Q, Gao X, Zeng X, Su Y. Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space. J Chem Inf Model 2024; 64:5161-5174. [PMID: 38870455 PMCID: PMC11235097 DOI: 10.1021/acs.jcim.4c00031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 06/15/2024]
Abstract
Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
Affiliation(s)
- Xin Xia
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
- Yiping Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Chunhou Zheng
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xingyi Zhang
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Qingwen Wu
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Yansen Su
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
4
López-Pérez K, Kim TD, Miranda-Quintana RA. iSIM: instant similarity. Digital Discovery 2024; 3:1160-1171. [PMID: 38873032 PMCID: PMC11167700 DOI: 10.1039/d4dd00041b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 05/06/2024] [Indexed: 06/15/2024]
Abstract
The quantification of molecular similarity has been present since the beginning of cheminformatics. Although several similarity indices and molecular representations have been reported, all of them ultimately reduce to the calculation of molecular similarities of only two objects at a time. Hence, to obtain the average similarity of a set of molecules, all the pairwise comparisons need to be computed, so the computational cost scales quadratically with the number of molecules. Here we propose an exact alternative to this approach: iSIM (instant similarity). iSIM performs comparisons of multiple molecules at the same time and yields the same value as the average pairwise comparisons of molecules represented by binary fingerprints and real-value descriptors. In this work, we introduce the mathematical framework and several applications of iSIM in chemical sampling, visualization, diversity selection, and clustering.
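The idea of replacing all pairwise comparisons with a single pass over the set can be checked on a toy case. The sketch below is not the iSIM code: it assumes binary fingerprints stored as lists of 0/1 and uses the Russell-Rao index (shared on-bits divided by fingerprint length) because the identity is easy to verify for that index. A bit that is on in k of the N fingerprints is shared by k(k-1)/2 pairs, so per-bit counts are enough to recover the pairwise average.

```python
from itertools import combinations
from math import comb


def pairwise_mean_russell_rao(fps):
    """Average Russell-Rao similarity over all pairs: O(N^2) comparisons."""
    n_bits = len(fps[0])
    pairs = list(combinations(fps, 2))
    total = sum(sum(a & b for a, b in zip(x, y)) / n_bits for x, y in pairs)
    return total / len(pairs)


def column_sum_russell_rao(fps):
    """Same quantity from per-bit counts: one pass over the N fingerprints."""
    n, n_bits = len(fps), len(fps[0])
    column_counts = [sum(bits) for bits in zip(*fps)]
    shared_pairs = sum(comb(k, 2) for k in column_counts)
    return shared_pairs / (n_bits * comb(n, 2))


# Toy fingerprints (hypothetical data, 6 bits each).
fingerprints = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
]
print(pairwise_mean_russell_rao(fingerprints))
print(column_sum_russell_rao(fingerprints))
```

Both functions print the same value; only the second avoids enumerating the pairs, which is what makes the approach attractive for large sets.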
Affiliation(s)
- Kenneth López-Pérez
- Department of Chemistry and Quantum Theory Project, University of Florida Gainesville Florida 32611 USA
- Taewon D Kim
- Department of Chemistry and Quantum Theory Project, University of Florida Gainesville Florida 32611 USA
5
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which is time-consuming and costly. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models have achieved great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence-based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
6
Kneiding H, Nova A, Balcells D. Directional multiobjective optimization of metal complexes at the billion-system scale. Nature Computational Science 2024; 4:263-273. [PMID: 38553635 DOI: 10.1038/s43588-024-00616-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 02/29/2024] [Indexed: 04/14/2024]
Abstract
The discovery of transition metal complexes (TMCs) with optimal properties requires large ligand libraries and efficient multiobjective optimization algorithms. Here we provide the tmQMg-L library, containing 30k diverse and synthesizable ligands with robustly assigned charges and metal coordination modes. tmQMg-L enabled the generation of 1.37 million palladium TMCs, which were used to develop and benchmark the Pareto-Lighthouse multiobjective genetic algorithm (PL-MOGA). With fine control over aim and scope, this algorithm maximized both the polarizability and highest occupied molecular orbital-lowest unoccupied molecular orbital gap of the TMCs within selected regions of the Pareto front, without requiring prior knowledge on the objective limits. Instead of genetic operations on small ligand fragments, the PL-MOGA did whole-ligand mutation and crossover operations, which in chemical spaces containing billions of systems, yielded thousands of highly diverse TMCs in an interpretable manner.
Affiliation(s)
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
- Ainara Nova
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
- Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, Oslo, Norway
- David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway.
7
García-Sosa AT. Benford's Law and distributions for better drug design. Expert Opin Drug Discov 2024; 19:131-137. [PMID: 37921672 DOI: 10.1080/17460441.2023.2277342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 10/26/2023] [Indexed: 11/04/2023]
Abstract
INTRODUCTION Modern drug discovery incorporates various tools and data, heralding the beginning of the data-driven drug design (DD) era. It has thus become highly important to understand and effectively use the distributions of the chemical and physical data that feed Artificial Intelligence (AI)/Machine Learning (ML) and drive DD. AREAS COVERED The authors perform a comprehensive exploration of the statistical distributions driving the data-intensive era of drug discovery, including Benford's Law in AI/ML-based DD. EXPERT OPINION As the relevance of data-driven discovery escalates, we anticipate meticulous scrutiny of datasets utilizing principles like Benford's Law to enhance data integrity and guide efficient resource allocation and experimental planning. In this data-driven era of the pharmaceutical and medical industries, addressing critical aspects such as bias mitigation, algorithm effectiveness, data stewardship, effects, and fraud prevention is essential. Harnessing Benford's Law and other distributions and statistical tests in DD provides a potent strategy to detect data anomalies, fill data gaps, and enhance dataset quality. Benford's Law is a fast method for checking the integrity and quality of datasets, the backbone of AI/ML and other modeling approaches, and proves very useful in the design process.
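A Benford's Law check of the kind the abstract refers to compares observed leading-digit frequencies with the expected value log10(1 + 1/d) for each digit d. The sketch below is only an illustration: the function names and the maximum-deviation summary are assumptions, and a real screening would normally use a proper goodness-of-fit test and real measurement data rather than the synthetic powers of two used here.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero decimal digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. 0.00321 -> '3.21...e-03'.
    return int(f"{abs(x):.15e}"[0])


def first_digit_frequencies(values):
    """Observed frequency of each leading digit, ignoring zeros."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def max_abs_deviation(values):
    """Largest gap between observed and expected frequency over digits 1-9."""
    expected = benford_expected()
    observed = first_digit_frequencies(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))


# Synthetic example: powers of two are known to follow Benford's Law closely.
values = [2 ** k for k in range(1, 200)]
print(first_digit_frequencies(values))
print(max_abs_deviation(values))
```

A large deviation on a dataset that is expected to span several orders of magnitude would be a prompt to inspect the data, not proof of a problem by itself.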
Affiliation(s)
- Alfonso T García-Sosa
- Chair of Molecular Technology, Institute of Chemistry, University of Tartu, Tartu, Estonia
8
Qian Y, Shi M, Zhang Q. CONSMI: Contrastive Learning in the Simplified Molecular Input Line Entry System Helps Generate Better Molecules. Molecules 2024; 29:495. [PMID: 38276573 PMCID: PMC10821140 DOI: 10.3390/molecules29020495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/27/2024]
Abstract
In recent years, the application of deep learning in molecular de novo design has gained significant attention. One successful approach involves using SMILES representations of molecules and treating the generation task as a text generation problem, yielding promising results. However, the generation of more effective and novel molecules remains a key research area. Because a molecule can have multiple SMILES representations, it is not sufficient to consider only one of them for molecular generation. To make up for this deficiency, and also motivated by the advancements in contrastive learning in natural language processing, we propose a contrastive learning framework called CONSMI to learn more comprehensive SMILES representations. This framework leverages different SMILES representations of the same molecule as positive examples and other SMILES representations as negative examples for contrastive learning. The experimental results of generation tasks demonstrate that CONSMI significantly enhances the novelty of generated molecules while maintaining a high validity. Moreover, the generated molecules have chemical properties similar to those of the original dataset. Additionally, we find that CONSMI can achieve favorable results in classification tasks, such as the compound-protein interaction task.
Affiliation(s)
- Qian Zhang
- School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, 3663 North Zhongshan Road, Putuo District, Shanghai 200062, China; (Y.Q.); (M.S.)
9
Angelo JS, Guedes IA, Barbosa HJC, Dardenne LE. Multi- and many-objective optimization: present and future in de novo drug design. Front Chem 2023; 11:1288626. [PMID: 38192501 PMCID: PMC10773868 DOI: 10.3389/fchem.2023.1288626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 11/27/2023] [Indexed: 01/10/2024]
Abstract
De novo Drug Design (dnDD) aims to create new molecules that satisfy multiple conflicting objectives. Since several desired properties can be considered in the optimization process, dnDD is naturally categorized as a many-objective optimization problem (ManyOOP), where more than three objectives must be simultaneously optimized. However, a large number of objectives typically poses several challenges that affect the choice and the design of optimization methodologies. Herein, we cover the application of multi- and many-objective optimization methods, particularly those based on Evolutionary Computation and Machine Learning techniques, to shed light on their potential application in dnDD. Additionally, we comprehensively analyze how molecular properties used in the optimization process are applied as either objectives or constraints to the problem. Finally, we discuss future research in many-objective optimization for dnDD, highlighting two important possible impacts: i) its integration with the development of multi-target approaches to accelerate the discovery of innovative and more efficacious drug therapies and ii) its role as a catalyst for new developments in more fundamental and general methodological frameworks in the field.
Affiliation(s)
- Laurent E. Dardenne
- Coordenação de Modelagem Computacional, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
10
Greenstein BL, Elsey DC, Hutchison GR. Determining best practices for using genetic algorithms in molecular discovery. J Chem Phys 2023; 159:091501. [PMID: 37655763 DOI: 10.1063/5.0158053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 08/09/2023] [Indexed: 09/02/2023]
Abstract
Genetic algorithms (GAs) are a powerful tool to search large chemical spaces for inverse molecular design. However, GAs have multiple hyperparameters that have not been thoroughly investigated for chemical space searches. In this tutorial, we examine the general effects of a number of hyperparameters, such as population size, elitism rate, selection method, mutation rate, and convergence criteria, on key GA performance metrics. We show that using a self-termination method with a minimum Spearman's rank correlation coefficient of 0.8 between generations maintained for 50 consecutive generations along with a population size of 32, a 50% elitism rate, three-way tournament selection, and a 40% mutation rate provides the best balance of finding the overall champion, maintaining good coverage of elite targets, and improving relative speedup for general use in molecular design GAs.
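The recommended settings and the self-termination rule can be written down as a small configuration plus a convergence tracker. This is a sketch under assumptions, not the authors' code: the class and field names are hypothetical, and the abstract does not say exactly which quantities are rank-correlated between generations, so the tracker simply takes two equal-length score lists supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GASettings:
    """Values quoted in the abstract; the field names are illustrative."""
    population_size: int = 32
    elitism_rate: float = 0.5
    tournament_size: int = 3
    mutation_rate: float = 0.4
    min_spearman: float = 0.8
    patience: int = 50  # consecutive generations the threshold must hold


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant input; treated as no correlation
    return cov / (vx * vy) ** 0.5


class SelfTermination:
    """Signals a stop once the correlation stays high for `patience` generations."""

    def __init__(self, settings):
        self.settings = settings
        self.streak = 0

    def update(self, previous_scores, current_scores):
        rho = spearman(previous_scores, current_scores)
        self.streak = self.streak + 1 if rho >= self.settings.min_spearman else 0
        return self.streak >= self.settings.patience


settings = GASettings()
stopper = SelfTermination(settings)
print(settings)
print(stopper.update([0.1, 0.4, 0.9], [0.2, 0.5, 0.8]))  # first generation: not yet converged
```

Resetting the streak whenever the correlation drops below the threshold is what makes the criterion "maintained for 50 consecutive generations" instead of 50 generations in total.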
Affiliation(s)
- Brianna L Greenstein
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
- Danielle C Elsey
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
- Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, USA
11
Luukkonen S, van den Maagdenberg HW, Emmerich MTM, van Westen GJP. Artificial intelligence in multi-objective drug design. Curr Opin Struct Biol 2023; 79:102537. [PMID: 36774727 DOI: 10.1016/j.sbi.2023.102537] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 10/19/2022] [Revised: 12/21/2022] [Accepted: 01/03/2023] [Indexed: 02/12/2023]
Abstract
The factors determining a drug's success are manifold, making de novo drug design an inherently multi-objective optimisation (MOO) problem. With the advent of machine learning and optimisation methods, the field of multi-objective compound design has seen a rapid increase in developments and applications. Population-based metaheuristics and deep reinforcement learning are the most commonly used artificial intelligence methods in the field, but recently conditional learning methods have been gaining popularity. The former approaches are coupled with a MOO strategy, which is most commonly an aggregation function, but Pareto-based strategies are widespread too. Besides these and conditional learning, various innovative approaches to tackle MOO in drug design have been proposed. Here we provide a brief overview of the field and the latest innovations.
Affiliation(s)
- Sohvi Luukkonen
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, the Netherlands. https://twitter.com/sohvi_luukkonen
- Helle W van den Maagdenberg
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, the Netherlands
- Michael T M Emmerich
- Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, Leiden, 2333 CC, the Netherlands
- Gerard J P van Westen
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden, 2333 CC, the Netherlands.
12
Fromer JC, Coley CW. Computer-aided multi-objective optimization in small molecule discovery. Patterns 2023; 4:100678. [PMID: 36873904 PMCID: PMC9982302 DOI: 10.1016/j.patter.2023.100678] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 02/12/2023]
Abstract
Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties. Multi-objective molecular design is commonly addressed by combining properties of interest into a single objective function using scalarization, which imposes assumptions about relative importance and uncovers little about the trade-offs between objectives. In contrast to scalarization, Pareto optimization does not require knowledge of relative importance and reveals the trade-offs between objectives. However, it introduces additional considerations in algorithm design. In this review, we describe pool-based and de novo generative approaches to multi-objective molecular discovery with a focus on Pareto optimization algorithms. We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization and how the plethora of different generative models extend from single-objective to multi-objective optimization in similar ways using non-dominated sorting in the reward function (reinforcement learning) or to select molecules for retraining (distribution learning) or propagation (genetic algorithms). Finally, we discuss some remaining challenges and opportunities in the field, emphasizing the opportunity to adopt Bayesian optimization techniques into multi-objective de novo design.
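The contrast between scalarization and Pareto optimization drawn in this abstract can be seen on a toy example. The sketch below is illustrative only: the candidate names, property scores, and weights are made up, and both objectives are assumed to be maximised. A weighted sum returns a single winner that changes with the chosen weights, whereas the non-dominated set keeps every trade-off candidate without requiring weights.

```python
def weighted_sum(scores, weights):
    """Scalarize a tuple of objective scores into a single number."""
    return sum(w * s for w, s in zip(weights, scores))


def best_by_scalarization(candidates, weights):
    """Name of the candidate with the highest weighted-sum score."""
    return max(candidates, key=lambda name: weighted_sum(candidates[name], weights))


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(candidates):
    """Sorted names of candidates that no other candidate dominates."""
    return sorted(
        name for name, scores in candidates.items()
        if not any(dominates(other, scores)
                   for other_name, other in candidates.items() if other_name != name)
    )


# Hypothetical molecules scored on two properties (higher is better).
candidates = {
    "A": (0.9, 0.3),
    "B": (0.3, 0.9),
    "C": (0.6, 0.65),
    "D": (0.5, 0.5),
}

print(best_by_scalarization(candidates, (0.8, 0.2)))  # A
print(best_by_scalarization(candidates, (0.2, 0.8)))  # B
print(best_by_scalarization(candidates, (0.5, 0.5)))  # C
print(non_dominated(candidates))                      # ['A', 'B', 'C']
```

Each weighting singles out a different molecule, which is the "assumption about relative importance" the abstract mentions; the non-dominated set exposes all three trade-offs at once and drops only D, which C beats on both properties.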
Affiliation(s)
- Jenna C Fromer
- Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
- Connor W Coley
- Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA; Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA