1
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Geometric deep learning for molecular property predictions with chemical accuracy across chemical space. J Cheminform 2024; 16:99. [PMID: 39138560 PMCID: PMC11323398 DOI: 10.1186/s13321-024-00895-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Accepted: 08/06/2024] [Indexed: 08/15/2024] Open
Abstract
Chemical engineers heavily rely on precise knowledge of physicochemical properties to model chemical processes. Despite the growing popularity of deep learning, it is only rarely applied for property prediction due to data scarcity and limited accuracy for compounds in industrially-relevant areas of the chemical space. Herein, we present a geometric deep learning framework for predicting gas- and liquid-phase properties based on novel quantum chemical datasets comprising 124,000 molecules. Our findings reveal that the necessity for quantum-chemical information in deep learning models varies significantly depending on the modeled physicochemical property. Specifically, our top-performing geometric model meets the most stringent criteria for "chemically accurate" thermochemistry predictions. We also show that by carefully selecting the appropriate model featurization and evaluating prediction uncertainties, the reliability of the predictions can be strongly enhanced. These insights represent a crucial step towards establishing deep learning as the standard property prediction workflow in both industry and academia.
Scientific contribution
We propose a flexible property prediction tool that can handle two-dimensional and three-dimensional molecular information. A thermochemistry prediction methodology that achieves high-level quantum chemistry accuracy for a broad application range is presented. Trained deep learning models and large novel molecular databases of real-world molecules are provided to offer a directly usable and fast property prediction solution to practitioners.
Affiliation(s)
- Maarten R Dobbelaere
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- István Lengyel
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
  - ChemInsights LLC, Dover, DE, 19901, USA
- Christian V Stevens
  - SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium

2
Alshehri AS, Horstmann KA, You F. Versatile Deep Learning Pipeline for Transferable Chemical Data Extraction. J Chem Inf Model 2024; 64:5888-5899. [PMID: 39009039 DOI: 10.1021/acs.jcim.4c00816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Chemical information disseminated in scientific documents offers an untapped potential for deep learning-assisted insights and breakthroughs. Automated extraction efforts have shifted from resource-intensive manual extraction toward applying machine learning methods to streamline chemical data extraction. While current extraction models and pipelines have ushered in notable efficiency improvements, they often exhibit modest performance, compromising the accuracy of predictive models trained on extracted data. Further, current chemical pipelines lack both transferability (where a model trained on one task can be adapted to another relevant task with limited examples) and extensibility, which enables seamless adaptability for new extraction tasks. Addressing these gaps, we present ChemREL, a versatile chemical data extraction pipeline emphasizing performance, transferability, and extensibility. ChemREL utilizes a custom, diverse data set of chemical documents, labeled through an active learning strategy to extract two properties: normal melting point and lethal dose 50 (LD50). The normal melting point is selected for its prevalence in diverse contexts and wider literature, serving as the foundation for pipeline training. In contrast, LD50 evaluates the pipeline's transferability to an unrelated property, underscoring variance in its biological nature, toxicological context, and units, among other differences. With pretraining and fine-tuning, our pipeline outperforms existing methods and GPT-4, achieving F1-scores of 96.1% for entity identification and 97.0% for relation mapping, culminating in an overall F1-score of 95.4%. More importantly, ChemREL displays high transferability, effectively transitioning from melting point extraction to LD50 extraction with 10 randomly selected training documents. Released as an open-source package, ChemREL aims to broaden access to chemical data extraction, enabling the construction of expansive relational data sets that propel discovery.
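For readers unfamiliar with the metric quoted in this abstract, the F1-score is the harmonic mean of precision and recall. The sketch below is a generic illustration only: the function name and the counts are hypothetical and are not taken from ChemREL or its evaluation data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts.

    tp, fp, fn are true positives, false positives and false negatives
    (hypothetical counts, e.g. for extracted entities or relations).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Illustrative counts only: 90 correct extractions, 10 spurious, 10 missed.
example_f1 = f1_score(tp=90, fp=10, fn=10)
```

Because the harmonic mean is pulled toward the lower of the two values, a pipeline cannot reach a high F1-score by over-extracting (high recall, low precision) or by extracting only the easiest cases (high precision, low recall).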
Affiliation(s)
- Abdulelah S Alshehri
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York 14853, United States
  - Department of Chemical Engineering, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia
- Kai A Horstmann
  - Department of Computer Science, Cornell University, Ithaca, New York 14853, United States
- Fengqi You
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York 14853, United States

3
Huang CH, Lin ST. MARS Plus: An Improved Molecular Design Tool for Complex Compounds Involving Ionic, Stereo, and Cis-Trans Isomeric Structures. J Chem Inf Model 2023; 63:7711-7728. [PMID: 38100117 DOI: 10.1021/acs.jcim.3c01745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2023]
Abstract
MARS (Molecular Assembling and Representation Suite) (Hsu et al. J. Chem. Inf. Model. 2019, 59, 3703-3713) is a toolbox for the molecular design of organic molecules. MARS uses integer arrays to represent the elements and connectivity between elements of a molecule. It provides a collection of operations to manipulate the elemental composition and connectivity of a molecule (or a pair of molecules), enabling the creation of novel chemical compounds. In this work, the original MARS is extended to handle complex molecular structures, including geometric (cis-trans) isomers, stereo isomers, cyclic compounds, and ionic species. The extended version of MARS, referred to as MARS+, has a more comprehensive coverage of the chemical space and therefore can explore molecules with a greater chemical and physical diversity. Compared to other molecular design tools, MARS+ is designed to perform all possible manipulations on a given molecule or a pair of molecules. Molecular structure manipulation can be conducted in either a controlled or a random fashion. Furthermore, every structure manipulation has a counterpart so that the operation can be reversed. Nearly any possible chemical structure can be generated with MARS+ via a combination of molecular operations. The capabilities of MARS+ are examined by the design of new ionic liquids (ILs). The results show that MARS+ is a useful tool for computer-aided molecular design (CAMD) and molecular structure enumeration.
Affiliation(s)
- Chen-Hsuan Huang
  - Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- Shiang-Tai Lin
  - Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan

4
Fromer JC, Coley CW. Computer-aided multi-objective optimization in small molecule discovery. Patterns (New York, N.Y.) 2023; 4:100678. [PMID: 36873904 PMCID: PMC9982302 DOI: 10.1016/j.patter.2023.100678] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 02/12/2023]
Abstract
Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties. Multi-objective molecular design is commonly addressed by combining properties of interest into a single objective function using scalarization, which imposes assumptions about relative importance and uncovers little about the trade-offs between objectives. In contrast to scalarization, Pareto optimization does not require knowledge of relative importance and reveals the trade-offs between objectives. However, it introduces additional considerations in algorithm design. In this review, we describe pool-based and de novo generative approaches to multi-objective molecular discovery with a focus on Pareto optimization algorithms. We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization and how the plethora of different generative models extend from single-objective to multi-objective optimization in similar ways using non-dominated sorting in the reward function (reinforcement learning) or to select molecules for retraining (distribution learning) or propagation (genetic algorithms). Finally, we discuss some remaining challenges and opportunities in the field, emphasizing the opportunity to adopt Bayesian optimization techniques into multi-objective de novo design.
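The contrast this abstract draws between scalarization and Pareto optimization can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the review's implementation: the function names, the two-objective candidate scores, and the weights are all hypothetical, and both objectives are assumed to be maximized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Peel off successive Pareto fronts; returns lists of indices into points."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def scalarize(point: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum: collapses all objectives into one number."""
    return sum(w * x for w, x in zip(weights, point))


# Hypothetical candidates scored on two properties (higher is better for both).
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]

fronts = non_dominated_sort(candidates)

# Scalarization returns a single winner that depends on the chosen weights.
best_equal = max(range(len(candidates)), key=lambda i: scalarize(candidates[i], (0.5, 0.5)))
best_skewed = max(range(len(candidates)), key=lambda i: scalarize(candidates[i], (0.9, 0.1)))
```

Here the first front contains three candidates that trade one property against the other, while each weighted sum picks only one of them, and a different one when the weights change. This simple quadratic-time peeling is chosen for readability; faster non-dominated sorting schemes exist for large candidate pools.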
Affiliation(s)
- Jenna C Fromer
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA

5
Folch JP, Lee RM, Shafei B, Walz D, Tsay C, van der Wilk M, Misener R. Combining multi-fidelity modelling and asynchronous batch Bayesian Optimization. Comput Chem Eng 2023. [DOI: 10.1016/j.compchemeng.2023.108194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]

6
Iftakher A, Monjur MS, Hasan MMF. An Overview of Computer-aided Molecular and Process Design. Chem Ing Tech 2023. [DOI: 10.1002/cite.202200172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Affiliation(s)
- Ashfaq Iftakher
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, 100 Spence St., College Station, TX 77843-3122, USA
- Mohammed Sadaf Monjur
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, 100 Spence St., College Station, TX 77843-3122, USA
- M. M. Faruque Hasan
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, 100 Spence St., College Station, TX 77843-3122, USA

7
Bestwick T, Beckmann J, Camarda KV. Using Artificial Neural Networks to Predict Physical Properties of Membrane Polymers. Chem Ing Tech 2022. [DOI: 10.1002/cite.202200102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- Tate Bestwick
  - Department of Chemical and Petroleum Engineering, University of Kansas, 1530 West 15th Street, Lawrence, KS 66045, United States
- Jessica Beckmann
  - Department of Chemical and Petroleum Engineering, University of Kansas, 1530 West 15th Street, Lawrence, KS 66045, United States
- Kyle V. Camarda
  - Department of Chemical and Petroleum Engineering, University of Kansas, 1530 West 15th Street, Lawrence, KS 66045, United States

8
Zhao L, Zhang Q, He C, Chen Q, Zhang BJ. Quantitative Structure-Property Relationship Analysis for the Prediction of Propylene Adsorption Capacity in Pure Silicon Zeolites at Various Pressure Levels. ACS Omega 2022; 7:33895-33907. [PMID: 36188274 PMCID: PMC9520561 DOI: 10.1021/acsomega.2c02779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 08/31/2022] [Indexed: 06/16/2023]
Abstract
This work is devoted to the development of quantitative structure-property relationship (QSPR) models using various regression analyses to predict propylene (C3H6) adsorption capacity at various pressures in zeolites from a topologically diverse International Zeolite Association database. Based on univariate and multilinear regression analysis, the accessible volume and largest cavity diameter are the most crucial factors determining C3H6 uptake at high and low pressures, respectively. An artificial neural network (ANN) model with five structural descriptors is sufficient to predict C3H6 uptake at high pressures. For combined pressures, the prediction of an ANN model with pore size distribution is pleasing. The isosteric heat of adsorption (Qst) has a significant impact on the improvement of the prediction of low-pressure gas adsorption, which finely classifies zeolites into high or low C3H6 adsorbers. The conjunction of high-throughput screening and QSPR models contributes to being able to prescreen the database rapidly and accurately for top performers and perform further detailed and time-consuming computational-intensive molecular simulations on these candidates for other gas adsorption applications.

9
Challenges and Opportunities in Carbon Capture, Utilization and Storage: A Process Systems Engineering Perspective. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2022.107925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]

10
Andersson MP, Jones MN, Mikkelsen KV, You F, Mansouri SS. Quantum computing for chemical and biomolecular product design. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100754] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/03/2022]

11

12
Identification of optimal metal-organic frameworks by machine learning: Structure decomposition, feature integration, and predictive modeling. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2022.107739] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022]

13
Bilodeau C, Jin W, Jaakkola T, Barzilay R, Jensen KF. Generative models for molecular discovery: Recent advances and challenges. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Affiliation(s)
- Camille Bilodeau
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Wengong Jin
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Tommi Jaakkola
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Regina Barzilay
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Klavs F. Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

14
Gandhi A, Hasan MMF. Machine learning for the design and discovery of zeolites and porous crystalline materials. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100739] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/05/2023]

15
Chen Y, Peng B, Kontogeorgis GM, Liang X. Machine learning for the prediction of viscosity of ionic liquid–water mixtures. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.118546] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]

16

17
Austin ND. The case for a common software library and a set of enumerated benchmark problems in computer-aided molecular design. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100724] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/16/2022]

18

19
Bernal DE, Ajagekar A, Harwood SM, Stober ST, Trenev D, You F. Perspectives of Quantum Computing for Chemical Engineering. AIChE J 2022. [DOI: 10.1002/aic.17651] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
Affiliation(s)
- David E. Bernal
  - Research Institute for Advanced Computer Science, Universities Space Research Association, Mountain View, California, USA
  - Quantum Artificial Intelligence Laboratory (QuAIL), NASA Ames Research Center, Moffett Field, California, USA
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Stuart M. Harwood
  - Corporate Strategic Research, ExxonMobil Research and Engineering, Clinton, New Jersey, USA
- Spencer T. Stober
  - Corporate Strategic Research, ExxonMobil Research and Engineering, Clinton, New Jersey, USA
- Dimitar Trenev
  - Corporate Strategic Research, ExxonMobil Research and Engineering, Clinton, New Jersey, USA
- Fengqi You
  - Systems Engineering, Cornell University, New York, USA
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, New York, USA

20
Rivera Gil JL, Serna J, Arrieta-Escobar JA, Narváez Rincón PC, Boly V, Falk V. Triggers for Chemical Product Design: A Systematic Literature Review. AIChE J 2022. [DOI: 10.1002/aic.17563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/05/2023]
Affiliation(s)
- Jose Luis Rivera Gil
  - Équipe de Recherche sur les Processus Innovatifs, ERPI-ENSGSI, Université de Lorraine, Nancy Cedex, France
  - Grupo de investigación en Procesos Químicos y Bioquímicos, Departamento de Ingeniería Química y Ambiental, Universidad Nacional de Colombia, Sede Bogotá, Bogotá, Colombia
- Juliana Serna
  - Équipe de Recherche sur les Processus Innovatifs, ERPI-ENSGSI, Université de Lorraine, Nancy Cedex, France
  - Grupo de investigación en Procesos Químicos y Bioquímicos, Departamento de Ingeniería Química y Ambiental, Universidad Nacional de Colombia, Sede Bogotá, Bogotá, Colombia
- Javier A. Arrieta-Escobar
  - Grupo de investigación en Procesos Químicos y Bioquímicos, Departamento de Ingeniería Química y Ambiental, Universidad Nacional de Colombia, Sede Bogotá, Bogotá, Colombia
  - Laboratoire Réactions et Génie des Procédés, CNRS-Université de Lorraine, Nancy Cedex, France
- Paulo César Narváez Rincón
  - Grupo de investigación en Procesos Químicos y Bioquímicos, Departamento de Ingeniería Química y Ambiental, Universidad Nacional de Colombia, Sede Bogotá, Bogotá, Colombia
- Vincent Boly
  - Équipe de Recherche sur les Processus Innovatifs, ERPI-ENSGSI, Université de Lorraine, Nancy Cedex, France
- Veronique Falk
  - Laboratoire Réactions et Génie des Procédés, CNRS-Université de Lorraine, Nancy Cedex, France

21
Valencia-Marquez D, Flores-Tlacuahuac A, García-Cuéllar AJ, Ricardez-Sandoval L. Computer aided molecular design coupled with molecular dynamics as a novel approach to design new lubricants. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2021.107523] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]

22
Ooi YJ, Aung KNG, Chong JW, Tan RR, Aviso KB, Chemmangattuvalappil NG. Design of fragrance molecules using computer-aided molecular design with machine learning. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2021.107585] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]

23
Solvent pre-selection for extractive distillation using infinite dilution activity coefficients and the three-component Margules equation. Sep Purif Technol 2021. [DOI: 10.1016/j.seppur.2021.119230] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]

24
Xu J, Du W, Xu Q, Dong J, Wang B. Federated learning based atmospheric source term estimation in urban environments. Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107505] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/06/2023]

25
Gao P, Yang X, Tang YH, Zheng M, Andersen A, Murugesan V, Hollas A, Wang W. Graphical Gaussian process regression model for aqueous solvation free energy prediction of organic molecules in redox flow batteries. Phys Chem Chem Phys 2021; 23:24892-24904. [PMID: 34724700 DOI: 10.1039/d1cp04475c] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
The solvation free energy of organic molecules is a critical parameter in determining emergent properties such as solubility, liquid-phase equilibrium constants, pKa and redox potentials in an organic redox flow battery. In this work, we present a machine learning (ML) model that can learn and predict the aqueous solvation free energy of an organic molecule using the Gaussian process regression method based on a new molecular graph kernel. To investigate the performance of the ML model for electrostatic interaction, the nonpolar interaction contribution of the solvent and the conformational entropy of the solute in the solvation free energy, three data sets with implicit or explicit water solvent models, and contribution of the conformational entropy of the solute are tested. We demonstrate that our ML model can predict the solvation free energy of molecules at chemical accuracy with a mean absolute error of less than 1 kcal mol⁻¹ for subsets of the QM9 dataset and the Freesolv database. To solve the general data scarcity problem for a graph-based ML model, we propose a dimension reduction algorithm based on the distance between molecular graphs, which can be used to examine the diversity of the molecular data set. It provides a promising way to build a minimum training set to improve prediction for certain test sets where the space of molecular structures is predetermined.
Affiliation(s)
- Peiyuan Gao
  - Pacific Northwest National Laboratory, Richland 99352, USA
- Xiu Yang
  - Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA 18015, USA
- Yu-Hang Tang
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Muqing Zheng
  - Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA 18015, USA
- Amity Andersen
  - Pacific Northwest National Laboratory, Richland 99352, USA
- Aaron Hollas
  - Pacific Northwest National Laboratory, Richland 99352, USA
- Wei Wang
  - Pacific Northwest National Laboratory, Richland 99352, USA

26
Chen Y, Meng X, Cai Y, Liang X, Kontogeorgis GM. Optimal Aqueous Biphasic Systems Design for the Recovery of Ionic Liquids. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c03341] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022]
Affiliation(s)
- Yuqiu Chen
  - Department of Chemical and Biochemical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Xianglei Meng
  - Beijing Key Laboratory of Ionic Liquids Clean Process, CAS Key Laboratory of Green Process and Engineering, State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
- Yingjun Cai
  - Beijing Key Laboratory of Ionic Liquids Clean Process, CAS Key Laboratory of Green Process and Engineering, State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
- Xiaodong Liang
  - Department of Chemical and Biochemical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Georgios M. Kontogeorgis
  - Department of Chemical and Biochemical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark

27
Alshehri AS, Tula AK, You F, Gani R. Next generation pure component property estimation models: With and without machine learning techniques. AIChE J 2021. [DOI: 10.1002/aic.17469] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/02/2023]
Affiliation(s)
- Abdulelah S. Alshehri
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York, USA
  - Department of Chemical Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
- Anjan K. Tula
  - College of Control Science and Engineering, Zhejiang University, Hangzhou, China
- Fengqi You
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York, USA
- Rafiqul Gani
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
  - PSE for SPEED Company, Skyttemosen 6, DK-3450 Allerod, Denmark

28
Machine Learning in Chemical Product Engineering: The State of the Art and a Guide for Newcomers. Processes (Basel) 2021. [DOI: 10.3390/pr9081456] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023] Open
Abstract
Chemical Product Engineering (CPE) is marked by numerous challenges, such as the complexity of the properties–structure–ingredients–process relationship of the different products and the necessity to discover and develop constantly and quickly new molecules and materials with tailor-made properties. In recent years, artificial intelligence (AI) and machine learning (ML) methods have gained increasing attention due to their performance in tackling particularly complex problems in various areas, such as computer vision and natural language processing. As such, they present a specific interest in addressing the complex challenges of CPE. This article provides an updated review of the state of the art regarding the implementation of ML techniques in different types of CPE problems with a particular focus on four specific domains, namely the design and discovery of new molecules and materials, the modeling of processes, the prediction of chemical reactions/retrosynthesis and the support for sensorial analysis. This review is further completed by general guidelines for the selection of an appropriate ML technique given the characteristics of each problem and by a critical discussion of several key issues associated with the development of ML modeling approaches. Accordingly, this paper may serve both the experienced researcher in the field as well as the newcomer.

29
Zhumagambetov R, Molnár F, Peshkov VA, Fazli S. Transmol: repurposing a language model for molecular generation. RSC Adv 2021; 11:25921-25932. [PMID: 35479483 PMCID: PMC9037129 DOI: 10.1039/d1ra03086h] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/20/2021] [Accepted: 07/22/2021] [Indexed: 12/29/2022] Open
Abstract
Recent advances in convolutional neural networks have inspired the application of deep learning to other disciplines. Even though image processing and natural language processing have turned out to be the most successful, there are many other domains that have also benefited; among them, life sciences in general and chemistry and drug design in particular. In concordance with this observation, from 2018 the scientific community has seen a surge of methodologies related to the generation of diverse molecular libraries using machine learning. However to date, attention mechanisms have not been employed for the problem of de novo molecular generation. Here we employ a variant of transformers, an architecture recently developed for natural language processing, for this purpose. Our results indicate that the adapted Transmol model is indeed applicable for the task of generating molecular libraries and leads to statistically significant increases in some of the core metrics of the MOSES benchmark. The presented model can be tuned to either input-guided or diversity-driven generation modes by applying a standard one-seed and a novel two-seed approach, respectively. Accordingly, the one-seed approach is best suited for the targeted generation of focused libraries composed of close analogues of the seed structure, while the two-seeds approach allows us to dive deeper into under-explored regions of the chemical space by attempting to generate the molecules that resemble both seeds. To gain more insights about the scope of the one-seed approach, we devised a new validation workflow that involves the recreation of known ligands for an important biological target vitamin D receptor. To further benefit the chemical community, the Transmol algorithm has been incorporated into our cheML.io web database of ML-generated molecules as a second generation on-demand methodology.
Affiliation(s)
- Rustam Zhumagambetov
  - Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
- Ferdinand Molnár
  - Department of Biology, School of Sciences and Humanities, Nazarbayev University, Nur-Sultan, Kazakhstan
- Vsevolod A Peshkov
  - Department of Chemistry, School of Sciences and Humanities, Nazarbayev University, Nur-Sultan, Kazakhstan
- Siamac Fazli
  - Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan

30
Zhang X, Wang J, Song Z, Zhou T. Data-Driven Ionic Liquid Design for CO2 Capture: Molecular Structure Optimization and DFT Verification. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c01384] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/23/2022]
Affiliation(s)
- Xiang Zhang
- Process Systems Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, Magdeburg D-39106, Germany
| | - Jingwen Wang
- Academy of Building Energy Efficiency, School of Civil Engineering, Guangzhou University, Guangzhou 510006, China
| | - Zhen Song
- Process Systems Engineering, Otto-von-Guericke University Magdeburg, Universitätsplatz 2, Magdeburg D-39106, Germany
| | - Teng Zhou
- Process Systems Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, Magdeburg D-39106, Germany
- Process Systems Engineering, Otto-von-Guericke University Magdeburg, Universitätsplatz 2, Magdeburg D-39106, Germany
| |
|
31
|
Alshehri AS, You F. Paradigm Shift: The Promise of Deep Learning in Molecular Systems Engineering and Design. Frontiers in Chemical Engineering 2021. [DOI: 10.3389/fceng.2021.700717] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022] Open
Abstract
The application of deep learning to a diverse array of research problems has accelerated progress across many fields, bringing conventional paradigms to a new intelligent era. Just as the roles of instrumentation in the old chemical revolutions, we reinforce the necessity for integrating deep learning in molecular systems engineering and design as a transformative catalyst towards the next chemical revolution. To meet such research needs, we summarize advances and progress across several key elements of molecular systems: molecular representation, property estimation, representation learning, and synthesis planning. We further spotlight recent advances and promising directions for several deep learning architectures, methods, and optimization platforms. Our perspective is of interest to both computational and experimental researchers as it aims to chart a path forward for cross-disciplinary collaborations on synthesizing knowledge from available chemical data and guiding experimental efforts.
|
32
|
Spencer R, Gkinis P, Koronaki E, Gerogiorgis D, Bordas S, Boudouvis A. Investigation of the chemical vapor deposition of Cu from copper amidinate through data driven efficient CFD modelling. Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107289] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
|
33
|
Mistry A, Franco AA, Cooper SJ, Roberts SA, Viswanathan V. How Machine Learning Will Revolutionize Electrochemical Sciences. ACS Energy Letters 2021; 6:1422-1431. [PMID: 33869772 PMCID: PMC8042659 DOI: 10.1021/acsenergylett.1c00194] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/28/2021] [Accepted: 03/08/2021] [Indexed: 05/21/2023]
Abstract
Electrochemical systems function via interconversion of electric charge and chemical species and represent promising technologies for our cleaner, more sustainable future. However, their development time is fundamentally limited by our ability to identify new materials and understand their electrochemical response. To shorten this time frame, we need to switch from the trial-and-error approach of finding useful materials to a more selective process by leveraging model predictions. Machine learning (ML) offers data-driven predictions and can be helpful. Herein we ask if ML can revolutionize the development cycle from decades to a few years. We outline the necessary characteristics of such ML implementations. Instead of enumerating various ML algorithms, we discuss scientific questions about the electrochemical systems to which ML can contribute.
Affiliation(s)
- Aashutosh Mistry
- Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Alejandro A. Franco
- Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l’Energie, 15 rue Baudelocque, 80039 Amiens Cedex, France
- Réseau sur le Stockage Electrochimique de l’Energie (RS2E), FR CNRS 3459, Hub de l’Energie, 15 rue Baudelocque, 80039 Amiens Cedex, France
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l’Energie, 15 rue Baudelocque, 80039 Amiens Cedex, France
- Institut Universitaire de France, 103 Boulevard Saint Michel, 75005 Paris, France
| | - Samuel J. Cooper
- Dyson School of Design Engineering, Imperial College London, London SW7 2DB, United Kingdom
| | - Scott A. Roberts
- Engineering Sciences Center, Sandia National Laboratories, Albuquerque, New Mexico 87185, United States
| | | |
|
34
|
Pistikopoulos EN, Barbosa-Povoa A, Lee JH, Misener R, Mitsos A, Reklaitis GV, Venkatasubramanian V, You F, Gani R. Process systems engineering – The generation next? Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107252] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/14/2022]
|
35
|
Adjiman CS, Sahinidis NV, Vlachos DG, Bakshi B, Maravelias CT, Georgakis C. Process Systems Engineering Perspective on the Design of Materials and Molecules. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.0c05399] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Affiliation(s)
- Claire S. Adjiman
- Department of Chemical Engineering, Centre for Process Systems Engineering and Institute for Molecular Science and Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, U.K
| | - Nikolaos V. Sahinidis
- H. Milton Stewart School of Industrial & Systems Engineering and School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Dionisios G. Vlachos
- Department of Chemical and Biomolecular Engineering, Catalysis Center for Energy Innovation, RAPID Manufacturing Institute, and Delaware Energy Institute (DEI), University of Delaware, Newark, Delaware 19716, United States
| | - Bhavik Bakshi
- Lowrie Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
| | - Christos T. Maravelias
- Department of Chemical & Biological Engineering and Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
| | - Christos Georgakis
- Department of Chemical and Biological Engineering Systems Research Institute of Chemical and Biological Processes, Tufts University, Medford, Massachusetts 02155, United States
| |
|
36
|
Chai S, Zhang L, Du J, Tula AK, Gani R, Eden MR. A Versatile Modeling Framework for Integrated Chemical Product Design. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.0c04415] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Affiliation(s)
- Shiyang Chai
- Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
| | - Lei Zhang
- Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
| | - Jian Du
- Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
| | - Anjan K. Tula
- College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
| | - Rafiqul Gani
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, South Korea
- PSE for SPEED Company, Skyttemosen 6, DK-3450 Allerød, Denmark
| | - Mario R. Eden
- Department of Chemical Engineering, Auburn University, Auburn, Alabama 36849, United States
| |
|
37
|
Kell DB, Samanta S, Swainston N. Deep learning and generative methods in cheminformatics and chemical biology: navigating small molecule space intelligently. Biochem J 2020; 477:4559-4580. [PMID: 33290527 PMCID: PMC7733676 DOI: 10.1042/bcj20200781] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/02/2020] [Revised: 11/11/2020] [Accepted: 11/12/2020] [Indexed: 12/15/2022]
Abstract
The number of 'small' molecules that may be of interest to chemical biologists - chemical space - is enormous, but the fraction that have ever been made is tiny. Most strategies are discriminative, i.e. have involved 'forward' problems (have molecule, establish properties). However, we normally wish to solve the much harder generative or inverse problem (describe desired properties, find molecule). 'Deep' (machine) learning based on large-scale neural networks underpins technologies such as computer vision, natural language processing, driverless cars, and world-leading performance in games such as Go; it can also be applied to the solution of inverse problems in chemical biology. In particular, recent developments in deep learning admit the in silico generation of candidate molecular structures and the prediction of their properties, thereby allowing one to navigate (bio)chemical space intelligently. These methods are revolutionary but require an understanding of both (bio)chemistry and computer science to be exploited to best advantage. We give a high-level (non-mathematical) background to the deep learning revolution, and set out the crucial issue for chemical biology and informatics as a two-way mapping from the discrete nature of individual molecules to the continuous but high-dimensional latent representation that may best reflect chemical space. A variety of architectures can do this; we focus on a particular type known as variational autoencoders. We then provide some examples of recent successes of these kinds of approach, and a look towards the future.
Affiliation(s)
- Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
| | - Soumitra Samanta
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K
| | - Neil Swainston
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K
| |
|
38
|
Gertig C, Fleitmann L, Schilling J, Leonhard K, Bardow A. Rx‐COSMO‐CAMPD: Enhancing Reactions by Integrated Computer‐Aided Design of Solvents and Processes based on Quantum Chemistry. Chem Ing Tech 2020. [DOI: 10.1002/cite.202000112] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Affiliation(s)
- Christoph Gertig
- RWTH Aachen University Institute of Technical Thermodynamics Schinkelstraße 8 52062 Aachen Germany
| | - Lorenz Fleitmann
- RWTH Aachen University Institute of Technical Thermodynamics Schinkelstraße 8 52062 Aachen Germany
| | - Johannes Schilling
- RWTH Aachen University Institute of Technical Thermodynamics Schinkelstraße 8 52062 Aachen Germany
| | - Kai Leonhard
- RWTH Aachen University Institute of Technical Thermodynamics Schinkelstraße 8 52062 Aachen Germany
| | - André Bardow
- RWTH Aachen University Institute of Technical Thermodynamics Schinkelstraße 8 52062 Aachen Germany
- Forschungszentrum Jülich GmbH Institute of Energy and Climate Research – Energy Systems Engineering (IEK-10) Wilhelm-Johnen-Straße 52425 Jülich Germany
- ETH Zurich Department of Mechanical and Process Engineering, Energy & Process Systems Engineering Tannenstrasse 3 8092 Zürich Switzerland
| |
|