1
Sridharan B, Sinha A, Bardhan J, Modee R, Ehara M, Priyakumar UD. Deep reinforcement learning in chemistry: A review. J Comput Chem 2024; 45:1886-1898. [PMID: 38698628 DOI: 10.1002/jcc.27354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 03/17/2024] [Accepted: 03/20/2024] [Indexed: 05/05/2024]
Abstract
Reinforcement learning (RL) has been applied to various domains in computational chemistry and has found widespread success. In this review, we first motivate the application of RL to chemistry and list some broad application domains, for example, molecule generation, geometry optimization, and retrosynthetic pathway search. We set up some of the formalism associated with reinforcement learning that should help the reader translate their chemistry problems into a form where RL can be used to solve them. We then discuss the solution formulations and algorithms proposed in recent literature for these problems and their relative advantages, together with the necessary details of the RL algorithms they employ. This article should help the reader understand the state of RL applications in chemistry, learn about some relevant actively researched open problems, gain insight into how RL can be used to approach them, and hopefully inspire innovative RL applications in chemistry.
Affiliation(s)
- Bhuvanesh Sridharan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Animesh Sinha
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Jai Bardhan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Rohit Modee
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Masahiro Ehara
  - Research Center for Computational Science, Institute for Molecular Science, Okazaki, Japan
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
2
Kensert A, Libin P, Desmet G, Cabooter D. Deep reinforcement learning for the direct optimization of gradient separations in liquid chromatography. J Chromatogr A 2024; 1720:464768. [PMID: 38442496 DOI: 10.1016/j.chroma.2024.464768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 02/20/2024] [Accepted: 02/22/2024] [Indexed: 03/07/2024]
Abstract
While Reinforcement Learning (RL) has already proven successful in performing complex tasks, such as controlling large-scale epidemics, mitigating influenza and playing computer games beyond expert level, it is currently largely unexplored in the field of separation sciences. This paper therefore aims to introduce RL, specifically proximal policy optimization (PPO), in liquid chromatography, and evaluate whether it can be trained to optimize separations directly, based solely on the outcome of a single generic separation as input, and a reward signal based on the resolution between peak pairs (taking a value in [-1, 1]). More specifically, PPO algorithms or agents were trained to select linear (1-segment) or multi-segment (2-, 3-, or 16-segment) gradients in 1 experiment, based on the outcome of an initial, generic linear gradient (ϕstart = 0.3, ϕend = 1.0, and tg = 20 min), to improve separations. The size of the mixtures to be separated varied between 10 and 20 components. Furthermore, two agents, selecting 16-segment gradients, were trained to perform this optimization using either 2 or 3 experiments, in sequence, to investigate whether the agents could improve separations further, based on previous outcomes. Results showed that the PPO agent can improve separations given the outcome of one generic scouting run as input, by selecting ϕ-programs tailored to the mixture under consideration. Allowing agents more freedom in selecting multi-segment gradients increased the reward from 0.891 to 0.908 on average, and allowing the agents to perform an additional experiment increased the reward from 0.908 to 0.918 on average. Finally, the agent significantly outperformed random experiments as well as standard experiments (ϕstart = 0.0, ϕend = 1.0, and tg = 20 min): random experiments resulted in average rewards between 0.220 and 0.283, and standard experiments resulted in an average reward of 0.840.
In conclusion, while there is room for improvement, the results demonstrate the potential of RL in chromatography and present an interesting future direction for the automated optimization of separations.
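The two quantities this abstract relies on, a multi-segment ϕ-program and the resolution between a peak pair, can be illustrated with a minimal sketch. The function names `phi_at` and `resolution`, the node representation, and the example values other than the scouting gradient are assumptions made for illustration. The resolution uses the common textbook definition with baseline peak widths. The mapping from resolution to the paper's [-1, 1] reward is not stated in the abstract and is deliberately not reproduced.

```python
from bisect import bisect_right


def phi_at(t, nodes):
    """Modifier fraction phi at time t for a piecewise-linear gradient.

    `nodes` is a list of (time_min, phi) pairs sorted by time (hypothetical
    representation); a 1-segment gradient has two nodes, an n-segment
    gradient has n + 1 nodes.
    """
    times = [time for time, _ in nodes]
    if t <= times[0]:
        return nodes[0][1]
    if t >= times[-1]:
        return nodes[-1][1]
    i = bisect_right(times, t) - 1  # index of the segment containing t
    (t0, p0), (t1, p1) = nodes[i], nodes[i + 1]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def resolution(t_a, w_a, t_b, w_b):
    """Textbook resolution Rs = 2 |t_b - t_a| / (w_a + w_b), baseline widths."""
    return 2.0 * abs(t_b - t_a) / (w_a + w_b)


# Generic scouting gradient from the abstract: phi 0.3 -> 1.0 in 20 min.
scouting = [(0.0, 0.3), (20.0, 1.0)]
# A hypothetical 2-segment program an agent might select instead.
two_segment = [(0.0, 0.3), (5.0, 0.5), (20.0, 1.0)]

print(phi_at(10.0, scouting))
print(phi_at(12.5, two_segment))
print(resolution(5.0, 0.2, 5.3, 0.2))
```

Representing the program as a list of nodes keeps the 1-, 2-, 3- and 16-segment cases in a single code path, since only the number of nodes changes.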
Affiliation(s)
- Alexander Kensert
  - University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium
- Pieter Libin
  - Vrije Universiteit Brussel, Department of Computer Science, Artificial Intelligence Laboratory, Pleinlaan 9, 1050 Brussel, Belgium
- Gert Desmet
  - Vrije Universiteit Brussel, Department of Chemical Engineering, Pleinlaan 2, 1050 Brussel, Belgium
- Deirdre Cabooter
  - University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium
3
Bosten E, Kensert A, Desmet G, Cabooter D. Automated method development in high-pressure liquid chromatography. J Chromatogr A 2024; 1714:464577. [PMID: 38104507 DOI: 10.1016/j.chroma.2023.464577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 12/08/2023] [Accepted: 12/11/2023] [Indexed: 12/19/2023]
Abstract
Method development in liquid chromatography is a crucial step in the optimization of analytical separations for various applications. However, it is often a challenging endeavour due to its time-consuming, resource-intensive and costly nature, which is further hampered by its complexity requiring highly skilled and experienced scientists. This review presents an examination of the methods that are required for a completely automated method development procedure in liquid chromatography, aimed at taking the human out of the decision loop. Some of the presented approaches have recently witnessed an important increase in interest as they offer the promise to facilitate, streamline and speed up the method development process. The review first discusses the mathematical description of the separation problem by means of multi-criteria optimization functions. Two different strategies to resolve this optimization are then presented: an experimental and a model-based approach. Additionally, methods for automated peak detection and peak tracking are reviewed, which, upon integration in an instrument, allow for a completely closed-loop method development process. For each of these approaches, various currently applied methods are presented, recent trends and approaches discussed, shortcomings pointed out, and future prospects highlighted.
Affiliation(s)
- Emery Bosten
  - Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, Leuven 3000, Belgium; Department of Pharmaceutical Development and Manufacturing Sciences, Janssen Pharmaceutica, Turnhoutseweg 30, Beerse, Belgium
- Alexander Kensert
  - Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, Leuven 3000, Belgium
- Gert Desmet
  - Department of Chemical Engineering, Free University of Brussels (VUB), Pleinlaan 2, Brussels 1050, Belgium
- Deirdre Cabooter
  - Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, Leuven 3000, Belgium
4
Kensert A, Desmet G, Cabooter D. A perspective on the use of deep deterministic policy gradient reinforcement learning for retention time modeling in reversed-phase liquid chromatography. J Chromatogr A 2024; 1713:464570. [PMID: 38101304 DOI: 10.1016/j.chroma.2023.464570] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/30/2023] [Revised: 12/04/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Artificial intelligence and machine learning techniques are increasingly used for different tasks related to method development in liquid chromatography. In this study, the possibilities of a reinforcement learning algorithm, more specifically a deep deterministic policy gradient algorithm, are evaluated for the selection of scouting runs for retention time modeling. As a theoretical exercise, it is investigated whether such an algorithm can be trained to select scouting runs for any compound of interest such that its correct retention parameters for the three-parameter Neue-Kuss retention model can be retrieved. It is observed that three scouting runs are generally sufficient to retrieve the retention parameters with an accuracy (mean relative percentage error, MRPE) of 1 % or less. Giving the agent the opportunity to select additional scouting runs does not lead to a significantly improved accuracy. It is also observed that the agent tends to give preference to isocratic scouting runs for retention time modeling, and is only motivated towards selecting gradient scouting runs when penalized (strongly) for large analysis/gradient times. This seems to reinforce the general power and usefulness of isocratic scouting runs for retention time modeling. Finally, the best results (lowest MRPE) are obtained when the agent manages to retrieve retention time data for % ACN at elution of the compound under consideration that span the entire relevant range of ACN (5 % ACN to 95 % ACN) as well as possible, i.e., retention data at a low, intermediate and high % ACN. Based on the obtained results, we believe reinforcement learning holds great potential to automate and rationalize method development in liquid chromatography in the future.
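For orientation, the three-parameter Neue-Kuss model named in this abstract is commonly written as k(ϕ) = k0 (1 + S2 ϕ)^2 exp(-S1 ϕ / (1 + S2 ϕ)). The sketch below evaluates it at a low, intermediate and high ϕ and computes a mean relative percentage error over the three parameters. The parameter values are invented for illustration, and averaging the relative error over (k0, S1, S2) is an assumption about how the MRPE is formed, since the abstract does not spell it out.

```python
import math


def neue_kuss_k(phi, k0, s1, s2):
    """Retention factor k at modifier fraction phi (Neue-Kuss model)."""
    denom = 1.0 + s2 * phi
    return k0 * denom ** 2 * math.exp(-s1 * phi / denom)


def mrpe(true_params, estimated_params):
    """Mean relative percentage error over the parameters (assumed definition)."""
    errors = [
        abs(est - true) / abs(true)
        for true, est in zip(true_params, estimated_params)
    ]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical compound: (k0, S1, S2); these numbers are not from the paper.
true_params = (100.0, 20.0, 2.0)

# Isocratic retention factors at a low, intermediate and high % ACN.
for phi in (0.05, 0.50, 0.95):
    print(phi, neue_kuss_k(phi, *true_params))

# A hypothetical fit that misses k0 by 1 % and recovers S1 and S2 exactly.
print(mrpe(true_params, (101.0, 20.0, 2.0)))
```

At ϕ = 0 the model reduces to k = k0, which gives a quick sanity check for any implementation.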
Affiliation(s)
- Alexander Kensert
  - University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium; Vrije Universiteit Brussel, Department of Chemical Engineering, Pleinlaan 2, 1050 Brussel, Belgium
- Gert Desmet
  - Vrije Universiteit Brussel, Department of Chemical Engineering, Pleinlaan 2, 1050 Brussel, Belgium
- Deirdre Cabooter
  - University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium
5
Helleckes LM, Hemmerich J, Wiechert W, von Lieres E, Grünberger A. Machine learning in bioprocess development: from promise to practice. Trends Biotechnol 2023; 41:817-835. [PMID: 36456404 DOI: 10.1016/j.tibtech.2022.10.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 08/30/2022] [Revised: 10/20/2022] [Accepted: 10/27/2022] [Indexed: 11/30/2022]
Abstract
Fostered by novel analytical techniques, digitalization, and automation, modern bioprocess development provides large amounts of heterogeneous experimental data, containing valuable process information. In this context, data-driven methods like machine learning (ML) approaches have great potential to rationally explore large design spaces while exploiting experimental facilities most efficiently. Herein we demonstrate how ML methods have been applied so far in bioprocess development, especially in strain engineering and selection, bioprocess optimization, scale-up, monitoring, and control of bioprocesses. For each topic, we highlight successful application cases and current challenges, and point out domains that can potentially benefit from technology transfer and further progress in the field of ML.
Affiliation(s)
- Laura M Helleckes
  - Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany; RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany
- Johannes Hemmerich
  - Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Wolfgang Wiechert
  - Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany; RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany
- Eric von Lieres
  - Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany; RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany
- Alexander Grünberger
  - Multiscale Bioengineering, Technical Faculty, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany; Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany; Institute of Process Engineering in Life Sciences, Section III: Microsystems in Bioprocess Engineering, Karlsruhe Institute of Technology, Fritz-Haber-Weg 2, 76131 Karlsruhe, Germany
6
Jiang Q, Seth S, Scharl T, Schroeder T, Jungbauer A, Dimartino S. Prediction of the performance of pre-packed purification columns through machine learning. J Sep Sci 2022; 45:1445-1457. [PMID: 35262290 PMCID: PMC9310636 DOI: 10.1002/jssc.202100864] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/25/2021] [Revised: 01/31/2022] [Accepted: 03/01/2022] [Indexed: 11/11/2022]
Abstract
Pre-packed columns have been increasingly used in process development and biomanufacturing thanks to their ease of use and consistency. Traditionally, packing quality is predicted through rate models, which require extensive calibration efforts through independent experiments to determine relevant mass transfer and kinetic rate constants. Here we propose machine learning as a complementary predictive tool for column performance. A machine learning algorithm, extreme gradient boosting, was applied to a large data set of packing quality (plate height and asymmetry) for pre-packed columns as a function of quantitative parameters (column length, column diameter, and particle size) and qualitative attributes (backbone and functional mode). The machine learning model offered excellent predictive capabilities for the plate height and the asymmetry (90 and 93%, respectively), with packing quality strongly influenced by backbone (∼70% relative importance) and functional mode (∼15% relative importance), well above all other quantitative column parameters. The results highlight the ability of machine learning to provide reliable predictions of column performance from simple, generic parameters, including strategic qualitative parameters such as backbone and functionality, usually excluded from quantitative considerations. Our results will guide further efforts in column optimization, for example, by focusing on improvements of backbone and functional mode to obtain optimized packings.
Affiliation(s)
- Qihao Jiang
  - Institute of Bioengineering, School of Engineering, The University of Edinburgh, Edinburgh, UK
- Sohan Seth
  - School of Informatics, The University of Edinburgh, Edinburgh, UK
- Theresa Scharl
  - Austrian Centre of Industrial Biotechnology, Vienna, Austria
  - Institute of Statistics, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Alois Jungbauer
  - Austrian Centre of Industrial Biotechnology, Vienna, Austria
  - Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
- Simone Dimartino
  - Institute of Bioengineering, School of Engineering, The University of Edinburgh, Edinburgh, UK
7
Brau T, Pirok B, Rutan S, Stoll D. Accuracy of retention model parameters obtained from retention data in liquid chromatography. J Sep Sci 2022; 45:3241-3255. [PMID: 35304809 DOI: 10.1002/jssc.202100911] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/16/2021] [Revised: 03/02/2022] [Accepted: 03/14/2022] [Indexed: 11/10/2022]
Abstract
In liquid chromatography (LC), it is often very useful to have an accurate model of the retention factor, k, over a wide range of isocratic elution conditions. In principle, the parameters of a retention model can be obtained by fitting either isocratic or gradient retention factor data. However, in spite of many of our own attempts to accurately predict isocratic k values using retention models trained with gradient retention data, this has not worked in our hands. In the present study we have used synthetic isocratic and gradient retention data for small molecules under reversed-phase LC conditions. This allows us to discover challenges associated with predicting isocratic k's without the confounding influences of experimental issues that are difficult to model or eliminate. The results indicate that it is not currently possible to consistently predict isocratic retention factors for small molecules with accuracies better than 10%, even when using synthetic gradient retention data. Two distinct challenges in fitting gradient retention data were identified: 1) a lack of 'uniqueness' in the parameters; and 2) an inability to find the global optimum fit in a complex fitting landscape. Working with experimental data where measurement noise is unavoidable will only make the accuracy worse.
Affiliation(s)
- Bob Pirok
  - Gustavus Adolphus College; Van 't Hoff Institute for Molecular Sciences
8
Bayesian optimization of comprehensive two-dimensional liquid chromatography separations. J Chromatogr A 2021; 1659:462628. [PMID: 34731752 DOI: 10.1016/j.chroma.2021.462628] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/25/2021] [Revised: 09/16/2021] [Accepted: 10/13/2021] [Indexed: 11/20/2022]
Abstract
Comprehensive two-dimensional liquid chromatography (LC×LC) is a powerful, emerging separation technique in analytical chemistry. However, as many instrumental parameters need to be tuned, the technique is troubled by lengthy method development. To speed up this process, we applied a Bayesian optimization algorithm. The algorithm can optimize LC×LC method parameters by maximizing a novel chromatographic response function based on the concept of connected components of a graph. The algorithm was benchmarked against a grid search (11,664 experiments) and a random search algorithm on the optimization of eight gradient parameters for four different samples of 50 compounds. The worst-case performance of the algorithm was investigated by repeating the optimization loop for 100 experiments with random starting experiments and seeds. Given an optimization budget of 100 experiments, the Bayesian optimization algorithm generally outperformed the random search and often improved upon the grid search. Moreover, the Bayesian optimization algorithm offered a considerably more sample-efficient alternative to grid searches, as it found similar optima to the grid search in far fewer experiments (16 to 100 times fewer). This could likely be further improved by a more informed choice of the initialization experiments, which could be provided by the analyst's experience or smarter selection procedures. The algorithm allows for expansion to other method parameters (e.g., temperature, flow rate, etc.) and unlocks closed-loop automated method development.
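The idea of a response function built on connected components of a graph can be illustrated as follows. This is a sketch under stated assumptions, not the function defined in the paper: peaks are treated as graph nodes, two peaks are joined by an edge when their combined two-dimensional resolution falls below a threshold, and the response is the number of connected components. The peak tuple layout, the Euclidean combination of the per-dimension resolutions, and the threshold `rs_min` are all hypothetical.

```python
import math
from itertools import combinations


def resolution_2d(p, q):
    """Combined resolution of two peaks given as (t1, w1, t2, w2) tuples.

    t1/w1 and t2/w2 are retention time and baseline width in the first and
    second dimension; combining them with a Euclidean norm is an assumption.
    """
    rs_first = 2.0 * abs(p[0] - q[0]) / (p[1] + q[1])
    rs_second = 2.0 * abs(p[2] - q[2]) / (p[3] + q[3])
    return math.hypot(rs_first, rs_second)


def count_components(peaks, rs_min=1.0):
    """Number of connected components when overlapping peaks share an edge."""
    parent = list(range(len(peaks)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(peaks)), 2):
        if resolution_2d(peaks[i], peaks[j]) < rs_min:
            parent[find(i)] = find(j)  # merge the two clusters
    return len({find(i) for i in range(len(peaks))})


# Four hypothetical peaks: the first two co-elute in both dimensions.
peaks = [
    (10.0, 2.0, 1.00, 0.2),
    (10.5, 2.0, 1.05, 0.2),
    (30.0, 2.0, 1.00, 0.2),
    (10.0, 2.0, 3.00, 0.2),
]
print(count_components(peaks))
```

A response of this kind rewards methods that separate more clusters of compounds, and an optimizer such as the one described in the abstract would treat it as the objective to maximize.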
9
Gisbert-Alonso A, Navarro-Huerta JA, Torres-Lapasió JR, García-Alvarez-Coque MC. Testing experimental designs in liquid chromatography (II): Influence of the design geometry on the prediction performance of retention models. J Chromatogr A 2021; 1654:462458. [PMID: 34399141 DOI: 10.1016/j.chroma.2021.462458] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/01/2021] [Revised: 07/26/2021] [Accepted: 08/02/2021] [Indexed: 11/25/2022]
Abstract
In liquid chromatography, the reliability of predictions carried out with retention models depends critically on the quality of the training experimental design. The search for the best design is more complex when gradient runs are used instead of isocratic experiments. In Part I of this work (JCA 1624 (2020) 461180), a general methodology based on the error propagation theory was developed and validated for assessing the quality of training designs involving gradients. The treatment relates the mathematical properties of a retention model with the geometry of the training designs and their subsequent predictions. In that work, only five usual designs were considered. Part II investigates in detail the effects on predictions when the features of the training design (number and distribution of the experiments, initial and final modifier content, gradient slope(s), and location of gradient nodes and pulses) are varied systematically. Several groups of related designs containing one or more isocratic steps, linear or multi-linear gradients, or mixed isocratic/gradient runs, among others (38 designs in total), were evaluated. Box-and-whisker plots and triple plots of expected relative uncertainties were used to show the differences in prediction performance. The purpose was to give recommendations for constructing designs with good prediction performance. The best designs sample (considering all runs) concentrations as diverse as possible, at any gradient time.
Affiliation(s)
- A Gisbert-Alonso
  - Department of Analytical Chemistry, Faculty of Chemistry, Universitat de València, C/ Dr. Moliner 50, 46100 Burjassot, Spain
- J A Navarro-Huerta
  - Department of Analytical Chemistry, Faculty of Chemistry, Universitat de València, C/ Dr. Moliner 50, 46100 Burjassot, Spain
- J R Torres-Lapasió
  - Department of Analytical Chemistry, Faculty of Chemistry, Universitat de València, C/ Dr. Moliner 50, 46100 Burjassot, Spain
- M C García-Alvarez-Coque
  - Department of Analytical Chemistry, Faculty of Chemistry, Universitat de València, C/ Dr. Moliner 50, 46100 Burjassot, Spain
10
Comparison of the Fitting Performance of Retention Models and Elution Strength Behaviour in Hydrophilic-Interaction and Reversed-Phase Liquid Chromatography. Separations 2021. [DOI: 10.3390/separations8040054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Hydrophilic interaction liquid chromatography (HILIC) can separate polar to highly polar solutes, using eluents similar to those in the reversed-phase mode (RPLC) and a polar stationary phase with water adsorbed onto its surface. It is widely accepted that multiple modes of interaction take place in the HILIC environment, which can be far more complex than the interactions in an RPLC column. As is the case for RPLC, the behaviour in HILIC should be adequately modelled to predict retention for optimisation purposes and to improve the understanding of retention mechanisms. In this work, the prediction performance of several retention models is studied for seven HILIC columns (underivatised silica, and silica containing diol, amino and sulfobetaine functional groups, together with three columns recently manufactured with neutral, anionic, and cationic character), using uracil and six polar nucleosides (adenosine, cytidine, guanosine, thymidine, uridine, and xanthosine) as probe compounds. The results in HILIC are compared with those obtained from the elution of several polar sulphonamides and diuretics analysed with two C18 columns (Chromolith Speed ROD and Zorbax Eclipse XDB). It is shown that eight retention models, which consider either partitioning alone or both partitioning and adsorption, give similarly good accuracy in predictions for both HILIC and RPLC columns. However, the study of the elution strength behaviour, at varying mobile phase composition, reveals similarities (or differences) between RPLC and HILIC columns of diverse nature. The particular behaviour of the HILIC and RPLC columns was also revealed when the retention, in both modes, was fitted to a model that describes the change in the elution strength with the modifier concentration.