1. Beck AG, Fine J, Aggarwal P, Regalado EL, Levorse D, De Jesus Silva J, Sherer EC. Machine learning models and performance dependency on 2D chemical descriptor space for retention time prediction of pharmaceuticals. J Chromatogr A 2024; 1730:465109. [PMID: 38968662] [DOI: 10.1016/j.chroma.2024.465109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 07/07/2024]
Abstract
The predictive modeling of liquid chromatography methods can be an invaluable asset, potentially saving countless hours of labor while also reducing solvent consumption and waste. Tasks such as physicochemical screening and preliminary method screening, where large amounts of chromatography data are collected from fast and routine operations, are particularly well suited both to leveraging large datasets and to benefiting from predictive models. Therefore, the generation of predictive models for retention time is an active area of development. However, for these predictive models to gain acceptance, researchers must first have confidence in model performance, and the computational cost of building them should be minimal. In this study, a simple and cost-effective workflow is developed for building machine learning models that predict retention time using only Molecular Operating Environment 2D descriptors as input for support vector regression. Furthermore, we investigated the relative performance of models across molecular descriptor space by using uniform manifold approximation and projection and clustering with Gaussian mixture models to identify chemically distinct clusters. The results demonstrate that local models trained on clusters in chemical space perform equivalently to models trained on all data. In 10-fold cross-validation on a comprehensive set of 67,950 of our company's proprietary analytes, these models achieved a coefficient of determination of 0.84 and 3% error in retention time. This performance is found to carry over from cross-validation to prospective prediction on an external test set of pharmaceutically relevant analytes. The observed equivalence of global and local modeling of large datasets is retained with METLIN's SMRT dataset, thereby confirming the wider applicability of the developed machine learning workflows for global models.
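The abstract reports 10-fold cross-validated R² and a percent error. A minimal pure-Python sketch of how such out-of-fold metrics are computed is shown below; it is not the authors' workflow. Assumptions: a one-descriptor least-squares line (`fit_line`, hypothetical) stands in for SVR on MOE 2D descriptors, the data are synthetic, and reading "3% error" as mean absolute percentage error is a guess.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_abs_pct_error(y_true, y_pred):
    """Mean absolute percentage error (an assumed reading of '% error')."""
    return 100.0 * mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


def fit_line(x, y):
    """Least-squares line y = a + b*x; a stand-in for SVR, not the paper's model."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda xi: intercept + slope * xi


def k_fold_predict(x, y, k=10, seed=0):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    preds = [0.0] * len(x)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        model = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = model(x[i])
    return preds


# Hypothetical toy data: one "descriptor" x and a retention time y (min) with small deterministic noise.
xs = [i / 10 for i in range(100)]
ys = [2.0 + 0.5 * x + 0.05 * ((7 * i) % 5 - 2) for i, x in enumerate(xs)]
oof = k_fold_predict(xs, ys, k=10)
print(f"R^2 = {r2_score(ys, oof):.3f}, MAPE = {mean_abs_pct_error(ys, oof):.2f} %")
```

A global-versus-local comparison like the one described in the abstract could repeat this loop once on all analytes and once within each cluster, then compare the resulting out-of-fold metrics.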
Affiliation(s)
- Armen G Beck
  Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
- Jonathan Fine
  Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
- Pankaj Aggarwal
  Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
- Erik L Regalado
  Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
- Dorothy Levorse
  Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
- Edward C Sherer
  Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA

2. Towards a chromatographic similarity index to establish localised quantitative structure-retention relationships for retention prediction. II Use of Tanimoto similarity index in ion chromatography. J Chromatogr A 2017; 1523:173-182. [DOI: 10.1016/j.chroma.2017.02.054] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/21/2016] [Revised: 02/20/2017] [Accepted: 02/23/2017] [Indexed: 11/19/2022]

3. Bauer KC, Hämmerling F, Kittelmann J, Dürr C, Görlich F, Hubbuch J. Influence of structure properties on protein-protein interactions-QSAR modeling of changes in diffusion coefficients. Biotechnol Bioeng 2017; 114:821-831. [PMID: 27801503] [DOI: 10.1002/bit.26210] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/13/2016] [Revised: 10/05/2016] [Accepted: 10/28/2016] [Indexed: 11/07/2022]
Abstract
Information about protein-protein interactions provides valuable knowledge about the phase behavior of protein solutions during the biopharmaceutical production process. To date, their overall impact can be captured by an experimentally determined potential of mean force. This potential is described by the second virial coefficient B22, the diffusion interaction parameter kD, the storage modulus G', or the diffusion coefficient D. In silico methods not only have the potential to predict these parameters but can also provide a deeper understanding of the molecular origin of protein-protein interactions by correlating the data with the protein's three-dimensional structure. This methodology furthermore requires less sample and less experimental effort. Of all in silico methods, QSAR modeling, which correlates the structural properties of a molecule with its experimental behavior, seems particularly suitable for this purpose. To verify this, the study reported here developed a QSAR model for the diffusion coefficient of proteins. The model was based on diffusion coefficients of six different model proteins at various pH values and NaCl concentrations. The generated QSAR model showed a good correlation between experimental and predicted data, with a coefficient of determination R² = 0.9, and good predictive ability for an external test set, with R² = 0.91. The information about the properties affecting protein-protein interactions in solution agreed with experiment and theory. Furthermore, the model gave a more detailed picture of the protein properties that influence the diffusion coefficient and of the acting protein-protein interactions. Biotechnol. Bioeng. 2017;114: 821-831. © 2016 Wiley Periodicals, Inc.
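The abstract lists the diffusion interaction parameter kD among the quantities describing protein-protein interactions. A minimal sketch, assuming the commonly used linear relation D = D0(1 + kD·c) for a dilution series, shows how kD and the infinite-dilution diffusion coefficient D0 are extracted by least squares. The function `fit_kd` and the data are hypothetical; this is not the paper's QSAR model.

```python
from statistics import mean


def fit_kd(concentrations, diffusivities):
    """Fit D = D0 * (1 + kD * c) via the linear form D = D0 + (D0 * kD) * c.

    Returns (D0, kD). With c in mg/mL, kD comes out in mL/mg.
    """
    c_bar, d_bar = mean(concentrations), mean(diffusivities)
    scc = sum((c - c_bar) ** 2 for c in concentrations)
    scd = sum((c - c_bar) * (d - d_bar) for c, d in zip(concentrations, diffusivities))
    slope = scd / scc
    d0 = d_bar - slope * c_bar  # intercept: diffusion coefficient at infinite dilution
    return d0, slope / d0       # kD = slope / D0


# Hypothetical dilution series (c in mg/mL, D in m^2/s) generated with D0 = 6e-11 and kD = 0.01 mL/mg.
c = [2.0, 4.0, 6.0, 8.0, 10.0]
d = [6e-11 * (1 + 0.01 * ci) for ci in c]
d0, kd = fit_kd(c, d)
print(f"D0 = {d0:.3e} m^2/s, kD = {kd:.4f} mL/mg")
```

As commonly interpreted, a positive kD indicates net repulsive and a negative kD net attractive protein-protein interactions.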
Affiliation(s)
- Katharina Christin Bauer
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Engler-Bunte-Ring 3, 76131 Karlsruhe, Germany
- Frank Hämmerling
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Engler-Bunte-Ring 3, 76131 Karlsruhe, Germany
- Jörg Kittelmann
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Engler-Bunte-Ring 3, 76131 Karlsruhe, Germany
- Cathrin Dürr
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Engler-Bunte-Ring 3, 76131 Karlsruhe, Germany
- Fabian Görlich
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Engler-Bunte-Ring 3, 76131 Karlsruhe, Germany
- Jürgen Hubbuch
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Engler-Bunte-Ring 3, 76131 Karlsruhe, Germany

4. Park SH, Haddad PR, Talebi M, Tyteca E, Amos RI, Szucs R, Dolan JW, Pohl CA. Retention prediction of low molecular weight anions in ion chromatography based on quantitative structure-retention relationships applied to the linear solvent strength model. J Chromatogr A 2017; 1486:68-75. [DOI: 10.1016/j.chroma.2016.12.048] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/10/2016] [Revised: 12/14/2016] [Accepted: 12/16/2016] [Indexed: 10/20/2022]

5. Hämmerling F, Ladd Effio C, Andris S, Kittelmann J, Hubbuch J. Investigation and prediction of protein precipitation by polyethylene glycol using quantitative structure-activity relationship models. J Biotechnol 2016; 241:87-97. [PMID: 27876584] [DOI: 10.1016/j.jbiotec.2016.11.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/29/2016] [Revised: 11/14/2016] [Accepted: 11/16/2016] [Indexed: 10/20/2022]
Abstract
Precipitation is considered an effective purification method for proteins and has proven its potential to replace costly chromatography processes. Besides salts and polyelectrolytes, polymers such as polyethylene glycol (PEG) are commonly used for precipitation under mild conditions. However, process development for protein precipitation steps is still based mainly on heuristic approaches and high-throughput experimentation because the underlying mechanisms are poorly understood. In this work, we apply quantitative structure-activity relationships (QSARs) to model two parameters, the discontinuity point m* and the β-value, that describe the complete precipitation curve of a protein under defined conditions. The generated QSAR models are sensitive to protein type, pH, and ionic strength. The discontinuity point m* was found to depend mainly on the molecular structure properties and electrostatic surface properties of the protein, whereas the β-value is influenced by the variance in electrostatics and hydrophobicity on the protein surface. The models for m* and the β-value exhibit a good correlation between observed and predicted data, with a coefficient of determination R² ≥ 0.90, and are hence able to accurately predict precipitation curves for proteins. The predictive capability was demonstrated for combinations of protein type, pH, and ionic strength not included in model generation, and good agreement between predicted and experimental data was achieved.
Affiliation(s)
- Frank Hämmerling
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Christopher Ladd Effio
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Sebastian Andris
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Jörg Kittelmann
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Jürgen Hubbuch
  Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

6.

7. Sheth RD, Morrison CJ, Cramer SM. Selective displacement chromatography in multimodal cation exchange systems. J Chromatogr A 2011; 1218:9250-9259. [DOI: 10.1016/j.chroma.2011.10.088] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/17/2011] [Revised: 10/25/2011] [Accepted: 10/30/2011] [Indexed: 11/27/2022]

8. Morrison CJ, Gagnon P, Cramer SM. Purification of monomeric mAb from associated aggregates using selective desorption chromatography in hydroxyapatite systems. Biotechnol Bioeng 2010; 108:813-821. [DOI: 10.1002/bit.22971] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 06/17/2010] [Revised: 09/14/2010] [Accepted: 09/28/2010] [Indexed: 11/10/2022]

9. Morrison CJ, Gagnon P, Cramer SM. Unique selectivity windows using selective displacers/eluents and mobile phase modifiers on hydroxyapatite. J Chromatogr A 2010; 1217:6484-6495. [DOI: 10.1016/j.chroma.2010.08.038] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/26/2010] [Revised: 08/09/2010] [Accepted: 08/12/2010] [Indexed: 11/30/2022]

10. Chung WK, Hou Y, Freed A, Holstein M, Makhatadze GI, Cramer SM. Investigation of protein binding affinity and preferred orientations in ion exchange systems using a homologous protein library. Biotechnol Bioeng 2009; 102:869-881. [PMID: 18821632] [DOI: 10.1002/bit.22100] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022]
Abstract
A library of cold shock protein B (CspB) mutant variants was employed to study protein binding affinity and preferred orientations in cation exchange chromatography. Single site mutations introduced at charged amino acids on the protein surface resulted in a homologous protein set with varying charge density and distribution. The retention times of the mutants varied significantly during linear gradient chromatography. While the expected trends were observed with increasing or decreasing positive charge on the protein surface, the degree of change was a strong function of the location and microenvironment of the mutated amino acid. Quantitative structure-property relationship (QSPR) models were generated using a support vector regression technique that was able to give good predictions of the retention times of the various mutants. Molecular descriptors selected during model generation were used to elucidate the factors affecting protein retention. Electrostatic potential maps were also employed to provide insight into the effects of protein surface topography, charge density and charge distribution on protein binding affinity and possible preferred binding orientations. The use of this protein mutant library in concert with the qualitative and quantitative analyses presented in the article provides an improved understanding of protein behavior in ion exchange systems.
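The abstract names support vector regression without giving its settings. For reference, the textbook ε-SVR primal problem is shown below. Here x_i would be the descriptor vector of mutant i and y_i its retention time; the feature map φ (kernel), the penalty C and the tube width ε are generic symbols whose values the abstract does not report.

```latex
\begin{aligned}
\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}}\quad & \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \\
\text{subject to}\quad & y_i - \mathbf{w}^{\top}\phi(\mathbf{x}_i) - b \le \varepsilon + \xi_i, \\
& \mathbf{w}^{\top}\phi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1,\dots,n, \\[4pt]
f(\mathbf{x}) &= \mathbf{w}^{\top}\phi(\mathbf{x}) + b, \qquad
L_{\varepsilon}\bigl(y, f(\mathbf{x})\bigr) = \max\bigl(0,\ \lvert y - f(\mathbf{x})\rvert - \varepsilon\bigr).
\end{aligned}
```

Deviations smaller than ε cost nothing, so the fitted function depends only on the support vectors lying on or outside the ε-tube.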
Affiliation(s)
- Wai Keen Chung
  Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, New York 12180, USA

11. Xu L, Glatz CE. Predicting protein retention time in ion-exchange chromatography based on three-dimensional protein characterization. J Chromatogr A 2009; 1216:274-280. [DOI: 10.1016/j.chroma.2008.11.075] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 09/17/2008] [Revised: 11/21/2008] [Accepted: 11/26/2008] [Indexed: 11/26/2022]

12. Yang T, Breneman CM, Cramer SM. Investigation of multi-modal high-salt binding ion-exchange chromatography using quantitative structure-property relationship modeling. J Chromatogr A 2007; 1175:96-105. [DOI: 10.1016/j.chroma.2007.10.037] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 05/19/2007] [Revised: 10/08/2007] [Accepted: 10/12/2007] [Indexed: 11/25/2022]

13. Yang T, Sundling MC, Freed AS, Breneman CM, Cramer SM. Prediction of pH-dependent chromatographic behavior in ion-exchange systems. Anal Chem 2007; 79:8927-8939. [DOI: 10.1021/ac071101j] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Ting Yang
  Department of Chemical and Biological Engineering and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180
- Matthew C. Sundling
  Department of Chemical and Biological Engineering and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180
- Alexander S. Freed
  Department of Chemical and Biological Engineering and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180
- Curtis M. Breneman
  Department of Chemical and Biological Engineering and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180
- Steven M. Cramer
  Department of Chemical and Biological Engineering and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180

14. Héberger K. Quantitative structure-(chromatographic) retention relationships. J Chromatogr A 2007; 1158:273-305. [PMID: 17499256] [DOI: 10.1016/j.chroma.2007.03.108] [Citation(s) in RCA: 268] [Impact Index Per Article: 15.8] [Received: 01/25/2007] [Revised: 03/13/2007] [Accepted: 03/19/2007] [Indexed: 01/30/2023]
Abstract
Since the pioneering works of Kaliszan (R. Kaliszan, Quantitative Structure-Chromatographic Retention Relationships, Wiley, New York, 1987; and R. Kaliszan, Structure and Retention in Chromatography. A Chemometric Approach, Harwood Academic, Amsterdam, 1997), no comprehensive summary has been available in the field. The present review covers the period from 1996 to August 2006. The sources are grouped according to the type of chromatography: quantitative structure-retention relationships in gas chromatography, planar chromatography, column liquid chromatography, micellar liquid chromatography and affinity chromatography, and quantitative structure-enantioselective retention relationships. General tendencies, misleading practices and conclusions, model validation, and suggestions for future work are summarized for each subfield. Some straightforward applications are emphasized rather than standard ones. The sources and model compounds, descriptors, predicted retention data, modeling methods and their performance indicators, model validation, and stationary phases are collected in tables. Some important conclusions are: Not all physicochemical descriptors correlate strongly with retention data; the heat of formation is not related to chromatographic retention. It is not appropriate to give the errors of Kovats indices in percentages: the apparently low values (1-3%) can mislead reviewers and readers. The contemporary mean interlaboratory reproducibility of Kovats indices is about 5-10 i.u. for standard nonpolar phases and 10-25 i.u. for standard polar phases. The predictive performance of QSRR models deteriorates as the polarity of the GC stationary phase increases. The correlation coefficient alone is not a particularly good indicator of model performance. Residuals are more useful than plots of measured versus calculated values. There is no need to give the retention data in the form of an equation if the number of compounds is small. The domain of applicability of the models should be given in all cases.
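The review's warning about percentage errors for Kovats indices can be made concrete. A minimal sketch, assuming the standard isothermal Kovats definition with adjusted retention times t' = t - t_m, computes an index and then expresses an absolute index error as a percentage. The retention times are hypothetical, and the function names are illustrative only.

```python
import math


def kovats_index(t_x, t_n, t_n1, n, t_m):
    """Isothermal Kovats retention index.

    t_x: analyte retention time; t_n, t_n1: retention times of the bracketing
    n-alkane (n carbons) and (n+1)-alkane; t_m: hold-up time. Same time unit throughout.
    """
    # Adjusted retention times t' = t - t_m
    a_x, a_n, a_n1 = t_x - t_m, t_n - t_m, t_n1 - t_m
    return 100.0 * (n + (math.log(a_x) - math.log(a_n)) / (math.log(a_n1) - math.log(a_n)))


def percent_error(index_error_iu, index_value):
    """Express an absolute index error (i.u.) as a percentage of the index value."""
    return 100.0 * index_error_iu / index_value


# Hypothetical run: hold-up time 1.0 min, n-octane at 5.0 min, n-nonane at 17.0 min, analyte at 9.0 min.
i_x = kovats_index(t_x=9.0, t_n=5.0, t_n1=17.0, n=8, t_m=1.0)
print(f"I = {i_x:.1f}")
# A 25 i.u. error, the upper end of the quoted reproducibility on polar phases:
print(f"{percent_error(25, i_x):.1f} %")
```

Here the analyte elutes at the geometric mean of the adjusted alkane times, so I = 850, and a 25 i.u. error amounts to under 3%, which is exactly the kind of small-looking figure the review cautions against.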
Affiliation(s)
- Károly Héberger
  Chemical Research Center, Hungarian Academy of Sciences, P.O. Box 17, H-1525 Budapest, Hungary

15. Montanari MLC, Gaudio AC, Leitão A, de Almeida TMG, Montanari CA. Chemometric characterization of chromatographic retention parameters of mesoionic 1,3,4-thiadiazolium-3-aminides by molecular interaction fields. J Liq Chromatogr Relat Technol 2007. [DOI: 10.1080/10826070500451830] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
Affiliation(s)
- Anderson C. Gaudio
  Departamento de Física, Universidade Federal do Espírito Santo, Vitória, ES, Brazil
- Andrei Leitão
  Núcleo de Estudos em Química Medicinal-NEQUIM, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Tânia M. G. de Almeida
  Núcleo de Estudos em Química Medicinal-NEQUIM, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Carlos A. Montanari
  Departamento de Química e Física Molecular, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos, SP, Brazil

16. Lohrmann M, Schulte M, Strube J. Generic method for systematic phase selection and method development of biochromatographic processes. J Chromatogr A 2005; 1092:89-100. [PMID: 16188563] [DOI: 10.1016/j.chroma.2005.05.067] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 10/30/2004] [Revised: 04/27/2005] [Accepted: 05/24/2005] [Indexed: 10/25/2022]
Abstract
Although the first protein therapeutics have been on the market for more than 20 years, the selection of suitable adsorbents for the preparative downstream processing (DSP) of these biomolecules, as well as method development toward process conditions, is still based mainly on "trial and error". These processes are therefore not perfectly efficient, and they are very time-consuming and laborious. In this study, a novel systematic method is introduced to find, with reduced effort, a suitable adsorbent (not necessarily the best one) with appropriate separation parameters for a specific separation. Following this strategy, the adsorbents must first be packed into columns under preparative conditions and then characterized completely with regard to, e.g., pressure drop, k'-values, plate heights (HETP curves), selectivity and capacity, using test substances that are similar in their characteristics (molecular mass, size, charge distribution, hydrophobicity) to the target proteins. Once the database has been established, the most suitable adsorbents, including separation parameters, are preselected on the basis of chromatographic and economic properties. After this, preparative experiments must be conducted with a reduced number of adsorbents to determine the individual influence of side components. This approach is demonstrated for the separation of an exemplary industrial protein mixture using cation-exchange chromatography (CEX). The characterization of different weak CEX adsorbents is illustrated. After these phases are compared with each other, a first preselection and a prediction of suitable adsorbents are made. Subsequently, the conditions for the preparative separations (load, velocity, gradient) are determined using the database and the results of some additional experiments. The final comparison of separation performance at preparative scale confirms this selection and thus the applicability of the new method.
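The characterization step above relies on k'-values and plate heights measured with test substances. A minimal sketch of the standard textbook formulas follows, assuming the plate number is obtained from the peak width at half height; the abstract does not state the authors' exact procedure, and the function names and numbers are hypothetical.

```python
import math


def retention_factor(t_r, t_0):
    """Retention factor k' = (t_R - t_0) / t_0."""
    return (t_r - t_0) / t_0


def plate_count_half_height(t_r, w_half):
    """Plate number from the peak width at half height: N = 8 ln 2 * (t_R / w_1/2)^2 (about 5.54)."""
    return 8.0 * math.log(2.0) * (t_r / w_half) ** 2


def hetp(column_length, plates):
    """Height equivalent to a theoretical plate: H = L / N (same length unit as L)."""
    return column_length / plates


# Hypothetical test-substance injection on a 100 mm column: t0 = 2.0 min, tR = 10.0 min, w1/2 = 0.5 min.
k = retention_factor(10.0, 2.0)
n_plates = plate_count_half_height(10.0, 0.5)
h_mm = hetp(100.0, n_plates)
print(f"k' = {k:.2f}, N = {n_plates:.0f}, HETP = {h_mm * 1000:.1f} um")
```

Repeating the plate-height calculation at several flow velocities and plotting H against velocity gives the kind of HETP curve referred to in the abstract.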
Affiliation(s)
- Martin Lohrmann
  Department of Biochemical and Chemical Engineering, University of Dortmund, 44221 Dortmund, Germany