1
Aghayev Z, Szafran AT, Tran A, Ganesh HS, Stossi F, Zhou L, Mancini MA, Pistikopoulos EN, Beykal B. Machine Learning Methods for Endocrine Disrupting Potential Identification Based on Single-Cell Data. Chem Eng Sci 2023; 281:119086. [PMID: 37637227 PMCID: PMC10448728 DOI: 10.1016/j.ces.2023.119086] [Indexed: 08/29/2023]
Abstract
Humans are continuously exposed to a variety of toxicants and chemicals, and this exposure is exacerbated during and after environmental catastrophes such as floods, earthquakes, and hurricanes. The hazardous chemical mixtures generated during these events threaten the health and safety of humans and other living organisms. This necessitates the development of rapid decision-making tools to help mitigate the adverse effects of exposure on key modulators of the endocrine system, such as the estrogen receptor alpha (ERα). The mechanistic stages of the estrogenic transcriptional activity can be measured with high content/high throughput microscopy-based biosensor assays at the single-cell level, which generate millions of object-based minable data points. By combining computational modeling and experimental analysis, we built a highly accurate data-driven classification framework to assess the endocrine disrupting potential of environmental compounds. The effects of these compounds on the ERα pathway are predicted as being receptor agonists or antagonists using the principal component analysis (PCA) projections of high throughput, high content image analysis descriptors. The framework also combines rigorous preprocessing steps and nonlinear machine learning algorithms, such as Support Vector Machines and Random Forest classifiers, to develop highly accurate mathematical representations of the separation between ERα agonists and antagonists. The results show that Support Vector Machines classify the unseen chemicals correctly with more than 96% accuracy using the proposed framework, where the preprocessing and the PCA steps play a key role in suppressing experimental noise and unraveling hidden patterns in the dataset.
Affiliation(s)
- Zahir Aghayev
  - Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, CT
  - Center for Clean Energy Engineering, University of Connecticut, Storrs, CT
- Adam T. Szafran
  - Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX
- Anh Tran
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX
- Hari S. Ganesh
  - Discipline of Chemical Engineering, Indian Institute of Technology Gandhinagar, Palaj, Gandhinagar, Gujarat - 382055, India
- Fabio Stossi
  - Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX
  - GCC Center for Advanced Microscopy and Image Informatics, Houston, TX
- Lan Zhou
  - Department of Statistics, Texas A&M University, College Station, TX
- Michael A. Mancini
  - Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX
  - GCC Center for Advanced Microscopy and Image Informatics, Houston, TX
- Efstratios N. Pistikopoulos
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX
- Burcu Beykal
  - Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, CT
  - Center for Clean Energy Engineering, University of Connecticut, Storrs, CT
2
Koksal ES, Aydin E. Physics Informed Piecewise Linear Neural Networks for Process Optimization. Comput Chem Eng 2023. [DOI: 10.1016/j.compchemeng.2023.108244] [Indexed: 03/30/2023]
3
Galeazzi A, Prifti K, Cortellini C, Di Pretoro A, Gallo F, Manenti F. Development of a surrogate model of an amine scrubbing digital twin using machine learning methods. Comput Chem Eng 2023. [DOI: 10.1016/j.compchemeng.2023.108252] [Indexed: 04/09/2023]
4
Kappatou CD, Odgers J, García-Muñoz S, Misener R. An Optimization Approach Coupling Preprocessing with Model Regression for Enhanced Chemometrics. Ind Eng Chem Res 2023; 62:6196-6213. [PMID: 37097815 PMCID: PMC10119938 DOI: 10.1021/acs.iecr.2c04583] [Received: 12/21/2022] [Revised: 03/02/2023] [Accepted: 03/27/2023] [Indexed: 04/09/2023]
Abstract
Chemometric methods are broadly used in the chemical and biochemical sectors. Typically, derivation of a regression model follows data preprocessing in a sequential manner. Yet, preprocessing can significantly influence the regression model and eventually its predictive ability. In this work, we investigate the coupling of preprocessing and model parameter estimation by incorporating them simultaneously in an optimization step. Common model selection techniques rely almost exclusively on the performance of some accuracy metric, yet having a quantitative metric for model robustness can prolong model up-time. Our approach is applied to optimize for model accuracy and robustness. This requires the introduction of a novel mathematical definition for robustness. We test our method in a simulated setup and with industrial case studies from multivariate calibration. The results highlight the importance of both accuracy and robustness properties and illustrate the potential of the proposed optimization approach toward automating the generation of efficient chemometric models.
Affiliation(s)
- Chrysoula D. Kappatou
  - Computational Optimisation Group, Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom
- James Odgers
  - Computational Optimisation Group, Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom
- Salvador García-Muñoz
  - Synthetic Molecule Design and Development, Lilly Research Laboratories, Eli Lilly & Company, Indianapolis, Indiana 46285, United States
- Ruth Misener
  - Computational Optimisation Group, Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom
5
Folch JP, Lee RM, Shafei B, Walz D, Tsay C, van der Wilk M, Misener R. Combining multi-fidelity modelling and asynchronous batch Bayesian Optimization. Comput Chem Eng 2023. [DOI: 10.1016/j.compchemeng.2023.108194] [Indexed: 02/24/2023]
6
Wang J, Tian K, Li D, Chen M, Feng X, Zhang Y, Wang Y, Van der Bruggen B. Machine learning in gas separation membrane developing: ready for prime time. Sep Purif Technol 2023. [DOI: 10.1016/j.seppur.2023.123493] [Indexed: 03/03/2023]
7
Physics-Informed Recurrent Neural Networks and Hyper-parameter Optimization for Dynamic Process Systems. Comput Chem Eng 2023. [DOI: 10.1016/j.compchemeng.2023.108195] [Indexed: 02/21/2023]