1
Mazraedoost S, Žuvela P, Ulenberg S, Bączek T, Liu JJ. Cross-column density functional theory-based quantitative structure-retention relationship model development powered by machine learning. Anal Bioanal Chem 2024:10.1007/s00216-024-05243-7. [PMID: 38507043 DOI: 10.1007/s00216-024-05243-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 03/03/2024] [Accepted: 03/06/2024] [Indexed: 03/22/2024]
Abstract
Quantitative structure-retention relationship (QSRR) modeling has emerged as an efficient alternative for predicting analyte retention times from molecular descriptors. However, most reported QSRR models are column-specific, requiring a separate model for each high-performance liquid chromatography (HPLC) system. This study evaluates the potential of machine learning (ML) algorithms and quantum mechanical (QM) descriptors to develop QSRR models that can predict retention times across three different reversed-phase HPLC columns under varying conditions. Four ML methods were compared on a dataset of 360 retention times for 15 aromatic analytes: partial least squares (PLS) regression, ridge regression (RR), random forest (RF), and gradient boosting (GB). Molecular descriptors were calculated using density functional theory (DFT). Column characteristics such as particle size and pore size, and experimental conditions such as temperature and gradient time, were additionally used as descriptors. The GB-QSRR model showed the best predictive performance, with a Q² of 0.989 and a root mean square error of prediction (RMSEP) of 0.749 min on the test set. Feature analysis revealed that solvation energy (SE), the HOMO-LUMO energy gap (ΔE(HOMO-LUMO)), total dipole moment (Mtot), and global hardness (η) are among the most influential predictors of retention time, indicating the significance of electrostatic interactions and hydrophobicity. Our findings underscore the efficiency of ensemble methods (GB and RF, which employ non-linear learners) in capturing local variations in retention times across diverse experimental setups. This study emphasizes the potential of cross-column QSRR modeling and highlights the utility of ML models in optimizing chromatographic analysis.
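The abstract above reports test-set performance as Q² and RMSEP. As a minimal sketch, assuming the common definitions RMSEP = sqrt(PRESS / n) and Q² = 1 - PRESS / SS (the abstract does not say which reference mean the authors used for SS), the snippet below computes both from observed and predicted retention times. The function names `rmsep` and `q2_external` and the retention times are hypothetical illustrations, not the study's code or data.

```python
import math
from typing import Optional, Sequence


def rmsep(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of prediction over a held-out test set."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    # PRESS: predicted residual sum of squares
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(press / len(observed))


def q2_external(
    observed: Sequence[float],
    predicted: Sequence[float],
    reference_mean: Optional[float] = None,
) -> float:
    """Q^2 = 1 - PRESS / SS on a test set.

    SS is the sum of squared deviations of the observed values from a
    reference mean. By default the test-set mean is used; some authors use
    the training-set mean instead (pass it via reference_mean).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean = sum(observed) / len(observed) if reference_mean is None else reference_mean
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - mean) ** 2 for o in observed)
    if ss == 0:
        raise ValueError("observed values have zero variance about the reference mean")
    return 1.0 - press / ss


# Hypothetical retention times in minutes (not data from the study).
t_obs = [2.0, 4.0, 6.0, 8.0]
t_pred = [2.5, 3.5, 6.5, 7.5]
print(rmsep(t_obs, t_pred))        # 0.5
print(q2_external(t_obs, t_pred))  # approximately 0.95
```

RMSEP is expressed in the units of the response (minutes here), while Q² is dimensionless, which is why abstracts such as this one usually report both.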
Affiliation(s)
- Sargol Mazraedoost
- Intelligent Systems Laboratory, Department of Chemical Engineering, Pukyong National University, Busan, 48513, Republic of Korea
- Petar Žuvela
- Intelligent Systems Laboratory, Department of Chemical Engineering, Pukyong National University, Busan, 48513, Republic of Korea
- Szymon Ulenberg
- Department of Pharmaceutical Chemistry, Medical University of Gdańsk, Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- Tomasz Bączek
- Department of Pharmaceutical Chemistry, Medical University of Gdańsk, Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- J Jay Liu
- Intelligent Systems Laboratory, Department of Chemical Engineering, Pukyong National University, Busan, 48513, Republic of Korea.
- Institute of Cleaner Production Technology, Pukyong National University, (48513) 45, Yongso-Ro, Nam-Gu, Busan, South Korea.
2
Adams J, Agyenkwa-Mawuli K, Agyapong O, Wilson MD, Kwofie SK. EBOLApred: A machine learning-based web application for predicting cell entry inhibitors of the Ebola virus. Comput Biol Chem 2022; 101:107766. [DOI: 10.1016/j.compbiolchem.2022.107766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 08/10/2022] [Accepted: 08/29/2022] [Indexed: 11/03/2022]
3
Feasibility and application of machine learning enabled fast screening of poly-beta-amino-esters for cartilage therapies. Sci Rep 2022; 12:14215. [PMID: 35987777 PMCID: PMC9392801 DOI: 10.1038/s41598-022-18332-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Accepted: 08/09/2022] [Indexed: 11/16/2022]
Abstract
Despite the high prevalence of diseases affecting cartilage (e.g. knee osteoarthritis, which affects 16% of the global population), no curative treatments are available, because drugs have a limited capacity to localise in this tissue owing to its low vascularisation and to electrostatic repulsion. While an effective delivery system is sought, the only option is to use high drug doses, which can lead to systemic side effects. We introduced poly-beta-amino-esters (PBAEs) to deliver drugs effectively into cartilage tissue. PBAEs are copolymers of amines and diacrylates, further end-capped with another amine; they therefore span a very large search space for identifying optimal candidates. To accelerate the screening of all possible PBAEs, results from a small pool of polymers (n = 90) were used to train a variety of machine learning (ML) methods, using only polymer properties available in public libraries or estimated from the chemical structure. Bagged multivariate adaptive regression splines (MARS) returned the best predictive performance and was applied to the remaining possible PBAEs (n = 3915), leading to the recognition of pivotal features; a further round of screening was carried out on PBAEs (n = 150) with small structural variations of the main candidates from the first round. Refining these characteristics enabled the identification of a leading candidate predicted to improve drug uptake more than 20-fold over conventional clinical treatment; this uptake improvement was also confirmed experimentally. This work highlights the potential of ML to accelerate biomaterials development by efficiently extracting information from a limited experimental dataset, thus allowing patients to benefit earlier from a new technology and at a lower price. Such a roadmap could also be applied to other drug or materials development where optimisation would normally be approached through combinatorial chemistry.
4
Suresh N, Chinnakonda Ashok Kumar N, Subramanian S, Srinivasa G. Memory augmented recurrent neural networks for de-novo drug design. PLoS One 2022; 17:e0269461. [PMID: 35737661 PMCID: PMC9223405 DOI: 10.1371/journal.pone.0269461] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/16/2021] [Accepted: 05/22/2022] [Indexed: 12/01/2022]
Abstract
A recurrent neural network (RNN) is a machine learning model that learns the relationship between elements of an input series, in addition to inferring a relationship between the data input to the model and the target output. Memory augmentation allows the RNN to learn the interrelationships between elements of the input over a protracted length of the input series. Inspired by the success of the stack-augmented RNN (StackRNN) in generating strings for various applications, we present two memory-augmented RNN-based architectures, the Neural Turing Machine (NTM) and the Differentiable Neural Computer (DNC), for the de-novo generation of small molecules. We trained a character-level convolutional neural network (CNN) to predict the properties of a generated string and compute a reward or loss in a deep reinforcement learning setup that biases the generator towards molecules with the desired property. Further, we compare the performance of these architectures to gain insight into their relative merits in terms of the validity and novelty of the generated molecules and the degree of property bias achieved in the computational generation of de-novo drugs. We also compare these architectures with simpler recurrent neural networks without an external memory component (vanilla RNN, LSTM, and GRU) to explore the impact of augmented memory on the de-novo generation of small molecules.
Affiliation(s)
- Naveen Suresh
- PES Center for Pattern Recognition and Department of Computer Science and Engineering, PES University, Bengaluru, Karnataka, India
- Neelesh Chinnakonda Ashok Kumar
- PES Center for Pattern Recognition and Department of Computer Science and Engineering, PES University, Bengaluru, Karnataka, India
- Srikumar Subramanian
- PES Center for Pattern Recognition and Department of Computer Science and Engineering, PES University, Bengaluru, Karnataka, India
- Gowri Srinivasa
- PES Center for Pattern Recognition and Department of Computer Science and Engineering, PES University, Bengaluru, Karnataka, India
5
Saranyadevi S. Multifaceted targeting strategies in cancer against the human notch 3 protein: a computational study. In Silico Pharmacol 2021; 9:53. [PMID: 34631360 DOI: 10.1007/s40203-021-00112-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Accepted: 09/20/2021] [Indexed: 11/30/2022]
Abstract
Notch receptors play a significant role in development and in the regulation of cell fate in several multicellular organisms. Notch genes are essential for normal differentiation, and their dysregulation plays a role in cancer. Notch 3 has been shown to play a major role in lung cancer, and inhibition of notch 3 protein activation therefore represents a clear strategy for cancer treatment. This study used a combined structure- and ligand-based pharmacophore hypothesis to explore novel notch 3 inhibitors. The analysis identified a common lead molecule, ZINC000013449462, that showed a better XP GScore and binding energy score than the reference inhibitor DAPT. The identified lead compound, which passed all druggability criteria, exhibited stable binding. Furthermore, the lead molecule can form hydrogen-bond and salt-bridge interactions with the binding-site residues Asp1621 and Arg1465, respectively, in the active pocket of the notch 3 protein. Finally, the inhibitory activity of the hit was validated across 109 NSCLC cell lines using a deep neural network algorithm. Our study proposes ZINC000013449462 as a possible prototype molecule against the notch 3 target, to be further examined in clinical studies to combat NSCLC.
Affiliation(s)
- S Saranyadevi
- Department of Nanotechnology, Nanodot Research Private Limited, Nagercoil, Kanyakumari, 629001 India
6
Agyapong O, Miller WA, Wilson MD, Kwofie SK. Development of a proteochemometric-based support vector machine model for predicting bioactive molecules of tubulin receptors. Mol Divers 2021; 26:2231-2242. [PMID: 34626303 DOI: 10.1007/s11030-021-10329-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Accepted: 09/23/2021] [Indexed: 11/26/2022]
Abstract
Microtubules are receiving considerable interest in drug discovery due to the important roles they play in cellular functions. Targeting tubulin polymerization presents an excellent opportunity for the development of anti-tubulin drugs. Drug resistance and high toxicity of currently used tubulin-binding agents have necessitated the pursuit of novel drug candidates with increased therapeutic potency. The design of novel drug candidates can be achieved using efficient computational techniques to support existing efforts. Proteochemometric (PCM) modeling is a computational technique that can be employed to elucidate the bioactivity relations between related targets and multiple ligands. We have developed a PCM-based Support Vector Machine (SVM) approach for predicting the bioactivity between tubulin receptors and small, drug-like molecules. The bioactivity datasets used for training the SVM algorithm were obtained from the BindingDB database. The SVM-based PCM model yielded a good overall predictive performance, with an area under the curve (AUC) of 87%, a Matthews correlation coefficient (MCC) of 72%, an overall accuracy of 93%, and a classification error of 7%. The algorithm predicts the likelihood of new interactions based on confidence scores for the query datasets, which comprise ligands in SMILES format and protein sequences of tubulin targets. The algorithm has been implemented as a web server known as TubPred, accessible via http://35.167.90.225:5000/ .
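The TubPred abstract above reports accuracy, classification error, and MCC together. As a hedged illustration of how these metrics relate to a binary confusion matrix (active vs inactive ligand-target pairs), the following minimal Python sketch uses the standard textbook definitions. The names `ConfusionCounts`, `accuracy`, `classification_error` and `mcc` are hypothetical, the counts are invented, and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical active/inactive labels)."""
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that are correct."""
    return (c.tp + c.tn) / c.total


def classification_error(c: ConfusionCounts) -> float:
    """Fraction of predictions that are wrong; the complement of accuracy."""
    return (c.fp + c.fn) / c.total


def mcc(c: ConfusionCounts) -> float:
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (c.tp * c.tn - c.fp * c.fn) / denom


# Hypothetical counts for illustration only (not TubPred results).
counts = ConfusionCounts(tp=40, tn=50, fp=5, fn=5)
print(accuracy(counts))              # 0.9
print(classification_error(counts))  # 0.1
print(round(mcc(counts), 3))         # 0.798
```

Because accuracy and classification error are complements, the reported 93% and 7% are consistent with each other; MCC, by contrast, uses all four cells of the matrix and is generally less flattering than accuracy when the classes are imbalanced.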
Affiliation(s)
- Odame Agyapong
- Department of Biomedical Engineering, School of Engineering Sciences, College of Basic and Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana
- Department of Parasitology, Noguchi Memorial Institute for Medical Research (NMIMR), College of Health Sciences (CHS), University of Ghana, P.O. Box LG 581, Legon, Accra, Ghana
- Whelton A Miller
- Department of Medicine, Loyola University Medical Center, Maywood, IL, 60153, USA
- School of Engineering and Applied Science, Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Molecular Pharmacology and Neuroscience, Loyola University Medical Center, Maywood, IL, 60153, USA
- Michael D Wilson
- Department of Parasitology, Noguchi Memorial Institute for Medical Research (NMIMR), College of Health Sciences (CHS), University of Ghana, P.O. Box LG 581, Legon, Accra, Ghana
- Department of Medicine, Loyola University Medical Center, Maywood, IL, 60153, USA
- Samuel K Kwofie
- Department of Biomedical Engineering, School of Engineering Sciences, College of Basic and Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana.
- West African Centre for Cell Biology of Infectious Pathogens, Department of Biochemistry, Cell and Molecular Biology, College of Basic and Applied Sciences, University of Ghana, Accra, Ghana.