1
Furxhi I, Faccani L, Zanoni I, Brigliadori A, Vespignani M, Costa AL. Design rules applied to silver nanoparticles synthesis: A practical example of machine learning application. Comput Struct Biotechnol J 2024; 25:20-33. [PMID: 38444982 PMCID: PMC10914561 DOI: 10.1016/j.csbj.2024.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 02/12/2024] [Accepted: 02/14/2024] [Indexed: 03/07/2024] Open
Abstract
The synthesis of silver nanoparticles with controlled physicochemical properties is essential for governing their intended functionalities and safety profiles. However, the synthesis process involves multiple parameters that could influence the resulting properties. This challenge could be addressed by developing predictive models that forecast endpoints based on key synthesis parameters. In this study, we manually extracted synthesis-related data from the literature and leveraged various machine learning algorithms. Data extraction included parameters such as reactant concentrations and experimental conditions, as well as physicochemical properties. The antibacterial efficiencies and toxicological profiles of the synthesized nanoparticles were also extracted. In a second step, based on data completeness, we employed regression algorithms to establish relationships between synthesis parameters and desired endpoints and to build predictive models. The models for core size and antibacterial efficiency were trained and validated using a cross-validation approach. Finally, the features' impact was evaluated via Shapley values to provide insights into the contribution of features to the predictions. Factors such as synthesis duration, scale of synthesis, and the choice of capping agents emerged as the most significant predictors. This study demonstrates the potential of machine learning to aid in the rational design of the synthesis process and paves the way for the development of safe-by-design principles by providing insights into the optimization of the synthesis process to achieve the desired properties. Finally, this study provides a valuable dataset compiled from literature sources with significant time and effort from multiple researchers. Access to such datasets notably aids computational advances in the field of nanotechnology.
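The abstract reports feature impact via Shapley values but does not say how they were computed. As a rough illustration only, the sketch below computes exact Shapley values for a tiny model by enumerating feature coalitions. The function `shapley_values`, the `toy_model`, its coefficients, and the input values are all hypothetical and are not taken from the study; only the three feature names echo the predictors named in the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(features):
    """Hypothetical stand-in for a trained regressor (made-up coefficients)."""
    duration, scale, capping = features
    return 10.0 + 2.0 * duration + 0.5 * scale + 3.0 * duration * capping


x = [2.0, 4.0, 1.0]          # hypothetical synthesis duration, scale, capping-agent code
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["duration", "scale", "capping"], phi)))
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the property that makes Shapley values convenient for ranking predictors. Practical work on larger models would normally use an approximation rather than full enumeration.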
Affiliation(s)
- Irini Furxhi
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
  - Transgero Limited, Limerick, Ireland
- Lara Faccani
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Ilaria Zanoni
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Andrea Brigliadori
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Maurizio Vespignani
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Anna Luisa Costa
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
2
Cardoso Rial R. AI in analytical chemistry: Advancements, challenges, and future directions. Talanta 2024; 274:125949. [PMID: 38569367 DOI: 10.1016/j.talanta.2024.125949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/09/2024] [Accepted: 03/17/2024] [Indexed: 04/05/2024]
Abstract
This article explores the influence and applications of Artificial Intelligence (AI) in analytical chemistry, highlighting its potential to revolutionize the analysis of complex data sets and the development of innovative analytical methods. Additionally, it discusses the role of AI in interpreting large-scale data and optimizing experimental processes. AI has been fundamental in managing heterogeneous data and in advanced analysis of complex spectra in areas such as spectroscopy and chromatography. The article also examines the historical development of AI in chemistry and its current challenges, including the interpretation of AI models and the integration of large volumes of data. Finally, it forecasts future trends and the potential impact of AI on analytical chemistry, emphasizing the need for ethical and secure approaches in the use of AI.
Affiliation(s)
- Rafael Cardoso Rial
  - Federal Institute of Mato Grosso do Sul, 79750-000, Nova Andradina, MS, Brazil
3
Loryuenyong V, Rohing S, Singhanam P, Kamkang H, Buasri A. Artificial Neural Network and Response Surface Methodology for Predicting and Maximizing Biodiesel Production from Waste Oil with KI/CaO/Al2O3 Catalyst in a Fixed Bed Reactor. Chempluschem 2024:e202400117. [PMID: 38771717 DOI: 10.1002/cplu.202400117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 05/10/2024] [Accepted: 05/21/2024] [Indexed: 05/23/2024]
Abstract
Biodiesel from waste oil is produced using heterogeneously catalyzed transesterification in a fixed bed reactor (FBR). A potassium iodide/calcium oxide/alumina (KI/CaO/Al2O3) catalyst was prepared through calcination and impregnation. The novel catalyst was analyzed with X-ray diffraction (XRD), scanning electron microscopy (SEM), and energy dispersive X-ray spectrometry (EDX). The design of experiment (DoE) method resulted in a total of 20 experimental runs. The significance of 3 reaction parameters, namely catalyst bed height, methanol to waste oil molar ratio, and residence time, and their combined impact on biodiesel yield is investigated. Both the artificial neural network (ANN) based on artificial intelligence (AI) and the Box-Behnken design (BBD) based on response surface methodology (RSM) were utilized to optimize the process conditions and maximize biodiesel production. A quadratic regression model was developed to predict biodiesel yield, with a correlation coefficient (R) value of 0.9994 for the ANN model and a coefficient of determination (R²) value of 0.9986 for the BBD model. The maximum biodiesel yield is 98.88% when the catalyst bed height is 7.87 cm, the molar ratio of methanol to waste oil is 17.47:1, and the residence time is 3.12 h. The results of this study indicate that ANN and BBD models can effectively be used to optimize the process and obtain the highest % yield of biodiesel in an FBR.
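For orientation, a quadratic response surface model in three factors generally takes the second-order form below. This is the standard textbook expression, not the fitted equation from the paper. The abstract reports no coefficients, and the symbol assignments (x_1 for catalyst bed height, x_2 for methanol to waste oil molar ratio, x_3 for residence time) are assumptions made here for illustration.

```latex
\hat{y} \;=\; \beta_0
  \;+\; \sum_{i=1}^{3} \beta_i x_i
  \;+\; \sum_{i=1}^{3} \beta_{ii} x_i^{2}
  \;+\; \sum_{1 \le i < j \le 3} \beta_{ij} x_i x_j
```

Here \hat{y} is the predicted biodiesel yield, the \beta_i terms capture linear effects, the \beta_{ii} terms capture curvature, and the \beta_{ij} terms capture the pairwise interactions that the abstract refers to as the combined impact of the parameters.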
Affiliation(s)
- Vorrada Loryuenyong
  - Department of Materials Science and Engineering, Silpakorn University, Faculty of Engineering and Industrial Technology, 73000, Nakhon Pathom, Thailand
- Sitifatimah Rohing
  - Department of Materials Science and Engineering, Silpakorn University, Faculty of Engineering and Industrial Technology, 73000, Nakhon Pathom, Thailand
- Papatsara Singhanam
  - Department of Materials Science and Engineering, Silpakorn University, Faculty of Engineering and Industrial Technology, 73000, Nakhon Pathom, Thailand
- Hatsatorn Kamkang
  - Department of Materials Science and Engineering, Silpakorn University, Faculty of Engineering and Industrial Technology, 73000, Nakhon Pathom, Thailand
- Achanai Buasri
  - Department of Materials Science and Engineering, Silpakorn University, Faculty of Engineering and Industrial Technology, 73000, Nakhon Pathom, Thailand
4
Di Stefano M, Galati S, Piazza L, Granchi C, Mancini S, Fratini F, Macchia M, Poli G, Tuccinardi T. VenomPred 2.0: A Novel In Silico Platform for an Extended and Human Interpretable Toxicological Profiling of Small Molecules. J Chem Inf Model 2024; 64:2275-2289. [PMID: 37676238 PMCID: PMC11005041 DOI: 10.1021/acs.jcim.3c00692] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/08/2023] [Indexed: 09/08/2023]
Abstract
The application of artificial intelligence and machine learning (ML) methods is becoming increasingly popular in computational toxicology and drug design; it is considered a promising solution for assessing the safety profile of compounds, particularly in lead optimization and ADMET studies, and for meeting the principles of the 3Rs, which call for the replacement, reduction, and refinement of animal testing. In this context, we herein present the development of VenomPred 2.0 (http://www.mmvsl.it/wp/venompred2/), the new and improved version of our free-of-charge web tool for toxicological predictions, which now represents a powerful web-based platform for multifaceted and human-interpretable in silico toxicity profiling of chemicals. VenomPred 2.0 presents an extended set of toxicity endpoints (androgenicity, skin irritation, eye irritation, and acute oral toxicity, in addition to the already available carcinogenicity, mutagenicity, hepatotoxicity, and estrogenicity) that can be evaluated through an exhaustive consensus prediction strategy based on multiple ML models. Moreover, we also implemented a new utility based on the Shapley Additive exPlanations (SHAP) method that allows human-interpretable toxicological profiling of small molecules, highlighting the features that strongly contribute to the toxicological predictions in order to derive structural toxicophores.
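The abstract does not describe how the consensus across ML models is formed. One simple possibility, shown purely as an assumption, is a majority vote with an agreement score. The function name `consensus_predict` and the labels used below are hypothetical and do not come from VenomPred 2.0.

```python
from collections import Counter


def consensus_predict(model_votes):
    """Return the most common label and the fraction of models that agree.

    `model_votes` is a non-empty list with one predicted label per model.
    Ties resolve to the label that appears first in the list.
    """
    if not model_votes:
        raise ValueError("at least one model vote is required")
    counts = Counter(model_votes)
    label, top = counts.most_common(1)[0]
    return label, top / len(model_votes)


# Hypothetical votes from three independently trained classifiers
label, agreement = consensus_predict(["toxic", "toxic", "non-toxic"])
print(label, round(agreement, 2))
```

Reporting the agreement fraction alongside the label lets a user treat split votes with more caution than unanimous ones.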
Affiliation(s)
- Miriana Di Stefano
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
  - Department of Life Sciences, University of Siena, 53100 Siena, Italy
- Salvatore Galati
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Lisa Piazza
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Carlotta Granchi
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Simone Mancini
  - Department of Veterinary Sciences, University of Pisa, Viale Delle Piagge 2, 56124 Pisa, Italy
- Filippo Fratini
  - Department of Veterinary Sciences, University of Pisa, Viale Delle Piagge 2, 56124 Pisa, Italy
- Marco Macchia
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Giulio Poli
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Tiziano Tuccinardi
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy
5
Wu Z, Chen J, Li Y, Deng Y, Zhao H, Hsieh CY, Hou T. From Black Boxes to Actionable Insights: A Perspective on Explainable Artificial Intelligence for Scientific Discovery. J Chem Inf Model 2023; 63:7617-7627. [PMID: 38079566 DOI: 10.1021/acs.jcim.3c01642] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/26/2023]
Abstract
The application of Explainable Artificial Intelligence (XAI) in the field of chemistry has garnered growing interest for its potential to justify the predictions of black-box machine learning models and provide actionable insights. We first survey a range of XAI techniques adapted for chemical applications and categorize them based on the technical details of each methodology. We then present a few case studies to illustrate the practical utility of XAI, such as identifying carcinogenic molecules and guiding molecular optimizations, in order to provide chemists with concrete examples of ways to take full advantage of XAI-augmented machine learning for chemistry. Despite the initial success of XAI in chemistry, we still face the challenges of developing more reliable explanations, assuring robustness against adversarial actions, and customizing the explanation for different applications and needs of the diverse scientific community. Finally, we discuss the emerging role of large language models like GPT in generating natural language explanations and the specific challenges associated with them. We advocate that addressing the aforementioned challenges and actively embracing new techniques may contribute to establishing machine learning as an indispensable technique for chemistry in this digital era.
Affiliation(s)
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
  - CarbonSilicon AI Technology Company, Limited, Hangzhou, 310018 Zhejiang, P. R. China
- Jihong Chen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
  - CarbonSilicon AI Technology Company, Limited, Hangzhou, 310018 Zhejiang, P. R. China
- Yitong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Yafeng Deng
  - CarbonSilicon AI Technology Company, Limited, Hangzhou, 310018 Zhejiang, P. R. China
- Haitao Zhao
  - Center for Intelligent and Biomimetic Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 440305 Guangdong, P. R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
6
Fonseca G, Poltavsky I, Tkatchenko A. Force Field Analysis Software and Tools (FFAST): Assessing Machine Learning Force Fields under the Microscope. J Chem Theory Comput 2023; 19:8706-8717. [PMID: 38011895 PMCID: PMC10720330 DOI: 10.1021/acs.jctc.3c00985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023]
Abstract
As the sophistication of machine learning force fields (MLFF) increases to match the complexity of extended molecules and materials, so does the need for tools to properly analyze and assess the practical performance of MLFFs. To go beyond average error metrics and into a complete picture of a model's applicability and limitations, we developed FFAST (force field analysis software and tools): a cross-platform software package designed to gain detailed insights into a model's performance and limitations, complete with an easy-to-use graphical user interface. The software allows the user to gauge the performance of any molecular force field, such as popular state-of-the-art MLFF models, on various popular data set types, providing general prediction error overviews, outlier detection mechanisms, atom-projected errors, and more. It has a 3D visualizer to find and picture problematic configurations, atoms, or clusters in a large data set. In this paper, the example of the MACE and NequIP models is used on two data sets of interest [stachyose and docosahexaenoic acid (DHA)] to illustrate the use cases of the software. With this, it was found that carbons and oxygens involved in or near glycosidic bonds inside the stachyose molecule present increased prediction errors. In addition, prediction errors on DHA rise as the molecule folds, especially for the carboxylic group at the edge of the molecule. We emphasize the need for a systematic assessment of MLFF models for ensuring their successful application to the study of dynamics of molecules and materials.
Affiliation(s)
- Gregory Fonseca
  - Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Igor Poltavsky
  - Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
7
Galati S, Di Stefano M, Bertini S, Granchi C, Giordano A, Gado F, Macchia M, Tuccinardi T, Poli G. Identification of New GSK3β Inhibitors through a Consensus Machine Learning-Based Virtual Screening. Int J Mol Sci 2023; 24:17233. [PMID: 38139062 PMCID: PMC10743990 DOI: 10.3390/ijms242417233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 12/05/2023] [Accepted: 12/06/2023] [Indexed: 12/24/2023] Open
Abstract
Glycogen synthase kinase-3 beta (GSK3β) is a serine/threonine kinase that plays key roles in glycogen metabolism, Wnt/β-catenin signaling cascade, synaptic modulation, and multiple autophagy-related signaling pathways. GSK3β is an attractive target for drug discovery since its aberrant activity is involved in the development of neurodegenerative diseases such as Alzheimer's and Parkinson's disease. In the present study, multiple machine learning models aimed at identifying novel GSK3β inhibitors were developed and evaluated for their predictive reliability. The most powerful models were combined in a consensus approach, which was used to screen about 2 million commercial compounds. Our consensus machine learning-based virtual screening led to the identification of compounds G1 and G4, which showed inhibitory activity against GSK3β in the low-micromolar and sub-micromolar range, respectively. These results demonstrated the reliability of our virtual screening approach. Moreover, docking and molecular dynamics simulation studies were employed for predicting reliable binding modes for G1 and G4, which represent two valuable starting points for future hit-to-lead and lead optimization studies.
Affiliation(s)
- Salvatore Galati
  - Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
- Miriana Di Stefano
  - Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
  - Department of Life Sciences, University of Siena, 53100 Siena, Italy
- Simone Bertini
  - Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
- Carlotta Granchi
  - Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
- Antonio Giordano
  - Sbarro Institute for Cancer Research and Molecular Medicine Center for Biotechnology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
  - Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
- Francesca Gado
  - Department of Pharmaceutical Sciences, University of Milan, 20133 Milan, Italy
- Marco Macchia
  - Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
- Tiziano Tuccinardi
  - Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
- Giulio Poli
  - Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
8
Gil-Pichardo A, Sánchez-Ruiz A, Colmenarejo G. Analysis of metabolites in human gut: illuminating the design of gut-targeted drugs. J Cheminform 2023; 15:96. [PMID: 37833792 PMCID: PMC10571276 DOI: 10.1186/s13321-023-00768-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 10/06/2023] [Indexed: 10/15/2023] Open
Abstract
Gut-targeted drugs provide a new drug modality besides that of oral, systemic molecules, that could tap into the growing knowledge of gut metabolites of bacterial or host origin and their involvement in biological processes and health through their interaction with gut targets (bacterial or host, too). Understanding the properties of gut metabolites can provide guidance for the design of gut-targeted drugs. In the present work we analyze a large set of gut metabolites, both shared with serum or present only in gut, and compare them with oral systemic drugs. We find patterns specific for these two subsets of metabolites that could be used to design drugs targeting the gut. In addition, we develop and openly share a Super Learner model to predict gut permanence, in order to aid in the design of molecules with appropriate profiles to remain in the gut, resulting in molecules with putatively reduced secondary effects and better pharmacokinetics.
Affiliation(s)
- Alberto Gil-Pichardo
  - Biostatistics and Bioinformatics Unit, IMDEA Food, CEI UAM+CSIC, 28049, Madrid, Spain
- Andrés Sánchez-Ruiz
  - Biostatistics and Bioinformatics Unit, IMDEA Food, CEI UAM+CSIC, 28049, Madrid, Spain
- Gonzalo Colmenarejo
  - Biostatistics and Bioinformatics Unit, IMDEA Food, CEI UAM+CSIC, 28049, Madrid, Spain
9
Kostal J. Making the Case for Quantum Mechanics in Predictive Toxicology: Nearly 100 Years Too Late? Chem Res Toxicol 2023; 36:1444-1450. [PMID: 37676849 DOI: 10.1021/acs.chemrestox.3c00171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
The use of quantum mechanics (QM) has long been the norm to study covalent-binding phenomena in chemistry and biochemistry. The pharmaceutical industry leverages QM models explicitly in covalent drug discovery and implicitly to characterize short-range interactions in noncovalent binding. Predictive toxicology has resisted widespread adoption of QM, including in the pharmaceutical industry, despite its obvious relevance to the metabolic processes in the upstream of adverse outcome pathways and advances in both QM methods and computational resources, which support fit-for-purpose applications in reasonable timeframes. Here, we make the case for embracing QM as an indispensable part of a toxicologist's toolkit. We argue that QM provides the necessary orthogonality to alert-based expert systems and traditional QSARs, consistent with calls for animal-free integrated testing strategies for safety assessments of commercial chemicals. We outline existing roadblocks to this transition, including the need to train model developers in QM and the shift toward service-based toxicity models that utilize high-performance computing clusters. Lastly, we describe recent examples of successful implementations of QM in hazard assessments and propose how in silico toxicology can be further advanced by integrating QM with artificial intelligence.
Affiliation(s)
- Jakub Kostal
  - Designing Out Toxicity (DOT) Consulting LLC, 2121 Eisenhower Avenue, Alexandria, Virginia 22314, United States
  - The George Washington University, 800 22nd Street NW, Washington, DC, 20052, United States
10
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023] Open
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most studies only report fragments of the wider range of bioactivities, with each model typically focusing on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
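The abstract reports test set AUC values without defining the metric. As a reminder of what it measures, the sketch below computes the area under the ROC curve as the probability that a randomly chosen active compound is scored above a randomly chosen inactive one. The function `roc_auc` and the labels and scores are hypothetical illustrations, not FP-MAP code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons.

    `labels` holds 1 for actives and 0 for inactives; `scores` holds the
    model outputs in the same order. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test set: two actives, two inactives
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
print(auc)
```

On this reading, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported range of 0.62-0.99 in context. The pairwise loop is quadratic in the data size, so a rank-based implementation would be preferable for large test sets.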
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
11
Rajan A, Pushkar AP, Dharmalingam BC, Varghese JJ. Iterative multiscale and multi-physics computations for operando catalyst nanostructure elucidation and kinetic modeling. iScience 2023; 26:107029. [PMID: 37360694 PMCID: PMC10285649 DOI: 10.1016/j.isci.2023.107029] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/28/2023] Open
Abstract
Modern heterogeneous catalysis has benefitted immensely from computational predictions of catalyst structure and its evolution under reaction conditions, first-principles mechanistic investigations, and detailed kinetic modeling, which are rungs on a multiscale workflow. Establishing connections across these rungs and integrating them with experiments have been challenging. Here, operando catalyst structure prediction techniques using density functional theory simulations and ab initio thermodynamics calculations, molecular dynamics, and machine learning techniques are presented. Surface structure characterization by computational spectroscopic and machine learning techniques is then discussed. Hierarchical approaches in kinetic parameter estimation involving semi-empirical, data-driven, and first-principles calculations, and detailed kinetic modeling via mean-field microkinetic modeling and kinetic Monte Carlo simulations, are discussed along with methods and the need for uncertainty quantification. With these as the background, this article proposes a bottom-up hierarchical and closed-loop modeling framework incorporating consistency checks and iterative refinements at each level and across levels.
Affiliation(s)
- Ajin Rajan
  - Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Anoop P. Pushkar
  - Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Balaji C. Dharmalingam
  - Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Jithin John Varghese
  - Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
12
Tseng YJ, Chuang PJ, Appell M. When Machine Learning and Deep Learning Come to the Big Data in Food Chemistry. ACS Omega 2023; 8:15854-15864. [PMID: 37179635 PMCID: PMC10173424 DOI: 10.1021/acsomega.2c07722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 04/07/2023] [Indexed: 05/15/2023]
Abstract
Since the first food database was released over one hundred years ago, food databases have become more diversified, including food composition databases, food flavor databases, and food chemical compound databases. These databases provide detailed information about the nutritional compositions, flavor molecules, and chemical properties of various food compounds. As artificial intelligence (AI) is becoming popular in every field, AI methods can also be applied to food industry research and molecular chemistry. Machine learning and deep learning are valuable tools for analyzing big data sources such as food databases. Studies investigating food compositions, flavors, and chemical compounds with AI concepts and learning methods have emerged in the past few years. This review illustrates several well-known food databases, focusing on their primary contents, interfaces, and other essential features. We also introduce some of the most common machine learning and deep learning methods. Furthermore, a few studies related to food databases are given as examples, demonstrating their applications in food pairing, food-drug interactions, and molecular modeling. Based on the results of these applications, it is expected that the combination of food databases and AI will play an essential role in food science and food chemistry.
Affiliation(s)
- Yufeng Jane Tseng
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
  - Y.J.T.: tel, +886.2.3366.4888#529; fax, +886.2.23628167; email,
- Pei-Jiun Chuang
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
- Michael Appell
  - USDA, Agricultural Research Service, National Center for Agricultural Utilization Research, Mycotoxin Prevention and Applied Microbiology Research Unit, 1815 N. University, Peoria, Illinois 61604, United States
13
Fan J, Qian C, Zhou S. Machine Learning Spectroscopy Using a 2-Stage, Generalized Constituent Contribution Protocol. Research (Washington, D.C.) 2023; 6:0115. [PMID: 37287889 PMCID: PMC10243197 DOI: 10.34133/research.0115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 03/20/2023] [Indexed: 06/09/2023]
Abstract
A corrected group contribution (CGC)-molecule contribution (MC)-Bayesian neural network (BNN) protocol for accurate prediction of absorption spectra is presented. By combining BNN with CGC methods, the full absorption spectra of various molecules are obtained accurately and efficiently using only a small dataset for training. Here, with a small training sample (<100), accurate prediction of the maximum wavelength for single molecules is achieved with the first stage of the protocol; by contrast, previously reported machine learning (ML) methods require >1,000 samples to ensure the accuracy of prediction. Furthermore, with <500 samples, the mean square error in the prediction of full ultraviolet spectra reaches <2%; for comparison, ML models with molecular SMILES for training require a much larger dataset (>2,000) to achieve comparable accuracy. Moreover, by employing an MC method designed specifically for CGC that properly interprets the mixing rule, the spectra of mixtures are obtained with high accuracy. The logical origins of the good performance of the protocol are discussed in detail. Because such a constituent contribution protocol combines chemical principles and data-driven tools, it will most likely prove efficient for solving molecular-property-relevant problems in wider fields.
Affiliation(s)
- Jinming Fan
- College of Chemical and Biological Engineering, Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, Zhejiang University, 310027 Hangzhou, P. R. China
- Institute of Zhejiang University - Quzhou, Zheda Rd. #99, 324000 Quzhou, P. R. China
- Chao Qian
- College of Chemical and Biological Engineering, Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, Zhejiang University, 310027 Hangzhou, P. R. China
- Institute of Zhejiang University - Quzhou, Zheda Rd. #99, 324000 Quzhou, P. R. China
- Shaodong Zhou
- College of Chemical and Biological Engineering, Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, Zhejiang University, 310027 Hangzhou, P. R. China
- Institute of Zhejiang University - Quzhou, Zheda Rd. #99, 324000 Quzhou, P. R. China
|
14
|
Boswell Z, Verga JU, Mackle J, Guerrero-Vazquez K, Thomas OP, Cray J, Wolf BJ, Choo YM, Croot P, Hamann MT, Hardiman G. In-Silico Approaches for the Screening and Discovery of Broad-Spectrum Marine Natural Product Antiviral Agents Against Coronaviruses. Infect Drug Resist 2023; 16:2321-2338. [PMID: 37155475 PMCID: PMC10122865 DOI: 10.2147/idr.s395203] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/18/2022] [Accepted: 03/16/2023] [Indexed: 05/10/2023] Open
Abstract
The urgent need for SARS-CoV-2 controls has led to a reassessment of approaches to identify and develop natural product inhibitors of zoonotic, highly virulent, and rapidly emerging viruses. There are as yet no clinically approved broad-spectrum antivirals available for betacoronaviruses. Discovery pipelines for pan-virus medications against a broad range of betacoronaviruses are therefore a priority. A variety of marine natural product (MNP) small molecules have shown inhibitory activity against viral species. Access to large data caches of small molecule structural information is vital to finding new pharmaceuticals. Increasingly, molecular docking simulations are being used to narrow the space of possibilities and generate drug leads. Combining in-silico methods, augmented by metaheuristic optimization and machine learning (ML), allows the generation of hits from within a virtual MNP library to narrow screens for novel targets against coronaviruses. In this review article, we explore current insights and techniques that can be leveraged to generate broad-spectrum antivirals against betacoronaviruses using in-silico optimization and ML. ML approaches are capable of simultaneously evaluating different features for predicting inhibitory activity. Many also provide a semi-quantitative measure of feature relevance and can guide the selection of a subset of features relevant for inhibition of SARS-CoV-2.
Affiliation(s)
- Zachary Boswell
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Jacopo Umberto Verga
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Genomic Data Science, University of Galway, Galway, Ireland
- James Mackle
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Olivier P Thomas
- School of Biological and Chemical Sciences, Ryan Institute, University of Galway, Galway, H91TK33, Ireland
- James Cray
- Department of Biomedical Education and Anatomy, College of Medicine and Division of Biosciences, College of Dentistry, Ohio State University, Columbus, OH, USA
- Bethany J Wolf
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Yeun-Mun Choo
- Department of Chemistry, University of Malaya, Kuala Lumpur, Malaysia
- Peter Croot
- Irish Centre for Research in Applied Geoscience, Earth and Ocean Sciences and Ryan Institute, School of Natural Sciences, University of Galway, Galway, Ireland
- Mark T Hamann
- Departments of Drug Discovery and Biomedical Sciences and Public Health, Colleges of Pharmacy and Medicine, Medical University of South Carolina, Charleston, SC, USA
- Gary Hardiman
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Department of Medicine, Medical University of South Carolina, Charleston, SC, USA
|
15
|
Guo J, Sun M, Zhao X, Shi C, Su H, Guo Y, Pu X. General Graph Neural Network-Based Model To Accurately Predict Cocrystal Density and Insight from Data Quality and Feature Representation. J Chem Inf Model 2023; 63:1143-1156. [PMID: 36734616 DOI: 10.1021/acs.jcim.2c01538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Cocrystal engineering, as an effective way to modify solid-state properties, has attracted great interest from diverse material fields, and cocrystal density is an important property closely correlated with material function. To accurately predict cocrystal density, we develop a graph neural network (GNN)-based deep learning framework by considering three key factors of machine learning (data quality, feature representation, and model architecture). The results show that different stoichiometric ratios of molecules in cocrystals can significantly influence prediction performance, highlighting the importance of data quality. In addition, feature complementation is not suitable for augmenting the molecular graph representation in cocrystal density prediction, suggesting that a complementary strategy needs to consider whether extra features can sufficiently supply the information missing from the original representation. Based on these results, 4144 cocrystals with a 1:1 stoichiometric ratio are selected as the dataset, supplemented by data augmentation through exchanging a pair of coformers. The molecular graph is selected as the representation from which features are learned to train the GNN-based model. Global attention is introduced to further optimize the feature space and identify important atoms, making the model interpretable. Benefiting from these advantages, our model significantly outperforms three competitive models and exhibits high prediction accuracy for unseen cocrystals, showcasing its robustness and generality. Overall, our work not only provides a general cocrystal density prediction tool for experimental investigations but also offers useful guidelines for machine learning applications. All source codes are freely available at https://github.com/Xiao-Gua00/CCPGraph.
Affiliation(s)
- Jiali Guo
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Ming Sun
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xueyan Zhao
- Institute of Chemical Materials, China Academy of Engineering Physics, Mianyang 621900, China
- Chaojie Shi
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Haoming Su
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
|
16
|
Joshi PB. Navigating with chemometrics and machine learning in chemistry. Artif Intell Rev 2023; 56:1-26. [PMID: 36714038 PMCID: PMC9870782 DOI: 10.1007/s10462-023-10391-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/09/2023] [Indexed: 01/25/2023]
Abstract
Chemometrics and machine learning are artificial intelligence-based methods stirring a transformative change in chemistry. Organic synthesis, drug discovery and analytical techniques are incorporating machine learning techniques at an accelerated pace. However, machine-assisted chemistry faces challenges in solving critical problems in chemistry due to complex relationships in data sets. Even with increasing publishing volumes on machine learning, its application in areas of chemistry is not a straightforward endeavour. A particular concern in applying machine learning in chemistry is data availability and reproducibility. The present review article discusses the various chemometric methods, expert systems, and machine learning techniques developed for solving problems of organic synthesis and drug discovery, with selected examples. Further, a concise discussion of chemometrics and ML deployed in analytical techniques such as spectroscopy, microscopy and chromatography is presented. Finally, the review reflects on the challenges, opportunities and future perspectives of machine learning and automation in chemistry. The review concludes by pondering some tough questions about applying machine learning and the possibility of navigating the different terrains of chemistry.
Affiliation(s)
- Payal B. Joshi
- Operations and Method Development, Shefali Research Laboratories, Ambernath (East), Thane, Maharashtra 421501 India
|
17
|
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, Varnek A. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder. J Chem Inf Model 2022; 62:5471-5484. [PMID: 36332178 DOI: 10.1021/acs.jcim.2c01086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
In order to better formalize it, the notorious inverse-QSAR problem (finding structures with given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display the desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Daniyar Mazitov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Fanny Bonachera
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Igor Baskin
- Department of Material Science and Engineering, Technion-Israel Institute of Technology, 3200003 Haifa, Israel
- Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
|
18
|
Qureshi R, Basit SA, Shamsi JA, Fan X, Nawaz M, Yan H, Alam T. Machine learning based personalized drug response prediction for lung cancer patients. Sci Rep 2022; 12:18935. [PMID: 36344580 PMCID: PMC9640729 DOI: 10.1038/s41598-022-23649-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/30/2022] [Accepted: 11/03/2022] [Indexed: 11/09/2022] Open
Abstract
Lung cancers with a mutated epidermal growth factor receptor (EGFR) are a major contributor to cancer fatalities globally. Targeted tyrosine kinase inhibitors (TKIs) have been developed against EGFR and show encouraging results for survival rate and quality of life. However, drug resistance may affect treatment plans, and treatment efficacy may be lost after about a year. Predicting the response to EGFR-TKIs for EGFR-mutated lung cancer patients is a key research area. In this study, we propose a personalized drug response prediction model (PDRP), based on molecular dynamics simulations and machine learning, to predict the response to the first-generation FDA-approved small molecule EGFR-TKIs, Gefitinib/Erlotinib, in lung cancer patients. The patient's mutation status is taken into consideration in the molecular dynamics (MD) simulation. Each patient's unique mutation status was modeled using MD simulation to extract molecular-level geometric features. Moreover, additional clinical features were incorporated into the machine learning model for drug response prediction. The complete feature set includes demographic and clinical information (DCI), geometrical properties of the drug-target binding site, and the binding free energy of the drug-target complex from the MD simulation. PDRP incorporates an XGBoost classifier, which achieves state-of-the-art performance with 97.5% accuracy, 93% recall, 96.5% precision, and 94% F1-score for a 4-class drug response prediction task. We found that modeling the geometry of the binding pocket combined with binding free energy is a good predictor for drug response. However, we observed that clinical information had little impact on the performance of the model. The proposed model could be tested on other types of cancers. We believe PDRP will support the planning of effective treatment regimes based on clinical-genomic information. The source code and related files are available on GitHub at: https://github.com/rizwanqureshi123/PDRP/ .
Affiliation(s)
- Rizwan Qureshi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Syed Abdullah Basit
- FAST National University of Computer and Emerging Sciences, Karachi, Pakistan
- Jawwad A. Shamsi
- FAST National University of Computer and Emerging Sciences, Karachi, Pakistan
- Xinqi Fan
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Center for Intelligent Multidimensional Data Analysis (CIMDA), City University of Hong Kong, Kowloon, Hong Kong
- Mehmood Nawaz
- Department of Biomedical Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, SAR China
- Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Center for Intelligent Multidimensional Data Analysis (CIMDA), City University of Hong Kong, Kowloon, Hong Kong
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
|
19
|
Matlin SA, Cornell SE, Krief A, Hopf H, Mehta G. Chemistry must respond to the crisis of transgression of planetary boundaries. Chem Sci 2022; 13:11710-11720. [PMID: 36348954 PMCID: PMC9627718 DOI: 10.1039/d2sc03603g] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Accepted: 09/12/2022] [Indexed: 11/22/2022] Open
Abstract
Recent assessments alarmingly indicate that many of the world's leading chemicals are transgressing one or more of the nine planetary boundaries, which define safe operating spaces within which humanity can continue to develop and thrive for generations to come. The unfolding crisis cannot be ignored and there is a once-in-a-century opportunity for chemistry - the science of transformation of matter - to make a critical difference to the future of people and planet. How can chemists contribute to meeting these challenges and restore stability and strengthen resilience to the planetary system that humanity needs for its survival? To respond to the wake-up call, three crucial steps are outlined: (1) urgently working to understand the nature of the looming threats, from a chemistry perspective; (2) harnessing the ingenuity and innovation that are central to the practice of chemistry to develop sustainable solutions; and (3) transforming chemistry itself, in education, research and industry, to re-position it as 'chemistry for sustainability' and lead the stewardship of the world's chemical resources. This will require conservation of material stocks in forms that remain available for use, through attention to circularity, as well as strengthening engagement in systems-based approaches to designing chemistry research and processes informed by convergent working with many other disciplines.
Affiliation(s)
- Stephen A Matlin
- Institute of Global Health Innovation, Imperial College London, London SW7 2AZ, UK
- International Organization for Chemical Sciences in Development, 61 rue de Bruxelles, B-5000 Namur, Belgium
- Sarah E Cornell
- International Organization for Chemical Sciences in Development, 61 rue de Bruxelles, B-5000 Namur, Belgium
- Stockholm Resilience Centre, Faculty of Science, Stockholm University, Stockholm, Sweden
- Alain Krief
- International Organization for Chemical Sciences in Development, 61 rue de Bruxelles, B-5000 Namur, Belgium
- Chemistry Department, Namur University, B-5000 Namur, Belgium
- Henning Hopf
- International Organization for Chemical Sciences in Development, 61 rue de Bruxelles, B-5000 Namur, Belgium
- Institute of Organic Chemistry, Technische Universität Braunschweig, Braunschweig D-38106, Germany
- Goverdhan Mehta
- International Organization for Chemical Sciences in Development, 61 rue de Bruxelles, B-5000 Namur, Belgium
- School of Chemistry, University of Hyderabad, Hyderabad 500046, India
|
20
|
Galuzzi BG, Mirarchi A, Viganò EL, De Gioia L, Damiani C, Arrigoni F. Machine Learning for Efficient Prediction of Protein Redox Potential: The Flavoproteins Case. J Chem Inf Model 2022; 62:4748-4759. [PMID: 36126254 PMCID: PMC9554915 DOI: 10.1021/acs.jcim.2c00858] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Determining the redox potentials of protein cofactors and how they are influenced by their molecular neighborhoods is essential for basic research and many biotechnological applications, from biosensors and biocatalysis to bioremediation and bioelectronics. The laborious determination of redox potential with current experimental technologies pushes forward the need for computational approaches that can reliably predict it. Although current computational approaches based on quantum and molecular mechanics are accurate, their large computational costs hinder their usage. In this work, we explored the possibility of using more efficient QSPR models based on machine learning (ML) for the prediction of protein redox potential, as an alternative to classical approaches. As a proof of concept, we focused on flavoproteins, one of the most important families of enzymes directly involved in redox processes. To train and test different ML models, we retrieved a dataset of flavoproteins with a known midpoint redox potential (Em) and 3D structure. The features of interest, accounting for both short- and long-range effects of the protein matrix on the flavin cofactor, have been automatically extracted from each protein PDB file. Our best ML model (XGB) has a performance error below 1 kcal/mol (∼36 mV), comparing favorably to more sophisticated computational approaches. We also provided indications on the features that mostly affect the Em value, and when possible, we rationalized them on the basis of previous studies.
Affiliation(s)
- Bruno Giovanni Galuzzi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy
- SYSBIO Centre of Systems Biology/ISBE.IT, Piazza della Scienza 2, 20126 Milan, Italy
- Antonio Mirarchi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy
- Edoardo Luca Viganò
- Istituto di Ricerche Farmacologiche Mario Negri, Via Mario Negri 2, 20156 Milan, Italy
- Luca De Gioia
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy
- Chiara Damiani
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy
- SYSBIO Centre of Systems Biology/ISBE.IT, Piazza della Scienza 2, 20126 Milan, Italy
- Federica Arrigoni
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy
|
21
|
Machine Learning-Based Virtual Screening for the Identification of Cdk5 Inhibitors. Int J Mol Sci 2022; 23:ijms231810653. [PMID: 36142566 PMCID: PMC9502400 DOI: 10.3390/ijms231810653] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/26/2022] [Revised: 09/07/2022] [Accepted: 09/09/2022] [Indexed: 12/04/2022] Open
Abstract
Cyclin-dependent kinase 5 (Cdk5) is an atypical proline-directed serine/threonine protein kinase well-characterized for its role in the central nervous system rather than in the cell cycle. Indeed, its dysregulation has been strongly implicated in the progression of synaptic dysfunction and neurodegenerative diseases, such as Alzheimer’s disease (AD) and Parkinson’s disease (PD), and also in the development and progression of a variety of cancers. For this reason, Cdk5 is considered as a promising target for drug design, and the discovery of novel small-molecule Cdk5 inhibitors is of great interest in the medicinal chemistry field. In this context, we employed a machine learning-based virtual screening protocol with subsequent molecular docking, molecular dynamics simulations and binding free energy evaluations. Our virtual screening studies resulted in the identification of two novel Cdk5 inhibitors, highlighting an experimental hit rate of 50% and thus validating the reliability of the in silico workflow. Both identified ligands, compounds CPD1 and CPD4, showed a promising enzyme inhibitory activity and CPD1 also demonstrated a remarkable antiproliferative activity in ovarian and colon cancer cells. These ligands represent a valuable starting point for structure-based hit-optimization studies aimed at identifying new potent Cdk5 inhibitors.
|
22
|
Faceira B, Teule-Gay L, Rignanese GM, Rougier A. Toward the Prediction of Electrochromic Properties of WO3 Films: Combination of Experimental and Machine Learning Approaches. J Phys Chem Lett 2022; 13:8111-8115. [PMID: 35997759 DOI: 10.1021/acs.jpclett.2c02248] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
WO3 is the state-of-the-art electrochromic oxide material, finding technological application in smart windows. In this work, a set of WO3 thin films was deposited by magnetron sputtering while varying total pressure, oxygen partial pressure, and power. On each film two properties were measured: the electrochemical reversibility and the blue color persistence of LixWO3 films in simulated ambient conditions. With the help of machine learning, prediction maps for these electrochromic properties, namely color persistence and reversibility, were designed. High-performance WO3 films were targeted by a global score, which is the product of these two properties. The combined approach of experimental measurements and machine learning led to a complete picture of electrochromic properties as a function of sputtering parameters, providing an efficient, time-saving tool.
Affiliation(s)
- Brandon Faceira
- Univ. Bordeaux, CNRS, Bx INP, ICMCB, UMR 5026, F-33600 Pessac, France
- Lionel Teule-Gay
- Univ. Bordeaux, CNRS, Bx INP, ICMCB, UMR 5026, F-33600 Pessac, France
- Aline Rougier
- Univ. Bordeaux, CNRS, Bx INP, ICMCB, UMR 5026, F-33600 Pessac, France
|
23
|
Fedik N, Zubatyuk R, Kulichenko M, Lubbers N, Smith JS, Nebgen B, Messerly R, Li YW, Boldyrev AI, Barros K, Isayev O, Tretiak S. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat Rev Chem 2022; 6:653-672. [PMID: 37117713 DOI: 10.1038/s41570-022-00416-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Accepted: 07/15/2022] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We overview several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.
|
24
|
Li C, Wang C, Sun M, Zeng Y, Yuan Y, Gou Q, Wang G, Guo Y, Pu X. Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime. J Chem Inf Model 2022; 62:4873-4887. [PMID: 35998331 DOI: 10.1021/acs.jcim.2c00997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Abstract
Motivated by the challenge of deep learning in the low data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by a transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when very limited data are available. To avoid dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain an RNN, through which the model can learn sufficient structural knowledge. The pretrained RNN is then fine-tuned on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. To more reliably screen for molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R² is boosted from 0.4446 to 0.9572. The comparable performance of a transfer learning strategy based on an existing big database (ChEMBL) in producing energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in detonation velocity. All the source codes and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.
Affiliation(s)
- Chuan Li
- College of Computer Science, Sichuan University, Chengdu 610064, China
- Chenghui Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
- Ming Sun
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Yan Zeng
- College of Computer Science, Sichuan University, Chengdu 610064, China
- Yuan Yuan
- College of Management, Southwest University for Nationalities, Chengdu 610041, China
- Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Guangchuan Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
|
25
|
Predicting the photosynthetic ammonia on nanoporous cobalt zirconate via graph convolutional neural networks. MOLECULAR CATALYSIS 2022. [DOI: 10.1016/j.mcat.2022.112565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
26
|
When machine learning meets molecular synthesis. TRENDS IN CHEMISTRY 2022. [DOI: 10.1016/j.trechm.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
27
|
Phenotypic drug discovery: recent successes, lessons learned and new directions. Nat Rev Drug Discov 2022; 21:899-914. [DOI: 10.1038/s41573-022-00472-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 04/14/2022] [Indexed: 12/29/2022]
|
28
|
Autonomous design of new chemical reactions using a variational autoencoder. Commun Chem 2022; 5:40. [PMID: 36697652 PMCID: PMC9814385 DOI: 10.1038/s42004-022-00647-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/28/2021] [Accepted: 02/16/2022] [Indexed: 01/28/2023] Open
Abstract
Artificial intelligence based chemistry models are a promising method of exploring chemical reaction design spaces. However, training datasets based on experimental synthesis are typically reported only for the optimal synthesis reactions. This leads to an inherited bias in the model predictions. Therefore, robust datasets that span the entirety of the solution space are necessary to remove inherited bias and permit complete training of the space. In this study, an artificial intelligence model based on a Variational AutoEncoder (VAE) has been developed and investigated to synthetically generate continuous datasets. The approach involves sampling the latent space to generate new chemical reactions. This developed technique is demonstrated by generating over 7,000,000 new reactions from a training dataset containing only 7,000 reactions. The generated reactions include molecular species that are larger and more diverse than the training set.
|
29
|
Shao J, Liu Y, Yan J, Yan ZY, Wu Y, Ru Z, Liao JY, Miao X, Qian L. Prediction of Maximum Absorption Wavelength Using Deep Neural Networks. J Chem Inf Model 2022; 62:1368-1375. [PMID: 35290042 DOI: 10.1021/acs.jcim.1c01449] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
Fluorescent molecules are important tools in biological detection, and numerous efforts have been made to develop compounds to meet the desired photophysical properties. For example, tuning the wavelength allows an appropriate penetration depth with minimal interference from the autofluorescence/scattering for a better signal-to-noise contrast. However, there are limited guidelines to rationally design or computationally predict the optical properties from first principles, and factors like the solvent effects will make it more complicated. Herein, we established a database (SMFluo1) of 1181 solvated small-molecule fluorophores covering the ultraviolet-visible-near-infrared absorption window and developed new machine learning models based on deep neural networks for accurately predicting photophysical parameters. The optimal system was applied to 120 out-of-sample compounds, and it exhibited remarkable accuracy with a mean relative error of 1.52%. In this new paradigm, a deep learning algorithm is promising to complement conventional theoretical and experimental studies of fluorophores and to greatly accelerate the discovery of new dyes. Due to its simplicity and efficiency, data from newly developed fluorophores can be easily supplemented to this system to further improve the accuracy across various dye families.
Affiliation(s)
- Jinning Shao
- Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
| | - Yue Liu
- Center for Data Science, Zhejiang University, Hangzhou, China 310058; Polytechnic Institute, Zhejiang University, Hangzhou, China 310058
| | - Jiaqi Yan
- Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
| | - Ze-Yi Yan
- Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058; Polytechnic Institute, Zhejiang University, Hangzhou, China 310058
| | - Yangyang Wu
- Center for Data Science, Zhejiang University, Hangzhou, China 310058
| | - Zhongying Ru
- Center for Data Science, Zhejiang University, Hangzhou, China 310058; Polytechnic Institute, Zhejiang University, Hangzhou, China 310058
| | - Jia-Yu Liao
- Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou, China 310018
| | - Xiaoye Miao
- Center for Data Science, Zhejiang University, Hangzhou, China 310058
| | - Linghui Qian
- Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
| |
|
30
|
Galati S, Di Stefano M, Martinelli E, Macchia M, Martinelli A, Poli G, Tuccinardi T. VenomPred: A Machine Learning Based Platform for Molecular Toxicity Predictions. Int J Mol Sci 2022; 23:ijms23042105. [PMID: 35216217 PMCID: PMC8877213 DOI: 10.3390/ijms23042105] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/07/2022] [Revised: 02/11/2022] [Accepted: 02/12/2022] [Indexed: 12/28/2022] Open
Abstract
The use of in silico toxicity prediction methods plays an important role in the selection of lead compounds and in ADMET studies since in vitro and in vivo methods are often limited by ethics, time, budget and other resources. In this context, we present our new web tool VenomPred, a user-friendly platform for evaluating the potential mutagenic, hepatotoxic, carcinogenic and estrogenic effects of small molecules. VenomPred platform employs several in-house Machine Learning (ML) models developed with datasets derived from VEGA QSAR, a software that includes a comprehensive collection of different toxicity models and has been used as a reference for building and evaluating our ML models. The results showed that our models achieved equal or better performance than those obtained with the reference models included in VEGA QSAR. In order to improve the predictive performance of our platform, we adopted a consensus approach combining the results of different ML models, which was able to predict chemical toxicity better than the single models. This improved method was thus implemented in the VenomPred platform, a freely accessible webserver that takes the SMILES (Simplified Molecular-Input Line-Entry System) strings of the compounds as input and sends the prediction results providing a probability score about their potential toxicity.
Affiliation(s)
- Salvatore Galati
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy; (S.G.); (M.D.S.); (E.M.); (M.M.); (A.M.); (T.T.)
| | - Miriana Di Stefano
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy; (S.G.); (M.D.S.); (E.M.); (M.M.); (A.M.); (T.T.)
- Department of Life Sciences, University of Siena, 53100 Siena, Italy
| | - Elisa Martinelli
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy; (S.G.); (M.D.S.); (E.M.); (M.M.); (A.M.); (T.T.)
| | - Marco Macchia
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy; (S.G.); (M.D.S.); (E.M.); (M.M.); (A.M.); (T.T.)
| | - Adriano Martinelli
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy; (S.G.); (M.D.S.); (E.M.); (M.M.); (A.M.); (T.T.)
| | - Giulio Poli
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy; (S.G.); (M.D.S.); (E.M.); (M.M.); (A.M.); (T.T.)
- Correspondence: ; Tel.: +39-050-2219603
| | - Tiziano Tuccinardi
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy; (S.G.); (M.D.S.); (E.M.); (M.M.); (A.M.); (T.T.)
- Center for Biotechnology, Sbarro Institute for Cancer Research and Molecular Medicine, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
| |
|
31
|
Affiliation(s)
- Leo H. Chiang
- Core R&D The Dow Chemical Company Lake Jackson Texas 77566 USA
| | - Birgit Braun
- Core R&D The Dow Chemical Company Lake Jackson Texas 77566 USA
| | - Zhenyu Wang
- Chemometrics, AI & Statistics The Dow Chemical Company Lake Jackson Texas 77566 USA
| | - Ivan Castillo
- Chemometrics, AI & Statistics The Dow Chemical Company Lake Jackson Texas 77566 USA
| |
|
32
|
Jo J, Kwak B, Lee B, Yoon S. Flexible Dual-Branched Message-Passing Neural Network for a Molecular Property Prediction. ACS OMEGA 2022; 7:4234-4244. [PMID: 35155916 PMCID: PMC8829939 DOI: 10.1021/acsomega.1c05877] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/20/2021] [Accepted: 01/17/2022] [Indexed: 05/25/2023]
Abstract
A molecule is a complex of heterogeneous components, and the spatial arrangements of these components determine the whole molecular properties and characteristics. With the advent of deep learning in computational chemistry, several studies have focused on how to predict molecular properties based on molecular configurations. A message-passing neural network provides an effective framework for capturing molecular geometric features with the perspective of a molecule as a graph. However, most of these studies assumed that all heterogeneous molecular features, such as atomic charge, bond length, or other geometric features, always contribute equivalently to the target prediction, regardless of the task type. In this study, we propose a dual-branched neural network for molecular property prediction based on both the message-passing framework and standard multilayer perceptron neural networks. Our model learns heterogeneous molecular features with different scales, which are trained flexibly according to each prediction target. In addition, we introduce a discrete branch to learn single-atom features without local aggregation, apart from message-passing steps. We verify that this novel structure can improve the model performance. The proposed model outperforms other recent models with sparser representations. Our experimental results indicate that, in the chemical property prediction tasks, the diverse chemical nature of targets should be carefully considered for both model performance and generalizability. Finally, we provide the intuitive analysis between the experimental results and the chemical meaning of the target.
Affiliation(s)
- Jeonghee Jo
- Bio-MAX Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Bumju Kwak
- Recommendation Team, Kakao Corporation, 235 Pangyoyeok-ro, Bundang-gu, Seongnam-si, Gyeonggi-do 13494, Republic of Korea
| | - Byunghan Lee
- Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology, 232 Gongneung-ro, Nowon-gu, Seoul 01811, Republic of Korea
| | - Sungroh Yoon
- Department of Electrical and Computer Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
| |
|
33
|
Krajňák V, Naik S, Wiggins S. Predicting trajectory behaviour via machine-learned invariant manifolds. Chem Phys Lett 2022. [DOI: 10.1016/j.cplett.2021.139290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
34
|
Stark spectral line broadening modeling by machine learning algorithms. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06763-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
35
|
Li B, Rangarajan S. A conceptual study of transfer learning with linear models for data-driven property prediction. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2021.107599] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022]
|
36
|
Shin HK. Topological Distance-Based Electron Interaction Tensor to Apply a Convolutional Neural Network on Drug-like Compounds. ACS OMEGA 2021; 6:35757-35768. [PMID: 34984306 PMCID: PMC8717557 DOI: 10.1021/acsomega.1c05693] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2021] [Accepted: 12/08/2021] [Indexed: 05/15/2023]
Abstract
Deep learning (DL) models in quantitative structure-activity relationship fed the molecular structure directly to the network without using human-designed descriptors by representing molecule as a graph or string (e.g., SMILES code). However, these two representations were oversimplification of real molecules to reflect chemical properties of molecular structures. Given that the choice of molecular representation determines the architecture of the DL model to apply, a novel way of molecular representation can open a way to apply diverse DL networks developed and used in other fields. A topological distance-based electron interaction (TDEi) tensor has been developed in this study inspired by the quantum mechanical model of the molecule, which defines a molecule with electrons and protons. In the TDEi tensor, the atomic orbital (AO) of each atom is represented by an electron configuration (EC) vector, which is a bit string based on the presence and absence of electrons in each AO according to spin indicated by positive and negative signs. Interactions between EC vectors were calculated based on the topological distance between atoms in a molecule. As a molecular structure was translated into 3D array, CNN models (modified VGGNet) were applied using a TDEi tensor to predict four physicochemical properties of drug-like compound datasets: MP (275,131), Lipop (4193), Esol (1127), and Freesolv (639). Models achieved good prediction accuracy. PCA showed that a stronger correlation was observed between the extracted features and the target endpoint as features were extracted from the deeper layer.
Affiliation(s)
- Hyun Kil Shin
- Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Human and Environmental Toxicology, University of Science and Technology, Daejeon 34113, Republic of Korea
| |
|
37
|
Selvaraj C, Chandra I, Singh SK. Artificial intelligence and machine learning approaches for drug design: challenges and opportunities for the pharmaceutical industries. Mol Divers 2021; 26:1893-1913. [PMID: 34686947 PMCID: PMC8536481 DOI: 10.1007/s11030-021-10326-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 04/05/2021] [Accepted: 09/24/2021] [Indexed: 12/27/2022]
Abstract
The global spread of COVID-19 has raised the importance of pharmaceutical drug development as intractable and hot research. Developing new drug molecules to overcome any disease is a costly and lengthy process, but the process continues uninterrupted. The critical point to consider the drug design is to use the available data resources and to find new and novel leads. Once the drug target is identified, several interdisciplinary areas work together with artificial intelligence (AI) and machine learning (ML) methods to get enriched drugs. These AI and ML methods are applied in every step of the computer-aided drug design, and integrating these AI and ML methods results in a high success rate of hit compounds. In addition, this AI and ML integration with high-dimension data and its powerful capacity have taken a step forward. Clinical trials output prediction through the AI/ML integrated models could further decrease the clinical trials cost by also improving the success rate. Through this review, we discuss the backend of AI and ML methods in supporting the computer-aided drug design, along with its challenge and opportunity for the pharmaceutical industry. From the available information or data, the AI and ML based prediction for the high throughput virtual screening. After this integration of AI and ML, the success rate of hit identification has gained a momentum with huge success by providing novel drugs.
Affiliation(s)
- Chandrabose Selvaraj
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India.
| | - Ishwar Chandra
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
| | - Sanjeev Kumar Singh
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India.
| |
|
38
|
Pourashraf T, Shokri S, Yousefi M, Ahmadi A, Azar PA. Implementing Machine Learning in Laboratory Synthesis by Hybrid of SVR Model and Optimization Algorithms. ADVANCED THEORY AND SIMULATIONS 2021. [DOI: 10.1002/adts.202100225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Affiliation(s)
- Tolou Pourashraf
- Department of Chemistry Science and Research Branch Islamic Azad University Tehran 1477893855 Iran
| | - Saeid Shokri
- Technology and Innovation Group Research Institute of Petroleum Industry (RIPI) Tehran 1485733111 Iran
| | - Mohammad Yousefi
- Department of Chemistry Faculty of Pharmaceutical Chemistry Tehran Medical Sciences Islamic Azad University Tehran 1949635881 Iran
| | - Abbas Ahmadi
- Department of Chemistry Faculty of Science Karaj Branch Islamic Azad University Karaj 3149968111 Iran
| | - Parviz Aberoomand Azar
- Department of Chemistry Science and Research Branch Islamic Azad University Tehran 1477893855 Iran
| |
|
39
|
Mahala S, Arumugam SM, Kumar S, Singh D, Sharma S, Devi B, Yadav SK, Elumalai S. Sn Doping on Ta₂O₅ Facilitates Glucose Isomerization for Enriched 5‐Hydroxymethylfurfural Production and its True Response Prediction using a Neural Network Model. ChemCatChem 2021. [DOI: 10.1002/cctc.202101046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Affiliation(s)
- Sangeeta Mahala
- Chemical Engineering Division DBT-Center of Innovative and Applied Bioprocessing Mohali Punjab 140306 India
- Department of Chemical Sciences Indian Institute of Science Education and Research Mohali Punjab 140306 India
| | - Senthil M. Arumugam
- Chemical Engineering Division DBT-Center of Innovative and Applied Bioprocessing Mohali Punjab 140306 India
| | - Sandeep Kumar
- Chemical Engineering Division DBT-Center of Innovative and Applied Bioprocessing Mohali Punjab 140306 India
- Dr. SSB University Institute of Chemical Engineering and Technology Panjab University Chandigarh 160014 India
| | - Dalwinder Singh
- Computational Biology Division DBT-National Agri-Food Biotechnology Institute Mohali Punjab 140306 India
| | - Shelja Sharma
- Chemical Engineering Division DBT-Center of Innovative and Applied Bioprocessing Mohali Punjab 140306 India
| | - Bhawana Devi
- Chemical Engineering Division DBT-Center of Innovative and Applied Bioprocessing Mohali Punjab 140306 India
- Department of Chemical Sciences Indian Institute of Science Education and Research Mohali Punjab 140306 India
| | - Sudesh K. Yadav
- Biotechnology & Synthetic Biology Division DBT-Center of Innovative and Applied Bioprocessing Mohali Punjab 140306 India
| | - Sasikumar Elumalai
- Chemical Engineering Division DBT-Center of Innovative and Applied Bioprocessing Mohali Punjab 140306 India
| |
|
40
|
Lahnsteiner M, Caldera M, Moura HM, Cerrón-Infantes DA, Roeser J, Konegger T, Thomas A, Menche J, Unterlass MM. Hydrothermal polymerization of porous aromatic polyimide networks and machine learning-assisted computational morphology evolution interpretation. JOURNAL OF MATERIALS CHEMISTRY. A 2021; 9:19754-19769. [PMID: 34589226 PMCID: PMC8439099 DOI: 10.1039/d1ta01253c] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/10/2021] [Accepted: 08/18/2021] [Indexed: 06/13/2023]
Abstract
We report on the hydrothermal polymerization (HTP) of polyimide (PI) networks using the medium H₂O and the comonomers 1,3,5-tris(4-aminophenyl)benzene (TAPB) and pyromellitic acid (PMA). Full condensation is obtained at minimal reaction times of only 2 h at 200 °C. The PI networks are obtained as monoliths and feature thermal stabilities of >500 °C, and in several cases even up to 595 °C. The monoliths are built up by networks of densely packed, near-monodisperse spherical particles and annealed microfibers, and show three types of porosity: (i) intrinsic inter-segment ultramicroporosity (<0.8 nm) of the PI networks composing the particles (∼3-5 μm), (ii) interstitial voids between the particles (0.1-2 μm), and (iii) monolith cell porosity (∼10-100 μm), as studied via low pressure gas physisorption and Hg intrusion porosimetry analyses. This unique hierarchical porosity generates an outstandingly high specific pore volume of 7250 mm³ g⁻¹. A large-scale micromorphological study screening the reaction parameters time, temperature, and the absence/presence of the additive acetic acid was performed. Through expert interpretation of hundreds of scanning electron microscopy (SEM) images of the products of these experiments, we devise a hypothesis for morphology formation and evolution: a monomer salt is initially formed and subsequently transformed to overall eight different fiber, pearl chain, and spherical morphologies, composed of PI and, at long reaction times (>48 h), also PI/SiO₂ hybrids that form through reaction with the reaction vessel. Moreover, we have developed a computational image analysis pipeline that deciphers the complex morphologies of these SEM images automatically and also allows for formulating a hypothesis of morphology development in HTP that is in good agreement with the manual morphology analysis. Finally, we upscaled the HTP of PI(TAPB-PMA) and processed the resulting powder into dense cylindrical specimen by green solvent-free warm-pressing, showing that one can follow the full route from the synthesis of these PI networks to a final material without employing harmful solvents.
Affiliation(s)
- Marianne Lahnsteiner
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
| | - Michael Caldera
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Max F. Perutz Labs, Campus Vienna Biocenter 5 Dr.-Bohr-Gasse 9 1030 Vienna Austria
| | - Hipassia M Moura
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Universität Konstanz, Department of Chemistry, Solid State Chemistry Universitätsstrasse 10 D-78464 Konstanz Germany
| | - D Alonso Cerrón-Infantes
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Universität Konstanz, Department of Chemistry, Solid State Chemistry Universitätsstrasse 10 D-78464 Konstanz Germany
| | - Jérôme Roeser
- Technische Universität Berlin, Institute of Chemistry Str. des 17. Juni 115 10623 Berlin Germany
| | - Thomas Konegger
- Technische Universität Wien, Institute of Chemical Technologies and Analytics Getreidemarkt 9/164 1060 Vienna Austria
| | - Arne Thomas
- Technische Universität Berlin, Institute of Chemistry Str. des 17. Juni 115 10623 Berlin Germany
| | - Jörg Menche
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Max F. Perutz Labs, Campus Vienna Biocenter 5 Dr.-Bohr-Gasse 9 1030 Vienna Austria
| | - Miriam M Unterlass
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Universität Konstanz, Department of Chemistry, Solid State Chemistry Universitätsstrasse 10 D-78464 Konstanz Germany
| |
|
41
|
Abstract
Computational methods have emerged as a powerful tool to augment traditional experimental molecular catalyst design by providing useful predictions of catalyst performance and decreasing the time needed for catalyst screening. In this perspective, we discuss three approaches for computational molecular catalyst design: (i) the reaction mechanism-based approach that calculates all relevant elementary steps, finds the rate and selectivity determining steps, and ultimately makes predictions on catalyst performance based on kinetic analysis, (ii) the descriptor-based approach where physical/chemical considerations are used to find molecular properties as predictors of catalyst performance, and (iii) the data-driven approach where statistical analysis as well as machine learning (ML) methods are used to obtain relationships between available data/features and catalyst performance. Following an introduction to these approaches, we cover their strengths and weaknesses and highlight some recent key applications. Furthermore, we present an outlook on how the currently applied approaches may evolve in the near future by addressing how recent developments in building automated computational workflows and implementing advanced ML models hold promise for reducing human workload, eliminating human bias, and speeding up computational catalyst design at the same time. Finally, we provide our viewpoint on how some of the challenges associated with the up-and-coming approaches driven by automation and ML may be resolved.
Affiliation(s)
- Ademola Soyemi
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
| | - Tibor Szilvási
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
| |
|
42
|
Kulichenko M, Smith JS, Nebgen B, Li YW, Fedik N, Boldyrev AI, Lubbers N, Barros K, Tretiak S. The Rise of Neural Networks for Materials and Chemical Dynamics. J Phys Chem Lett 2021; 12:6227-6243. [PMID: 34196559 DOI: 10.1021/acs.jpclett.1c01357] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electron structure calculations, are particularly attractive due to their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained to a large data set of density functional theory calculations can be significantly improved by retraining to a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Alexander I Boldyrev
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| |
|
43
|
Alshehri AS, You F. Paradigm Shift: The Promise of Deep Learning in Molecular Systems Engineering and Design. FRONTIERS IN CHEMICAL ENGINEERING 2021. [DOI: 10.3389/fceng.2021.700717] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022] Open
Abstract
The application of deep learning to a diverse array of research problems has accelerated progress across many fields, bringing conventional paradigms to a new intelligent era. Just as the roles of instrumentation in the old chemical revolutions, we reinforce the necessity for integrating deep learning in molecular systems engineering and design as a transformative catalyst towards the next chemical revolution. To meet such research needs, we summarize advances and progress across several key elements of molecular systems: molecular representation, property estimation, representation learning, and synthesis planning. We further spotlight recent advances and promising directions for several deep learning architectures, methods, and optimization platforms. Our perspective is of interest to both computational and experimental researchers as it aims to chart a path forward for cross-disciplinary collaborations on synthesizing knowledge from available chemical data and guiding experimental efforts.
|
44
|
Maley SM, Melville J, Yu S, Teynor MS, Carlsen R, Hargis C, Hamilton RS, Grant BO, Ess DH. Machine learning classification of disrotatory IRC and conrotatory non-IRC trajectory motion for cyclopropyl radical ring opening. Phys Chem Chem Phys 2021; 23:12309-12320. [PMID: 34018524 DOI: 10.1039/d1cp00612f] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/21/2022]
Abstract
Quasiclassical trajectory analysis is now a standard tool to analyze non-minimum energy pathway motion of organic reactions. However, due to the large amount of information associated with trajectories, quantitative analysis of the dynamic origin of reaction selectivity is complex. For the electrocyclic ring opening of cyclopropyl radical, more than 4000 trajectories were run showing that allyl radicals are formed through a mixture of disrotatory intrinsic reaction coordinate (IRC) motion as well as conrotatory non-IRC motion. Geometric, vibrational mode, and atomic velocity transition-state features from these trajectories were used for supervised machine learning analysis with classification algorithms. Accuracy >80% with a random forest model enabled quantitative and qualitative assessment of transition-state trajectory features controlling disrotatory IRC versus conrotatory non-IRC motion. This analysis revealed that there are two key vibrational modes where their directional combination provides prediction of IRC versus non-IRC motion.
Affiliation(s)
- Steven M Maley
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - Jesse Melville
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - Spencer Yu
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - Matthew S Teynor
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - Ryan Carlsen
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - Cal Hargis
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - R Spencer Hamilton
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - Benjamin O Grant
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
| | - Daniel H Ess
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA.
|
45
|
McCarver GA, Rajeshkumar T, Vogiatzis KD. Computational catalysis for metal-organic frameworks: An overview. Coord Chem Rev 2021. [DOI: 10.1016/j.ccr.2021.213777] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/16/2022]
|
46
|
Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022] Open
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease is based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, and the lack of efficacy of praziquantel against juvenile worms, highlights the urgency for new antischistosomal therapies. In this review we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, computer-aided and artificial intelligence-based computational methods. We highlight the current developments that may contribute to optimizing research outputs and lead to more effective drugs for this highly prevalent disease, in a more cost-effective drug discovery endeavor.
Affiliation(s)
- José T. Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Arthur C. Silva
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Rafael F. Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Barbara F. Gomes
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Lauro R. Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Jose Brandao-Neto
- Diamond Light Source Ltd., Didcot, United Kingdom
- Research Complex at Harwell, Didcot, United Kingdom
| | - Raymond J. Owens
- The Rosalind Franklin Institute, Harwell, United Kingdom
- Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicholas Furnham
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Bruno J. Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Floriano P. Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Carolina H. Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
|
47
|
Cruzeiro VWD, Lambros E, Riera M, Roy R, Paesani F, Götz AW. Highly Accurate Many-Body Potentials for Simulations of N2O5 in Water: Benchmarks, Development, and Validation. J Chem Theory Comput 2021; 17:3931-3945. [PMID: 34029079 DOI: 10.1021/acs.jctc.1c00069] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
Dinitrogen pentoxide (N2O5) is an important intermediate in the atmospheric chemistry of nitrogen oxides. Although there has been much research, the processes that govern the physical interactions between N2O5 and water are still not fully understood at a molecular level. Gaining a quantitative insight from computer simulations requires going beyond the accuracy of classical force fields while accessing length scales and time scales that are out of reach for high-level quantum-chemical approaches. To this end, we present the development of MB-nrg many-body potential energy functions for nonreactive simulations of N2O5 in water. This MB-nrg model is based on electronic structure calculations at the coupled cluster level of theory and is compatible with the successful MB-pol model for water. It provides a physically correct description of long-range many-body interactions in combination with an explicit representation of up to three-body short-range interactions in terms of multidimensional permutationally invariant polynomials. In order to further investigate the importance of the underlying interactions in the model, a TTM-nrg model was also devised. TTM-nrg is a more simplistic representation that contains only two-body short-range interactions represented through Born-Mayer functions. In this work, an active learning approach was employed to efficiently build representative training sets of monomer, dimer, and trimer structures, and benchmarks are presented to determine the accuracy of our new models in comparison to a range of density functional theory methods. By assessing the binding curves, distortion energies of N2O5, and interaction energies in clusters of N2O5 and water, we evaluate the importance of two-body and three-body short-range potentials. The results demonstrate that our MB-nrg model has high accuracy with respect to the coupled cluster reference, outperforms current density functional theory models, and thus enables highly accurate simulations of N2O5 in aqueous environments.
Affiliation(s)
- Vinícius Wilian D Cruzeiro
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, United States
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Eleftherios Lambros
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Marc Riera
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Ronak Roy
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, United States
| | - Francesco Paesani
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, United States
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Materials Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
| | - Andreas W Götz
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, United States
|
48
|
Abstract
Introduction: Artificial Intelligence (AI) has become a component of our everyday lives, with applications ranging from recommendations on what to buy to the analysis of radiology images. Many of the techniques originally developed for other fields such as language translation and computer vision are now being applied in drug discovery. AI has enabled multiple aspects of drug discovery including the analysis of high content screening data, and the design and synthesis of new molecules. Areas covered: This perspective provides an overview of the application of AI in several areas relevant to drug discovery including property prediction, molecule generation, image analysis, and organic synthesis planning. Expert opinion: While a variety of machine learning methods are now being routinely used to predict biological activity and ADME properties, methods of representing molecules continue to evolve. Molecule generation methods are relatively new and unproven but hold the potential to access new, unexplored areas of chemical space. The application of AI in drug discovery will continue to benefit from dedicated research, as well as AI developments in other fields. With this pairing algorithmic advancements and high-quality data, the impact of AI in drug discovery will continue to grow in the coming years.
Affiliation(s)
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
|
49
|
Hastings J, Glauer M, Memariani A, Neuhaus F, Mossakowski T. Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification. J Cheminform 2021; 13:23. [PMID: 33726837 PMCID: PMC7962259 DOI: 10.1186/s13321-021-00500-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/11/2020] [Accepted: 02/26/2021] [Indexed: 12/22/2022] Open
Abstract
Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
Affiliation(s)
- Janna Hastings
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Martin Glauer
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Adel Memariani
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Fabian Neuhaus
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Till Mossakowski
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
|
50
|
Piras A, Ehlert C, Gryn'ova G. Sensing and sensitivity: Computational chemistry of graphene-based sensors. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1526] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Affiliation(s)
- Anna Piras
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
| | - Christopher Ehlert
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
| | - Ganna Gryn'ova
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
|