51
Lane TR, Urbina F, Rank L, Gerlach J, Riabova O, Lepioshkin A, Kazakova E, Vocat A, Tkachenko V, Cole S, Makarov V, Ekins S. Machine Learning Models for Mycobacterium tuberculosis In Vitro Activity: Prediction and Target Visualization. Mol Pharm 2022; 19:674-689. [PMID: 34964633 PMCID: PMC9121329 DOI: 10.1021/acs.molpharmaceut.1c00791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Tuberculosis (TB) is a major global health challenge, with approximately 1.4 million deaths per year. There is still a need to develop novel treatments for patients infected with Mycobacterium tuberculosis (Mtb). There have been many large-scale phenotypic screens that have led to the identification of thousands of new compounds. Yet, there is very limited investment in TB drug discovery which points to the need for new methods to increase the efficiency of drug discovery against Mtb. We have used machine learning approaches to learn from the public Mtb data, resulting in many data sets and models with robust enrichment and hit rates leading to the discovery of new active compounds. Recently, we have curated predominantly small-molecule Mtb data and developed new machine learning classification models with 18 886 molecules at different activity cutoffs. We now describe the further validation of these Bayesian models using a library of over 1000 molecules synthesized as part of EU-funded New Medicines for TB and More Medicines for TB programs. We highlight molecular features which are enriched in these active compounds. In addition, we provide new regression and classification models that can be used for scoring compound libraries or used to design new molecules. We have also visualized these molecules in the context of known molecular targets and identified clusters in chemical property space, which may aid in future target identification efforts. Finally, we are also making these data sets publicly available, representing a significant increase to the available Mtb inhibition data in the public domain.
Affiliation(s)
- Thomas R. Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Laura Rank
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Jacob Gerlach
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Olga Riabova
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Elena Kazakova
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Anthony Vocat
- Global Health Institute, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- Valery Tkachenko
- Science Data Experts, 14909 Forest Landing Cir, Rockville, MD 20850
- Vadim Makarov
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
52
Smart Materials Prediction: Applying Machine Learning to Lithium Solid-State Electrolyte. Materials 2022; 15:ma15031157. [PMID: 35161101 PMCID: PMC8840428 DOI: 10.3390/ma15031157] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 01/23/2022] [Accepted: 01/31/2022] [Indexed: 11/24/2022]
Abstract
Traditionally, the discovery of new materials has often depended on scholars’ computational and experimental experience. The traditional trial-and-error methods require many resources and computing time. Due to new materials’ properties becoming more complex, it is difficult to predict and identify new materials only by general knowledge and experience. Material prediction tools based on machine learning (ML) have been successfully applied to various materials fields; they are beneficial for modeling and accelerating the prediction process for materials that cannot be accurately predicted. However, the obstacles of disciplinary span led to many scholars in materials not having complete knowledge of data-driven materials science methods. This paper provides an overview of the general process of ML applied to materials prediction and uses solid-state electrolytes (SSE) as an example. Recent approaches and specific applications to ML in the materials field and the requirements for building ML models for predicting lithium SSE are reviewed. Finally, some current obstacles to applying ML in materials prediction and prospects are described with the expectation that more materials scholars will be aware of the application of ML in materials prediction.
53
Jung K, Corrigan N, Wong EHH, Boyer C. Bioactive Synthetic Polymers. Advanced Materials (Deerfield Beach, Fla.) 2022; 34:e2105063. [PMID: 34611948 DOI: 10.1002/adma.202105063] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 07/01/2021] [Revised: 08/13/2021] [Indexed: 05/21/2023]
Abstract
Synthetic polymers are omnipresent in society as textiles and packaging materials, in construction and medicine, among many other important applications. Alternatively, natural polymers play a crucial role in sustaining life and allowing organisms to adapt to their environments by performing key biological functions such as molecular recognition and transmission of genetic information. In general, the synthetic and natural polymer worlds are completely separated due to the inability for synthetic polymers to perform specific biological functions; in some cases, synthetic polymers cause uncontrolled and unwanted biological responses. However, owing to the advancement of synthetic polymerization techniques in recent years, new synthetic polymers have emerged that provide specific biological functions such as targeted molecular recognition of peptides, or present antiviral, anticancer, and antimicrobial activities. In this review, the emergence of this generation of bioactive synthetic polymers and their bioapplications are summarized. Finally, the future opportunities in this area are discussed.
Affiliation(s)
- Kenward Jung
- Cluster for Advanced Macromolecular Design (CAMD), Australian Centre for Nanomedicine (ACN), and School of Chemical Engineering, University of New South Wales (UNSW) Sydney, Sydney, NSW, 2052, Australia
- Nathaniel Corrigan
- Cluster for Advanced Macromolecular Design (CAMD), Australian Centre for Nanomedicine (ACN), and School of Chemical Engineering, University of New South Wales (UNSW) Sydney, Sydney, NSW, 2052, Australia
- Edgar H H Wong
- Cluster for Advanced Macromolecular Design (CAMD), Australian Centre for Nanomedicine (ACN), and School of Chemical Engineering, University of New South Wales (UNSW) Sydney, Sydney, NSW, 2052, Australia
- Cyrille Boyer
- Cluster for Advanced Macromolecular Design (CAMD), Australian Centre for Nanomedicine (ACN), and School of Chemical Engineering, University of New South Wales (UNSW) Sydney, Sydney, NSW, 2052, Australia
54
Unsupervised Representation Learning for Proteochemometric Modeling. Int J Mol Sci 2021; 22:ijms222312882. [PMID: 34884688 PMCID: PMC8657702 DOI: 10.3390/ijms222312882] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/15/2021] [Revised: 11/25/2021] [Accepted: 11/26/2021] [Indexed: 11/18/2022]
Abstract
In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine learning based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer interpretable representations of proteins and ligands. Previous studies in PCM modeling rely on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings that outperform complex, human-engineered representations. Several different embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities and find that unsupervised-learned representations significantly outperform handcrafted representations.
55
Hueffel JA, Sperger T, Funes-Ardoiz I, Ward JS, Rissanen K, Schoenebeck F. Accelerated dinuclear palladium catalyst identification through unsupervised machine learning. Science 2021; 374:1134-1140. [PMID: 34822285 DOI: 10.1126/science.abj0999] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/21/2022]
Affiliation(s)
- Julian A Hueffel
- Institute of Organic Chemistry, RWTH Aachen University; Landoltweg 1, 52074 Aachen, Germany
- Theresa Sperger
- Institute of Organic Chemistry, RWTH Aachen University; Landoltweg 1, 52074 Aachen, Germany
- Ignacio Funes-Ardoiz
- Institute of Organic Chemistry, RWTH Aachen University; Landoltweg 1, 52074 Aachen, Germany
- Jas S Ward
- Department of Chemistry, University of Jyväskylä; P.O. Box 35, 40014 Jyväskylä, Finland
- Kari Rissanen
- Department of Chemistry, University of Jyväskylä; P.O. Box 35, 40014 Jyväskylä, Finland
- Franziska Schoenebeck
- Institute of Organic Chemistry, RWTH Aachen University; Landoltweg 1, 52074 Aachen, Germany
56
Yang Y, Yao K, Repasky MP, Leswing K, Abel R, Shoichet BK, Jerome SV. Efficient Exploration of Chemical Space with Docking and Deep Learning. J Chem Theory Comput 2021; 17:7106-7119. [PMID: 34592101 DOI: 10.1021/acs.jctc.1c00810] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Indexed: 12/15/2022]
Abstract
With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.
Affiliation(s)
- Ying Yang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Kun Yao
- Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Matthew P Repasky
- Schrödinger, Inc., 101 SW Main Street, #1300, Portland, Oregon 97239, United States
- Karl Leswing
- Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Robert Abel
- Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Brian K Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Steven V Jerome
- Schrödinger, Inc., 10201 Wateridge Cir Suite 220, San Diego, California 92121, United States
57
Machine Learning Applied to the Modeling of Pharmacological and ADMET Endpoints. Methods Mol Biol 2021. [PMID: 34731464 DOI: 10.1007/978-1-0716-1787-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in the recent years. Data, descriptors, and algorithms are the main pillars to build useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the regained interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data of many compounds. We put these in context with other digital drug discovery methods and present some application examples.
58
Williams W, Zeng L, Gensch T, Sigman MS, Doyle AG, Anslyn EV. The Evolution of Data-Driven Modeling in Organic Chemistry. ACS Central Science 2021; 7:1622-1637. [PMID: 34729406 PMCID: PMC8554870 DOI: 10.1021/acscentsci.1c00535] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 05/03/2021] [Indexed: 05/14/2023]
Abstract
Organic chemistry is replete with complex relationships: for example, how a reactant's structure relates to the resulting product formed; how reaction conditions relate to yield; how a catalyst's structure relates to enantioselectivity. Questions like these are at the foundation of understanding reactivity and developing novel and improved reactions. An approach to probing these questions that is both longstanding and contemporary is data-driven modeling. Here, we provide a synopsis of the history of data-driven modeling in organic chemistry and the terms used to describe these endeavors. We include a timeline of the steps that led to its current state. The case studies included highlight how, as a community, we have advanced physical organic chemistry tools with the aid of computers and data to augment the intuition of expert chemists and to facilitate the prediction of structure-activity and structure-property relationships.
Affiliation(s)
- Wendy L. Williams
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Lingyu Zeng
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Tobias Gensch
- Department of Chemistry, TU Berlin, Straße des 17. Juni 135, Sekr. C2, 10623 Berlin, Germany
- Matthew S. Sigman
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Abigail G. Doyle
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Eric V. Anslyn
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
59
Haywood AL, Redshaw J, Hanson-Heine MWD, Taylor A, Brown A, Mason AM, Gärtner T, Hirst JD. Kernel Methods for Predicting Yields of Chemical Reactions. J Chem Inf Model 2021; 62:2077-2092. [PMID: 34699222 DOI: 10.1021/acs.jcim.1c00699] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/18/2022]
Abstract
The use of machine learning methods for the prediction of reaction yield is an emerging area. We demonstrate the applicability of support vector regression (SVR) for predicting reaction yields, using combinatorial data. Molecular descriptors used in regression tasks related to chemical reactivity have often been based on time-consuming, computationally demanding quantum chemical calculations, usually density functional theory. Structure-based descriptors (molecular fingerprints and molecular graphs) are quicker and easier to calculate and are applicable to any molecule. In this study, SVR models built on structure-based descriptors were compared to models built on quantum chemical descriptors. The models were evaluated along the dimension of each reaction component in a set of Buchwald-Hartwig amination reactions. The structure-based SVR models outperformed the quantum chemical SVR models, along the dimension of each reaction component. The applicability of the models was assessed with respect to similarity to training. Prospective predictions of unseen Buchwald-Hartwig reactions are presented for synthetic assessment, to validate the generalizability of the models, with particular interest along the aryl halide dimension.
Affiliation(s)
- Alexe L Haywood
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, U.K
- Joseph Redshaw
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, U.K
- Adam Taylor
- GlaxoSmithKline, Gunnels Wood Road, Stevenage SG1 2NY, U.K
- Alex Brown
- GlaxoSmithKline, Gunnels Wood Road, Stevenage SG1 2NY, U.K
- Andrew M Mason
- GlaxoSmithKline, Gunnels Wood Road, Stevenage SG1 2NY, U.K
- Thomas Gärtner
- Machine Learning Research Unit, TU Wien Informatics, Vienna 1040, Austria
- Jonathan D Hirst
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, U.K
60
Chakraborty P, Mandal R, Garg N, Sundararaju B. Recent advances in transition metal-catalyzed asymmetric electrocatalysis. Coord Chem Rev 2021. [DOI: 10.1016/j.ccr.2021.214065] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022]
61
Lahnsteiner M, Caldera M, Moura HM, Cerrón-Infantes DA, Roeser J, Konegger T, Thomas A, Menche J, Unterlass MM. Hydrothermal polymerization of porous aromatic polyimide networks and machine learning-assisted computational morphology evolution interpretation. Journal of Materials Chemistry A 2021; 9:19754-19769. [PMID: 34589226 PMCID: PMC8439099 DOI: 10.1039/d1ta01253c] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/10/2021] [Accepted: 08/18/2021] [Indexed: 06/13/2023]
Abstract
We report on the hydrothermal polymerization (HTP) of polyimide (PI) networks using the medium H₂O and the comonomers 1,3,5-tris(4-aminophenyl)benzene (TAPB) and pyromellitic acid (PMA). Full condensation is obtained at minimal reaction times of only 2 h at 200 °C. The PI networks are obtained as monoliths and feature thermal stabilities of >500 °C, and in several cases even up to 595 °C. The monoliths are built up by networks of densely packed, near-monodisperse spherical particles and annealed microfibers, and show three types of porosity: (i) intrinsic inter-segment ultramicroporosity (<0.8 nm) of the PI networks composing the particles (∼3-5 μm), (ii) interstitial voids between the particles (0.1-2 μm), and (iii) monolith cell porosity (∼10-100 μm), as studied via low pressure gas physisorption and Hg intrusion porosimetry analyses. This unique hierarchical porosity generates an outstandingly high specific pore volume of 7250 mm³ g⁻¹. A large-scale micromorphological study screening the reaction parameters time, temperature, and the absence/presence of the additive acetic acid was performed. Through expert interpretation of hundreds of scanning electron microscopy (SEM) images of the products of these experiments, we devise a hypothesis for morphology formation and evolution: a monomer salt is initially formed and subsequently transformed to overall eight different fiber, pearl chain, and spherical morphologies, composed of PI and, at long reaction times (>48 h), also PI/SiO₂ hybrids that form through reaction with the reaction vessel. Moreover, we have developed a computational image analysis pipeline that deciphers the complex morphologies of these SEM images automatically and also allows for formulating a hypothesis of morphology development in HTP that is in good agreement with the manual morphology analysis. Finally, we upscaled the HTP of PI(TAPB-PMA) and processed the resulting powder into dense cylindrical specimen by green solvent-free warm-pressing, showing that one can follow the full route from the synthesis of these PI networks to a final material without employing harmful solvents.
Affiliation(s)
- Marianne Lahnsteiner
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Michael Caldera
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Max F. Perutz Labs, Campus Vienna Biocenter 5 Dr.-Bohr-Gasse 9 1030 Vienna Austria
- Hipassia M Moura
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Universität Konstanz, Department of Chemistry, Solid State Chemistry Universitätsstrasse 10 D-78464 Konstanz Germany
- D Alonso Cerrón-Infantes
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Universität Konstanz, Department of Chemistry, Solid State Chemistry Universitätsstrasse 10 D-78464 Konstanz Germany
- Jérôme Roeser
- Technische Universität Berlin, Institute of Chemistry Str. des 17. Juni 115 10623 Berlin Germany
- Thomas Konegger
- Technische Universität Wien, Institute of Chemical Technologies and Analytics Getreidemarkt 9/164 1060 Vienna Austria
- Arne Thomas
- Technische Universität Berlin, Institute of Chemistry Str. des 17. Juni 115 10623 Berlin Germany
- Jörg Menche
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Max F. Perutz Labs, Campus Vienna Biocenter 5 Dr.-Bohr-Gasse 9 1030 Vienna Austria
- Miriam M Unterlass
- Technische Universität Wien, Institute of Materials Chemistry Getreidemarkt 9/165 1060 Vienna Austria
- Technische Universität Wien, Institute of Applied Synthetic Chemistry Getreidemarkt 9/163 1060 Vienna Austria
- CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences Lazarettgasse 14, AKH BT 25.3 1090 Vienna Austria
- Universität Konstanz, Department of Chemistry, Solid State Chemistry Universitätsstrasse 10 D-78464 Konstanz Germany
62
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 190] [Impact Index Per Article: 63.3] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Valentin Vassilev-Galindo
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Bingqing Cheng
- Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
- Stefan Chmiela
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Michael Gastegger
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Google Research, Brain Team, 10117 Berlin, Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
63
Machine Learning in Chemical Product Engineering: The State of the Art and a Guide for Newcomers. Processes (Basel) 2021. [DOI: 10.3390/pr9081456] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
Chemical Product Engineering (CPE) is marked by numerous challenges, such as the complexity of the properties–structure–ingredients–process relationship of the different products and the necessity to discover and develop constantly and quickly new molecules and materials with tailor-made properties. In recent years, artificial intelligence (AI) and machine learning (ML) methods have gained increasing attention due to their performance in tackling particularly complex problems in various areas, such as computer vision and natural language processing. As such, they present a specific interest in addressing the complex challenges of CPE. This article provides an updated review of the state of the art regarding the implementation of ML techniques in different types of CPE problems with a particular focus on four specific domains, namely the design and discovery of new molecules and materials, the modeling of processes, the prediction of chemical reactions/retrosynthesis and the support for sensorial analysis. This review is further completed by general guidelines for the selection of an appropriate ML technique given the characteristics of each problem and by a critical discussion of several key issues associated with the development of ML modeling approaches. Accordingly, this paper may serve both the experienced researcher in the field as well as the newcomer.
64
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- Ziyi Yang
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Dejun Jiang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- College of Computer Science and Technology Zhejiang University Hangzhou China
- Shao Liu
- Department of Pharmacy Xiangya Hospital, Central South University Changsha China
- Aiping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
- Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis Xiangya Hospital, Central South University Changsha China
- Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
65
Kerner J, Dogan A, von Recum H. Machine learning and big data provide crucial insight for future biomaterials discovery and research. Acta Biomater 2021; 130:54-65. [PMID: 34087445 DOI: 10.1016/j.actbio.2021.05.053] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 02/06/2023]
Abstract
Machine learning methods have been widely adopted in a variety of fields, including engineering, science, and medicine, revolutionizing how data is collected, used, and stored. Their implementation has led to a drastic increase in the number of computational models for the prediction of various numerical, categorical, or association events given input variables. We aim to examine recent advances in the use of machine learning when applied to the biomaterial field. Specifically, quantitative structure-property relationships offer the unique ability to correlate microscale molecular descriptors to larger macroscale material properties. These new models can be broken down further into four categories: regression, classification, association, and clustering. We examine recent approaches and new uses of machine learning in the three major categories of biomaterials (metals, polymers, and ceramics) for rapid property prediction and trend identification. While current research is promising, limitations in the form of a lack of standardized reporting and available databases complicate the implementation of the described models. Herein, we hope to provide a snapshot of the current state of the field and a beginner's guide to navigating the intersection of biomaterials research and machine learning. STATEMENT OF SIGNIFICANCE: Machine learning and its methods have found a variety of uses beyond the field of computer science but have largely been neglected by those in the realm of biomaterials. Through the use of more computational methods, biomaterials development can be expedited while reducing the need for standard trial-and-error methods. Within, we introduce four basic models that readers can potentially apply to their current research, as well as current applications within the field. Furthermore, we hope that this article may act as a "call to action" for readers to realize and address the current lack of implementation within the biomaterials field.
Affiliation(s)
- Jacob Kerner
  - Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
- Alan Dogan
  - Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
- Horst von Recum
  - Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
|
66
|
Liu Y, Zhou Q, Cui G. Machine Learning Boosting the Development of Advanced Lithium Batteries. SMALL METHODS 2021; 5:e2100442. [PMID: 34927866 DOI: 10.1002/smtd.202100442] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 06/22/2021] [Indexed: 06/14/2023]
Abstract
Lithium batteries (LBs) have many high demands regarding their application in portable electronic devices, electric vehicles, and smart grids. Machine learning (ML) can effectively accelerate the discovery of materials and predict their performances for LBs, which is thus able to markedly enhance the development of advanced LBs. In recent years, there have been many successful examples of using ML for advanced LBs. In this review, the basic procedure and representative methods of ML are briefly introduced to promote understanding of ML by experts in LBs. Then, the application of ML in developing LBs is highlighted for the purpose of attracting more attention to this field. Finally, the challenges and perspectives of ML are noted for the further development of LBs. It is hoped that this review can shed light on the application of ML in developing LBs and boost the development of advanced LBs.
Affiliation(s)
- Yangting Liu
  - First Institute of Oceanography, Ministry of Natural Resources, No. 6 Xianxialing Road, Qingdao, 266061, China
- Qian Zhou
  - Qingdao Industrial Energy Storage Research Institute, Qingdao Institute of Bioenergy and Bioprocess Technology Chinese Academy of Sciences, No. 189 Songling Road, Qingdao, 266101, China
- Guanglei Cui
  - Qingdao Industrial Energy Storage Research Institute, Qingdao Institute of Bioenergy and Bioprocess Technology Chinese Academy of Sciences, No. 189 Songling Road, Qingdao, 266101, China
|
67
|
Tan Z, Li Y, Shi W, Yang S. A Multitask Approach to Learn Molecular Properties. J Chem Inf Model 2021; 61:3824-3834. [PMID: 34289687 DOI: 10.1021/acs.jcim.1c00646] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
The endeavors to pursue a robust multitask model to resolve intertask correlations have lasted for many years. A multitask deep neural network, as the most widely used multitask framework, however, experiences several issues such as inconsistent performance improvement over the independent model benchmark. The research aims to introduce an alternative framework by using the problem transformation methods. We build our multitask models essentially based on the stacking of a base regressor and classifier, where the multitarget predictions are realized from an additional training stage on the expanded molecular feature space. The model architecture is implemented on the QM9, Alchemy, and Tox21 datasets, by using a variety of baseline machine learning techniques. The resultant multitask performance shows 1 to 10% enhancement of forecasting precision, with the task prediction accuracy being consistently improved over the independent single-target models. The proposed method demonstrates a notable superiority in tackling the intertarget dependence and, moreover, a great potential to simulate a wide range of molecular properties under the transformation framework.
Affiliation(s)
- Zheng Tan
  - Chengdu Polytechnic, 83 Tianyi Street, Chengdu, Sichuan 610000, P. R. China
- Yan Li
  - Xiyuan Quantitative Technology, 388 Yizhou Road, Chengdu, Sichuan 610000, P. R. China
- Weimei Shi
  - Chengdu Polytechnic, 83 Tianyi Street, Chengdu, Sichuan 610000, P. R. China
- Shiqing Yang
  - Chengdu Polytechnic, 83 Tianyi Street, Chengdu, Sichuan 610000, P. R. China
|
68
|
Fritz F, Preissner R, Banerjee P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 2021; 49:W679-W684. [PMID: 33905509 PMCID: PMC8262722 DOI: 10.1093/nar/gkab292] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/17/2021] [Revised: 04/07/2021] [Accepted: 04/09/2021] [Indexed: 12/30/2022] Open
Abstract
Taste is one of the crucial organoleptic properties involved in the perception of food by humans. The taste of a chemical compound present in food stimulates us to take in food and avoid poisons. The bitter taste of drugs presents compliance problems, and early flagging of the potential bitterness of a drug candidate may help with its further development. Similarly, the taste of chemicals present in food is important for the evaluation of food quality in the industry. In this work, we have implemented machine learning models to predict three different taste endpoints: sweet, bitter and sour. The VirtualTaste models achieved an overall accuracy of 90% and an AUC of 0.98 in 10-fold cross-validation and in an independent test set. The web server takes a two-dimensional chemical structure as input and reports the chemical's taste profile for the three tastes, using molecular fingerprints, along with confidence scores, information on similar compounds with known activity from the training set, and an overall radar chart. Additionally, insights into 25 bitter receptors are provided via target prediction for the predicted bitter compounds. VirtualTaste, to the best of our knowledge, is the first freely available web-based platform for the prediction of three different tastes of compounds. It is accessible via http://virtualtaste.charite.de/VirtualTaste/ without any login requirements and is free to use.
Affiliation(s)
- Franziska Fritz
  - Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Philippstrasse 12, 10115, Berlin, Germany
- Robert Preissner
  - Institute of Physiology and Science-IT, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Philippstrasse 12, 10115, Berlin, Germany
- Priyanka Banerjee
  - Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Philippstrasse 12, 10115, Berlin, Germany
|
69
|
Kashyap K, Siddiqi MI. Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents. Mol Divers 2021; 25:1517-1539. [PMID: 34282519 DOI: 10.1007/s11030-021-10274-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/28/2021] [Accepted: 07/05/2021] [Indexed: 12/12/2022]
Abstract
Neurological disorders affect various aspects of life. Finding drugs for the central nervous system is a very challenging and complex task due to the involvement of the blood-brain barrier, P-glycoprotein, and the drug's high attrition rates. The availability of big data present in online databases and resources has enabled the emergence of artificial intelligence techniques including machine learning to analyze, process the data, and predict the unknown data with high efficiency. The use of these modern techniques has revolutionized the whole drug development paradigm, with an unprecedented acceleration in the central nervous system drug discovery programs. Also, the new deep learning architectures proposed in many recent works have given a better understanding of how artificial intelligence can tackle big complex problems that arose due to central nervous system disorders. Therefore, the present review provides comprehensive and up-to-date information on machine learning/artificial intelligence-triggered effort in the brain care domain. In addition, a brief overview is presented on machine learning algorithms and their uses in structure-based drug design, ligand-based drug design, ADMET prediction, de novo drug design, and drug repurposing. Lastly, we conclude by discussing the major challenges and limitations posed and how they can be tackled in the future by using these modern machine learning/artificial intelligence approaches.
Affiliation(s)
- Kushagra Kashyap
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
  - Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Mohammad Imran Siddiqi
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
  - Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
|
70
|
Wiesinger H, Wang Z, Hellweg S. Deep Dive into Plastic Monomers, Additives, and Processing Aids. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:9339-9351. [PMID: 34154322 DOI: 10.1021/acs.est.1c00976] [Citation(s) in RCA: 157] [Impact Index Per Article: 52.3] [Indexed: 05/24/2023]
Abstract
A variety of chemical substances used in plastic production may be released throughout the entire life cycle of the plastic, posing risks to human health, the environment, and recycling systems. Only a limited number of these substances have been widely studied. We systematically investigate plastic monomers, additives, and processing aids on the global market based on a review of 63 industrial, scientific, and regulatory data sources. In total, we identify more than 10'000 relevant substances and categorize them based on substance types, use patterns, and hazard classifications wherever possible. Over 2'400 substances are identified as substances of potential concern as they meet one or more of the persistence, bioaccumulation, and toxicity criteria in the European Union. Many of these substances are hardly studied according to SciFinder (266 substances), are not adequately regulated in many parts of the world (1'327 substances), or are even approved for use in food-contact plastics in some jurisdictions (901 substances). Substantial information gaps exist in the public domain, particularly on substance properties and use patterns. To transition to a sustainable circular plastic economy that avoids the use of hazardous chemicals, concerted efforts by all stakeholders are needed, starting by increasing information accessibility.
Affiliation(s)
- Helene Wiesinger
  - Chair of Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, 8093 Zürich, Switzerland
- Zhanyun Wang
  - Chair of Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, 8093 Zürich, Switzerland
- Stefanie Hellweg
  - Chair of Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, 8093 Zürich, Switzerland
|
71
|
Green AJ, Mohlenkamp MJ, Das J, Chaudhari M, Truong L, Tanguay RL, Reif DM. Leveraging high-throughput screening data, deep neural networks, and conditional generative adversarial networks to advance predictive toxicology. PLoS Comput Biol 2021; 17:e1009135. [PMID: 34214078 PMCID: PMC8301607 DOI: 10.1371/journal.pcbi.1009135] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/27/2020] [Revised: 07/23/2021] [Accepted: 05/31/2021] [Indexed: 12/01/2022] Open
Abstract
There are currently 85,000 chemicals registered with the Environmental Protection Agency (EPA) under the Toxic Substances Control Act, but only a small fraction have measured toxicological data. To address this gap, high-throughput screening (HTS) and computational methods are vital. As part of one such HTS effort, embryonic zebrafish were used to examine a suite of morphological and mortality endpoints at six concentrations from over 1,000 unique chemicals found in the ToxCast library (phase 1 and 2). We hypothesized that by using a conditional generative adversarial network (cGAN) or deep neural networks (DNN), and leveraging this large set of toxicity data we could efficiently predict toxic outcomes of untested chemicals. Utilizing a novel method in this space, we converted the 3D structural information into a weighted set of points while retaining all information about the structure. In vivo toxicity and chemical data were used to train two neural network generators. The first was a DNN (Go-ZT) while the second utilized cGAN architecture (GAN-ZT) to train generators to produce toxicity data. Our results showed that Go-ZT significantly outperformed the cGAN, support vector machine, random forest and multilayer perceptron models in cross-validation, and when tested against an external test dataset. By combining both Go-ZT and GAN-ZT, our consensus model improved the SE, SP, PPV, and Kappa, to 71.4%, 95.9%, 71.4% and 0.673, respectively, resulting in an area under the receiver operating characteristic (AUROC) of 0.837. Considering their potential use as prescreening tools, these models could provide in vivo toxicity predictions and insight into the hundreds of thousands of untested chemicals to prioritize compounds for HT testing.
Affiliation(s)
- Adrian J. Green
  - Department of Biological Sciences, and the Bioinformatics Research Center, NC State University, Raleigh, North Carolina, United States of America
- Martin J. Mohlenkamp
  - Department of Mathematics, Ohio University, Athens, Ohio, United States of America
- Jhuma Das
  - Marsico Lung Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Meenal Chaudhari
  - Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, North Carolina, United States of America
- Lisa Truong
  - Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- Robyn L. Tanguay
  - Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- David M. Reif
  - Department of Biological Sciences, and the Bioinformatics Research Center, NC State University, Raleigh, North Carolina, United States of America
|
72
|
Heng T, Yang D, Wang R, Zhang L, Lu Y, Du G. Progress in Research on Artificial Intelligence Applied to Polymorphism and Cocrystal Prediction. ACS OMEGA 2021; 6:15543-15550. [PMID: 34179597 PMCID: PMC8223226 DOI: 10.1021/acsomega.1c01330] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/11/2021] [Accepted: 05/28/2021] [Indexed: 06/13/2023]
Abstract
Artificial intelligence (AI) is a technology that builds an artificial system with certain intelligence and uses computer software and hardware to simulate intelligent human behavior. When combined with drug research and development, AI can considerably shorten this cycle, improve research efficiency, and minimize costs. The use of machine learning to discover novel materials and predict material properties has become a new research direction. On the basis of the current status of worldwide research on the combination of AI and crystal form and cocrystal, this mini-review analyzes and explores the application of AI in polymorphism prediction, crystal structure analysis, crystal property prediction, cocrystal former (CCF) screening, cocrystal composition prediction, and cocrystal formation prediction. This study provides insights into the future applications of AI in related fields.
Affiliation(s)
- Tianyu Heng
  - Beijing City Key Laboratory of Polymorphic Drugs, Center of Pharmaceutical Polymorphs, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Dezhi Yang
  - Beijing City Key Laboratory of Polymorphic Drugs, Center of Pharmaceutical Polymorphs, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Ruonan Wang
  - Beijing City Key Laboratory of Polymorphic Drugs, Center of Pharmaceutical Polymorphs, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Li Zhang
  - Beijing City Key Laboratory of Polymorphic Drugs, Center of Pharmaceutical Polymorphs, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Yang Lu
  - Beijing City Key Laboratory of Polymorphic Drugs, Center of Pharmaceutical Polymorphs, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Guanhua Du
  - Beijing City Key Laboratory of Drug Target and Screening Research, National Center for Pharmaceutical Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
|
73
|
Deng D, Chen X, Zhang R, Lei Z, Wang X, Zhou F. XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties. J Chem Inf Model 2021; 61:2697-2705. [PMID: 34009965 DOI: 10.1021/acs.jcim.0c01489] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/02/2023]
Abstract
Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are further evaluated for their target binding affinities, side effects, target missing probabilities, etc. Conventional machine learning algorithms demonstrated satisfying prediction accuracies of molecular properties. A molecule cannot be directly loaded into a machine learning model, and a set of engineered features needs to be designed and calculated from a molecule. Such hand-crafted features rely heavily on the experiences of the investigating researchers. The concept of graph neural networks (GNNs) was recently introduced to describe the chemical molecules. The features may be automatically and objectively extracted from the molecules through various types of GNNs, e.g., GCN (graph convolution network), GGNN (gated graph neural network), DMPNN (directed message passing neural network), etc. However, the training of a stable GNN model requires a huge number of training samples and a large amount of computing power, compared with the conventional machine learning strategies. This study proposed the integrated framework XGraphBoost to extract the features using a GNN and build an accurate prediction model of molecular properties using the classifier XGBoost. The proposed framework XGraphBoost fully inherits the merits of the GNN-based automatic molecular feature extraction and XGBoost-based accurate prediction performance. Both classification and regression problems were evaluated using the framework XGraphBoost. The experimental results strongly suggest that XGraphBoost may facilitate the efficient and accurate predictions of various molecular properties. The source code is freely available to academic users at https://github.com/chenxiaowei-vincent/XGraphBoost.git.
Affiliation(s)
- Daiguo Deng
  - Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
- Xiaowei Chen
  - Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
- Ruochi Zhang
  - Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Zengrong Lei
  - Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China
- Fengfeng Zhou
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
|
74
|
GPCR_LigandClassify.py; a rigorous machine learning classifier for GPCR targeting compounds. Sci Rep 2021; 11:9510. [PMID: 33947911 PMCID: PMC8097070 DOI: 10.1038/s41598-021-88939-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2020] [Accepted: 04/12/2021] [Indexed: 02/02/2023] Open
Abstract
The current study describes the construction of various ligand-based machine learning models to be used for drug-repurposing against the family of G-Protein Coupled Receptors (GPCRs). In building these models, we collected > 500,000 data points, encompassing experimentally measured molecular association data of > 160,000 unique ligands against > 250 GPCRs. These data points were retrieved from the GPCR-Ligand Association (GLASS) database. We have used diverse molecular featurization methods to describe the input molecules. Multiple supervised ML algorithms were developed, tested and compared for their accuracy, F scores, as well as for their Matthews' correlation coefficient scores (MCC). Our data suggest that combined with molecular fingerprinting, ensemble decision trees and gradient boosted trees ML algorithms are on the accuracy border of the rather sophisticated deep neural nets (DNNs)-based algorithms. On a test dataset, these models displayed an excellent performance, reaching a ~ 90% classification accuracy. Additionally, we showcase a few examples where our models were able to identify interesting connections between known drugs from the Drug-Bank database and members of the GPCR family of receptors. Our findings are in excellent agreement with previously reported experimental observations in the literature. We hope the models presented in this paper synergize with the currently ongoing interest of applying machine learning modeling in the field of drug repurposing and computational drug discovery in general.
|
75
|
An AY, Choi KYG, Baghela AS, Hancock REW. An Overview of Biological and Computational Methods for Designing Mechanism-Informed Anti-biofilm Agents. Front Microbiol 2021; 12:640787. [PMID: 33927701 PMCID: PMC8076610 DOI: 10.3389/fmicb.2021.640787] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/12/2020] [Accepted: 03/23/2021] [Indexed: 12/29/2022] Open
Abstract
Bacterial biofilms are complex and highly antibiotic-resistant aggregates of microbes that form on surfaces in the environment and body including medical devices. They are key contributors to the growing antibiotic resistance crisis and account for two-thirds of all infections. Thus, there is a critical need to develop anti-biofilm specific therapeutics. Here we discuss mechanisms of biofilm formation, current anti-biofilm agents, and strategies for developing, discovering, and testing new anti-biofilm agents. Biofilm formation involves many factors and is broadly regulated by the stringent response, quorum sensing, and c-di-GMP signaling, processes that have been targeted by anti-biofilm agents. Developing new anti-biofilm agents requires a comprehensive systems-level understanding of these mechanisms, as well as the discovery of new mechanisms. This can be accomplished through omics approaches such as transcriptomics, metabolomics, and proteomics, which can also be integrated to better understand biofilm biology. Guided by mechanistic understanding, in silico techniques such as virtual screening and machine learning can discover small molecules that can inhibit key biofilm regulators. To increase the likelihood that these candidate agents selected from in silico approaches are efficacious in humans, they must be tested in biologically relevant biofilm models. We discuss the benefits and drawbacks of in vitro and in vivo biofilm models and highlight organoids as a new biofilm model. This review offers a comprehensive guide of current and future biological and computational approaches of anti-biofilm therapeutic discovery for investigators to utilize to combat the antibiotic resistance crisis.
Affiliation(s)
- Robert E. W. Hancock
  - Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, BC, Canada
|
76
|
Gallarati S, Fabregat R, Laplaza R, Bhattacharjee S, Wodrich MD, Corminboeuf C. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem Sci 2021; 12:6879-6889. [PMID: 34123316 PMCID: PMC8153079 DOI: 10.1039/d1sc00482d] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 01/26/2021] [Accepted: 04/01/2021] [Indexed: 12/12/2022] Open
Abstract
Hundreds of catalytic methods are developed each year to meet the demand for high-purity chiral compounds. The computational design of enantioselective organocatalysts remains a significant challenge, as catalysts are typically discovered through experimental screening. Recent advances in combining quantum chemical computations and machine learning (ML) hold great potential to propel the next leap forward in asymmetric catalysis. Within the context of quantum chemical machine learning (QML, or atomistic ML), the ML representations used to encode the three-dimensional structure of molecules and evaluate their similarity cannot easily capture the subtle energy differences that govern enantioselectivity. Here, we present a general strategy for improving molecular representations within an atomistic machine learning model to predict the DFT-computed enantiomeric excess of asymmetric propargylation organocatalysts solely from the structure of catalytic cycle intermediates. Mean absolute errors as low as 0.25 kcal mol⁻¹ were achieved in predictions of the activation energy with respect to DFT computations. By virtue of its design, this strategy is generalisable to other ML models, to experimental data and to any catalytic asymmetric reaction, enabling the rapid screening of structurally diverse organocatalysts from available structural information.
Affiliation(s)
- Simone Gallarati
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Raimon Fabregat
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Rubén Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Sinjini Bhattacharjee
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - Indian Institute of Science Education and Research Dr Homi Bhabha Rd, Ward No. 8, NCL Colony, Pashan Pune Maharashtra 411008 India
- Matthew D Wodrich
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
|
77
|
Aghdam SA, Brown AMV. Deep learning approaches for natural product discovery from plant endophytic microbiomes. ENVIRONMENTAL MICROBIOME 2021; 16:6. [PMID: 33758794 PMCID: PMC7972023 DOI: 10.1186/s40793-021-00375-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/27/2020] [Accepted: 02/21/2021] [Indexed: 05/10/2023]
Abstract
Plant microbiomes are not only diverse, but also appear to host a vast pool of secondary metabolites holding great promise for bioactive natural products and drug discovery. Yet, most microbes within plants appear to be uncultivable, and for those that can be cultivated, their metabolic potential lies largely hidden through regulatory silencing of biosynthetic genes. The recent explosion of powerful interdisciplinary approaches, including multi-omics methods to address multi-trophic interactions and artificial intelligence-based computational approaches to infer distribution of function, together present a paradigm shift in high-throughput approaches to natural product discovery from plant-associated microbes. Arguably, the key to characterizing and harnessing this biochemical capacity depends on a novel, systematic approach to characterize the triggers that turn on secondary metabolite biosynthesis through molecular or genetic signals from the host plant, members of the rich 'in planta' community, or from the environment. This review explores breakthrough approaches for natural product discovery from plant microbiomes, emphasizing the promise of deep learning as a tool for endophyte bioprospecting, endophyte biochemical novelty prediction, and endophyte regulatory control. It concludes with a proposed pipeline to harness global databases (genomic, metabolomic, regulomic, and chemical) to uncover and unsilence desirable natural products. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s40793-021-00375-0.
Affiliation(s)
- Shiva Abdollahi Aghdam
- Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409 USA
| | - Amanda May Vivian Brown
- Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409 USA
|
78
|
Sifain AE, Rice BM, Yalkowsky SH, Barnes BC. Machine learning transition temperatures from 2D structure. J Mol Graph Model 2021; 105:107848. [PMID: 33667863 DOI: 10.1016/j.jmgm.2021.107848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/17/2020] [Revised: 01/11/2021] [Accepted: 01/19/2021] [Indexed: 10/22/2022]
Abstract
A priori knowledge of physicochemical properties such as melting and boiling points could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers known as the Unified Physicochemical Property Estimation Relationships (UPPER). This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy, two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp2-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically-inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (i.e., OCHEM, Enamine, Bradley, and Bergström) comprising over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.
Affiliation(s)
- Andrew E Sifain
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
| | - Betsy M Rice
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
| | - Samuel H Yalkowsky
- Department of Pharmaceutics, College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
| | - Brian C Barnes
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
|
79
|
Karuth A, Alesadi A, Xia W, Rasulev B. Predicting glass transition of amorphous polymers by application of cheminformatics and molecular dynamics simulations. POLYMER 2021. [DOI: 10.1016/j.polymer.2021.123495] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/15/2022]
|
80
|
Ding J, Xu N, Nguyen MT, Qiao Q, Shi Y, He Y, Shao Q. Machine learning for molecular thermodynamics. Chin J Chem Eng 2021. [DOI: 10.1016/j.cjche.2020.10.044] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
|
81
|
McComb M, Bies R, Ramanathan M. Machine learning in pharmacometrics: Opportunities and challenges. Br J Clin Pharmacol 2021; 88:1482-1499. [PMID: 33634893 DOI: 10.1111/bcp.14801] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 10/21/2020] [Revised: 02/08/2021] [Accepted: 02/12/2021] [Indexed: 12/13/2022]
Abstract
The explosive growth in medical devices, imaging and diagnostics, computing, and communication and information technologies in drug development and healthcare has created an ever-expanding data landscape that the pharmacometrics (PMX) research community must now traverse. The tools of machine learning (ML) have emerged as a powerful computational approach in other data-rich disciplines but its effective utilization in the pharmaceutical sciences and PMX modelling is in its infancy. ML-based methods can complement PMX modelling by enabling the information in diverse sources of big data, e.g. population-based public databases and disease-specific clinical registries, to be harnessed because they are capable of efficiently identifying salient variables associated with outcomes and delineating their interdependencies. ML algorithms are computationally efficient, have strong predictive capabilities and can enable learning in the big data setting. ML algorithms can be viewed as providing a computational bridge from big data to complement PMX modelling. This review provides an overview of the strengths and weaknesses of ML approaches vis-à-vis population methods, assesses current research into ML applications in the pharmaceutical sciences and provides perspective for potential opportunities and strategies for the successful integration and utilization of ML in PMX.
Affiliation(s)
- Mason McComb
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
| | - Robert Bies
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- Institute for Computational Data Science, University at Buffalo, NY, USA
| | - Murali Ramanathan
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- Department of Neurology, University at Buffalo, State University of New York, Buffalo, NY, USA
|
82
|
Jiang D, Wu Z, Hsieh CY, Chen G, Liao B, Wang Z, Shen C, Cao D, Wu J, Hou T. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminform 2021; 13:12. [PMID: 33597034 PMCID: PMC7888189 DOI: 10.1186/s13321-020-00479-8] [Citation(s) in RCA: 162] [Impact Index Per Article: 54.0] [Received: 09/21/2020] [Accepted: 11/26/2020] [Indexed: 12/31/2022]
Abstract
Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
Affiliation(s)
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
| | - Guangyong Chen
- Shenzhen Institutes of Advanced Technology, Shenzhen, 518055, Guangdong, China
| | - Ben Liao
- Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
| | - Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, China
| | - Jian Wu
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China
|
83
|
Minias A, Żukowska L, Lechowicz E, Gąsior F, Knast A, Podlewska S, Zygała D, Dziadek J. Early Drug Development and Evaluation of Putative Antitubercular Compounds in the -Omics Era. Front Microbiol 2021; 11:618168. [PMID: 33603720 PMCID: PMC7884339 DOI: 10.3389/fmicb.2020.618168] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/16/2020] [Accepted: 12/30/2020] [Indexed: 12/14/2022]
Abstract
Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. According to the WHO, the disease is one of the top 10 causes of death worldwide. Mycobacterium tuberculosis is an intracellular pathogen with an unusually thick, waxy cell wall and a complex life cycle. These factors, combined with the ability of M. tuberculosis to enter prolonged periods of latency, make the bacterium very difficult to eradicate. The standard treatment of TB requires 6-20 months, depending on the drug susceptibility of the infecting strain. The need to take cocktails of antibiotics to treat tuberculosis effectively and the emergence of drug-resistant strains prompt the search for new antitubercular compounds. This review provides a perspective on how modern -omic technologies facilitate the drug discovery process for tuberculosis treatment. We discuss how methods of DNA and RNA sequencing, proteomics, and genetic manipulation of organisms increase our understanding of mechanisms of action of antibiotics and allow the evaluation of drugs. We explore the utility of mathematical modeling and modern computational analysis for the drug discovery process. Finally, we summarize how -omic technologies contribute to our understanding of the emergence of drug resistance.
Affiliation(s)
- Alina Minias
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
| | - Lidia Żukowska
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- BioMedChem Doctoral School of the University of Lodz and the Institutes of the Polish Academy of Sciences in Lodz, Lodz, Poland
| | - Ewelina Lechowicz
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- Institute of Microbiology, Biotechnology and Immunology, Faculty of Biology and Environmental Protection, University of Lodz, Lodz, Poland
| | - Filip Gąsior
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- BioMedChem Doctoral School of the University of Lodz and the Institutes of the Polish Academy of Sciences in Lodz, Lodz, Poland
| | - Agnieszka Knast
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- Institute of Molecular and Industrial Biotechnology, Faculty of Biotechnology and Food Sciences, Lodz University of Technology, Lodz, Poland
| | - Sabina Podlewska
- Department of Technology and Biotechnology of Drugs, Jagiellonian University Medical College, Krakow, Poland
- Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
| | - Daria Zygała
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- Institute of Microbiology, Biotechnology and Immunology, Faculty of Biology and Environmental Protection, University of Lodz, Lodz, Poland
| | - Jarosław Dziadek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
|
84
|
Espinoza GZ, Angelo RM, Oliveira PR, Honorio KM. Evaluating Deep Learning models for predicting ALK-5 inhibition. PLoS One 2021; 16:e0246126. [PMID: 33508008 PMCID: PMC7842961 DOI: 10.1371/journal.pone.0246126] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/14/2020] [Accepted: 01/14/2021] [Indexed: 11/18/2022]
Abstract
Computational methods have been widely used in drug design. The recent developments in machine learning techniques and the ever-growing chemical and biological databases are fertile ground for discoveries in this area. In this study, we evaluated the performance of Deep Learning models in comparison to Random Forest, and Support Vector Regression for predicting the biological activity (pIC50) of ALK-5 inhibitors as candidates to treat cancer. The generalization power of the models was assessed by internal and external validation procedures. A deep neural network model obtained the best performance in this comparative study, achieving a coefficient of determination of 0.658 on the external validation set with mean square error and mean absolute error of 0.373 and 0.450, respectively. Additionally, the relevance of the chemical descriptors for the prediction of biological activity was estimated using Permutation Importance. We can conclude that the forecast model obtained by the deep neural network is suitable for the problem and can be employed to predict the biological activity of new ALK-5 inhibitors.
Affiliation(s)
- Gabriel Z. Espinoza
- School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
| | - Rafaela M. Angelo
- School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
| | - Patricia R. Oliveira
- School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
- * E-mail: (PRO); (KMH)
| | - Kathia M. Honorio
- School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
- Federal University of ABC, Santo Andre, Sao Paulo, Brazil
- * E-mail: (PRO); (KMH)
|
85
|
Predictive Models for the Binary Diffusion Coefficient at Infinite Dilution in Polar and Nonpolar Fluids. MATERIALS 2021; 14:ma14030542. [PMID: 33498723 PMCID: PMC7866074 DOI: 10.3390/ma14030542] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/23/2020] [Revised: 01/07/2021] [Accepted: 01/19/2021] [Indexed: 12/03/2022]
Abstract
Experimental diffusivities are scarce, though knowledge of them is essential to model rate-controlled processes. In this work, various machine learning models to estimate diffusivities in polar and nonpolar solvents (except water and supercritical CO2) were developed. Such models were trained on a database of 90 polar systems (1431 points) and 154 nonpolar systems (1129 points) with data on 20 properties. Five machine learning algorithms were evaluated: multilinear regression, k-nearest neighbors, decision tree, and two ensemble methods (random forest and gradient boosted). For both polar and nonpolar data, the best results were found using the gradient boosted algorithm. The model for polar systems contains 6 variables/parameters (temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and solvent Lennard-Jones energy constant) and showed an average absolute relative deviation (AARD) of 5.07%. The nonpolar model requires five variables/parameters (the same as for polar systems, except the Lennard-Jones constant) and presents AARD = 5.86%. These results were compared with four classic models, including the 2-parameter correlation of Magalhães et al. (AARD = 5.19/6.19% for polar/nonpolar) and the predictive Wilke-Chang equation (AARD = 40.92/29.19%). Nonetheless, the correlation of Magalhães et al. requires two parameters per system that must first be fitted to data. The developed models are coded and provided as a command-line program.
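The AARD figures quoted in this abstract can be read with the usual definition of the metric in mind. The Python sketch below assumes the conventional form, AARD = (100/N) Σ |D_pred − D_exp| / D_exp; the abstract does not spell out the formula, and the function name and the sample diffusivities are hypothetical, chosen only for illustration.

```python
def aard_percent(predicted, experimental):
    """Average absolute relative deviation (AARD), in percent.

    Assumes the conventional definition; the cited paper's exact
    formula is not given in the abstract.
    """
    if len(predicted) != len(experimental) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum the relative errors point by point, then average and scale to percent.
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(experimental)


# Hypothetical diffusivities in cm^2/s (illustrative values, not from the paper).
experimental = [1.0e-5, 2.0e-5, 4.0e-5]
predicted = [1.1e-5, 1.9e-5, 4.0e-5]

# Relative errors are 10%, 5% and 0%, so the average is 5%.
print(f"AARD = {aard_percent(predicted, experimental):.2f}%")  # AARD = 5.00%
```

Because each error is scaled by the experimental value, the metric weights small and large diffusivities equally, which is one reason it is common for data spanning several orders of magnitude.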
|
86
|
Shamsara J. Evaluation of the performance of various machine learning methods on the discrimination of the active compounds. Chem Biol Drug Des 2021; 97:930-943. [DOI: 10.1111/cbdd.13819] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/10/2020] [Revised: 12/10/2020] [Accepted: 12/21/2020] [Indexed: 12/12/2022]
Affiliation(s)
- Jamal Shamsara
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
|
87
|
Chemoinformatics and QSAR. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
88
|
Extended Regression Modeling of the Toxicity of Phenol Derivatives to Tetrahymena pyriformis Using the Electronic-Structure Informatics Descriptor. JOURNAL OF COMPUTER AIDED CHEMISTRY 2021. [DOI: 10.2751/jcac.22.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
89
|
Yuan J, Liu X, Wang S, Chang C, Zeng Q, Song Z, Jin Y, Zeng Q, Sun G, Ruan S, Greenwell C, Abramov YA. Virtual coformer screening by a combined machine learning and physics-based approach. CrystEngComm 2021. [DOI: 10.1039/d1ce00587a] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/22/2023]
Abstract
Cocrystals as a solid form technology for improving physicochemical properties have gained increasing popularity in the pharmaceutical, nutraceutical, and agrochemical industries.
Affiliation(s)
- Jiuchuang Yuan
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Xuetao Liu
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogeomics, Peking University Shenzhen Graduate School, Shenzhen, 518055 China
| | - Simin Wang
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Chao Chang
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Qiao Zeng
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Zhengtian Song
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Yingdi Jin
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Qun Zeng
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Guangxu Sun
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | - Shigang Ruan
- XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
| | | | - Yuriy A. Abramov
- XtalPi Inc, Cambridge, Massachusetts 02142, USA
- Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
|
90
|
Wang MWH, Goodman JM, Allen TEH. Machine Learning in Predictive Toxicology: Recent Applications and Future Directions for Classification Models. Chem Res Toxicol 2020; 34:217-239. [PMID: 33356168 DOI: 10.1021/acs.chemrestox.0c00316] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 02/07/2023]
Abstract
In recent times, machine learning has become increasingly prominent in predictive toxicology as it has shifted from in vivo studies toward in silico studies. Currently, in vitro methods together with other computational methods such as quantitative structure-activity relationship modeling and absorption, distribution, metabolism, and excretion calculations are being used. An overview of machine learning and its applications in predictive toxicology is presented here, including support vector machines (SVMs), random forest (RF) and decision trees (DTs), neural networks, regression models, naïve Bayes, k-nearest neighbors, and ensemble learning. The recent successes of these machine learning methods in predictive toxicology are summarized, and a comparison of some models used in predictive toxicology is presented. In predictive toxicology, SVMs, RF, and DTs are the dominant machine learning methods due to the characteristics of the data available. Lastly, this review describes the current challenges facing the use of machine learning in predictive toxicology and offers insights into the possible areas of improvement in the field.
Affiliation(s)
- Marcus W H Wang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Jonathan M Goodman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Timothy E H Allen
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- MRC Toxicology Unit, University of Cambridge, Hodgkin Building, Lancaster Road, Leicester LE1 7HB, United Kingdom
|
91
|
Lane TR, Foil DH, Minerali E, Urbina F, Zorn KM, Ekins S. Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery. Mol Pharm 2020; 18:403-415. [PMID: 33325717 DOI: 10.1021/acs.molpharmaceut.0c01013] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/14/2022]
Abstract
Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies, we and others have applied multiple machine learning algorithms and modeling metrics and, in some cases, compared molecular descriptors to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and in comparison of our proprietary software Assay Central with random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (three layers). Model performance was assessed using an array of fivefold cross-validation metrics including area-under-the-curve, F1 score, Cohen's kappa, and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central and support vector classification were comparable. Unlike prior studies which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case. If anything, Assay Central may have been at a slight advantage as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay Central performance, although support vector classification seems to be a strong competitor. We also applied Assay Central to perform prospective predictions for the toxicity targets PXR and hERG to further validate these models. This work appears to be the largest scale comparison of these machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors, and machine learning algorithms and further refine the methods for evaluating and comparing such models.
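Three of the metrics named in this abstract (F1 score, Cohen's kappa, and the Matthews correlation coefficient) can be computed directly from a binary confusion matrix. The Python sketch below uses their standard textbook definitions and is not taken from Assay Central or the paper. The function name and the example counts are hypothetical, and area-under-the-curve is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """F1, Cohen's kappa and MCC from confusion-matrix counts.

    Illustrative sketch: assumes every denominator is non-zero,
    i.e. both classes occur in the labels and in the predictions.
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)

    # Matthews correlation coefficient.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator

    return {"f1": f1, "kappa": kappa, "mcc": mcc}


# Hypothetical counts for one cross-validation fold (not from the paper).
scores = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.70, while kappa and MCC both come out near 0.4. This illustrates why chance-corrected metrics are often reported alongside accuracy-like scores when comparing classifiers.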
Affiliation(s)
- Thomas R Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Daniel H Foil
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Eni Minerali
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Fabio Urbina
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7545, United States
| | - Kimberley M Zorn
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
|
92
|
Sun W, Braatz RD. Opportunities in tensorial data analytics for chemical and biological manufacturing processes. Comput Chem Eng 2020. [DOI: 10.1016/j.compchemeng.2020.107099] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/01/2023]
|
93
|
de Albuquerque S, Cianni L, de Vita D, Duque C, Gomes ASM, Gomes P, Laughton C, Leitão A, Montanari CA, Montanari R, Ribeiro JFR, da Silva JS, Teixeira C. Molecular design aided by random forests and synthesis of potent trypanocidal agents as cruzain inhibitors for Chagas disease treatment. Chem Biol Drug Des 2020; 96:948-960. [PMID: 33058457 DOI: 10.1111/cbdd.13663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2019] [Revised: 12/13/2019] [Accepted: 12/23/2019] [Indexed: 11/30/2022]
Abstract
Cruzain is an established target for the identification of novel trypanocidal agents, but how good are in vitro/in vivo correlations? This work describes the development of a random forests model for the prediction of the bioavailability of cruzain inhibitors that are Trypanosoma cruzi killers. Some common properties that characterize drug-likeness are poorly represented in many established cruzain inhibitors. This correlates with the evidence that many high-affinity cruzain inhibitors are not trypanocidal agents against T. cruzi. On the other hand, T. cruzi killers that present typical drug-like characteristics are likely to show better trypanocidal action than those without such features. The random forests model was not outperformed by other machine learning methods (such as artificial neural networks and support vector machines), and it was validated with the synthesis of two new trypanocidal agents. Specifically, we report a new lead compound, Neq0565, which was tested on T. cruzi Tulahuen (β-galactosidase) with a pEC50 of 4.9. It is inactive in the host cell line, showing a selectivity index (SI = EC50(cyto)/EC50(T. cruzi)) higher than 50.
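The two potency figures at the end of this abstract can be put on a concentration scale with a short calculation. The Python sketch below assumes the common convention pEC50 = -log10(EC50 in mol/L), which the abstract does not state. It treats the selectivity index as the ratio of the host-cell EC50 to the T. cruzi EC50, as the abstract defines it. The function names are hypothetical.

```python
def pec50_to_ec50_molar(pec50):
    """Convert pEC50 to EC50 in mol/L, assuming pEC50 = -log10(EC50 / M)."""
    return 10.0 ** (-pec50)


def selectivity_index(ec50_cyto, ec50_parasite):
    """Ratio of host-cell EC50 to parasite EC50 (both in the same units)."""
    return ec50_cyto / ec50_parasite


# pEC50 = 4.9 is the value reported in the abstract for Neq0565.
ec50_um = pec50_to_ec50_molar(4.9) * 1e6  # micromolar
print(f"EC50 against T. cruzi: {ec50_um:.1f} uM")  # EC50 against T. cruzi: 12.6 uM

# SI > 50 then implies a host-cell EC50 above roughly 50 x 12.6 uM.
min_cyto_um = 50 * ec50_um
print(f"Implied host-cell EC50: > {min_cyto_um:.0f} uM")  # Implied host-cell EC50: > 629 uM
```

Under that assumption, a pEC50 of 4.9 corresponds to a mid-micromolar EC50, so the reported selectivity index above 50 places the cytotoxic concentration in the high-micromolar range or beyond.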
Affiliation(s)
- Sérgio de Albuquerque
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
| | - Lorenzo Cianni
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Daniela de Vita
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Carla Duque
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
| | - Ana S M Gomes
- LAQV-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| | - Paula Gomes
- LAQV-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| | - Charles Laughton
- School of Pharmacy and Centre for Biomolecular Sciences, University of Nottingham, Nottingham, UK
| | - Andrei Leitão
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Carlos A Montanari
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Raphael Montanari
- Centro de Robótica de São Carlos, EESC-ICMC, Universidade de São Paulo, São Paulo, Brazil
| | - Jean F R Ribeiro
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - João Santana da Silva
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
| | - Cátia Teixeira
- LAQV-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| |
|
94
|
Yang S, Ye Q, Ding J, Yin, Lu A, Chen X, Hou T, Cao D. Current advances in ligand‐based target prediction. WIREs Computational Molecular Science 2020. [DOI: 10.1002/wcms.1504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Affiliation(s)
- Su‐Qing Yang
- Xiangya School of Pharmaceutical Sciences Central South University Changsha Hunan China
| | - Qing Ye
- College of Pharmaceutical Sciences Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University Hangzhou, Zhejiang China
| | - Jun‐Jie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing China
| | - Yin
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital Central South University Changsha Hunan China
| | - Ai‐Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong China
| | - Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital Central South University Changsha Hunan China
| | - Ting‐Jun Hou
- College of Pharmaceutical Sciences Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University Hangzhou, Zhejiang China
| | - Dong‐Sheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha Hunan China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong China
| |
|
95
|
Fine J, Kuan-Yu Liu J, Beck A, Alzarieni KZ, Ma X, Boulos VM, Kenttämaa HI, Chopra G. Graph-based machine learning interprets and predicts diagnostic isomer-selective ion-molecule reactions in tandem mass spectrometry. Chem Sci 2020; 11:11849-11858. [PMID: 34094414 PMCID: PMC8162943 DOI: 10.1039/d0sc02530e] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/21/2022] Open
Abstract
Diagnostic ion-molecule reactions employed in tandem mass spectrometry experiments can frequently be used to differentiate between isomeric compounds unlike the popular collision-activated dissociation methodology. Selected neutral reagents, such as 2-methoxypropene (MOP), are introduced into an ion trap mass spectrometer where they react with protonated analytes to yield product ions that are diagnostic for the functional groups present in the analytes. However, the understanding and interpretation of the mass spectra obtained can be challenging and time-consuming. Here, we introduce the first bootstrapped decision tree model trained on 36 known ion-molecule reactions with MOP. It uses the graph-based connectivity of analytes' functional groups as input to predict whether the protonated analyte will undergo a diagnostic reaction with MOP. A Cohen kappa statistic of 0.70 was achieved with a blind test set, suggesting substantial inter-model reliability on limited training data. Prospective diagnostic product predictions were experimentally tested for 13 previously unpublished analytes. We introduce chemical reactivity flowcharts to facilitate chemical interpretation of the decisions made by the machine learning method that will be useful to understand and interpret the mass spectra for chemical reactivity.
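The Cohen kappa statistic mentioned in this abstract measures agreement between two sets of labels after correcting for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal pure-Python version; the label vectors are made-up illustrations, not the paper's blind test set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where the labels match.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 1 = diagnostic reaction observed/predicted, 0 = not.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(observed, predicted)
print(f"kappa = {kappa:.2f}")
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are described as "substantial" agreement, which matches the wording used for the reported 0.70.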
Affiliation(s)
- Jonathan Fine
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA
| | - Judy Kuan-Yu Liu
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA
| | - Armen Beck
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA
| | - Kawthar Z Alzarieni
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA
| | - Xin Ma
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA
| | - Victoria M Boulos
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA
| | - Hilkka I Kenttämaa
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA
| | - Gaurav Chopra
- Department of Chemistry, Purdue University 560 Oval Drive West Lafayette IN USA; Purdue Institute for Drug Discovery, Integrative Data Science Institute, Purdue Center for Cancer Research, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience West Lafayette IN USA
| |
|
96
|
Capecchi A, Reymond JL. Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning. Biomolecules 2020; 10:E1385. [PMID: 32998475 PMCID: PMC7600738 DOI: 10.3390/biom10101385] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/02/2020] [Revised: 09/22/2020] [Accepted: 09/25/2020] [Indexed: 12/20/2022] Open
Abstract
Microbial natural products (NPs) are an important source of drugs; however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.
Affiliation(s)
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
97
|
Lentelink NJ, Palkovits S. Transfer Learning as Tool to Enhance Predictions of Molecular Properties Based on 2D Projections. Advanced Theory and Simulations 2020. [DOI: 10.1002/adts.202000148] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
Affiliation(s)
- Niklas Julian Lentelink
- Institute of Technical and Macromolecular Chemistry RWTH Aachen University Worringer Weg 2 Aachen 52074 Germany
| | - Stefan Palkovits
- Institute of Technical and Macromolecular Chemistry RWTH Aachen University Worringer Weg 2 Aachen 52074 Germany
| |
|
98
|
Shamsara J. A Random Forest Model to Predict the Activity of a Large Set of Soluble Epoxide Hydrolase Inhibitors Solely Based on a Set of Simple Fragmental Descriptors. Comb Chem High Throughput Screen 2020; 22:555-569. [PMID: 31622216 DOI: 10.2174/1386207322666191016110232] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/08/2019] [Revised: 08/02/2019] [Accepted: 09/19/2019] [Indexed: 01/10/2023]
Abstract
BACKGROUND: The Soluble Epoxide Hydrolase (sEH) is a ubiquitously expressed enzyme in various tissues. The inhibition of the sEH has shown promising results to treat hypertension, alleviate pain and inflammation. OBJECTIVE: In this study, the power of machine learning has been employed to develop a predictive QSAR model for a large set of sEH inhibitors. METHODS: In this study, the random forest method was employed to make a valid model for the prediction of sEH inhibition. Besides, two new methods (Treeinterpreter python package and LIME, Local Interpretable Model-agnostic Explanations) have been exploited to explain and interpret the model. RESULTS: The performance metrics of the model were as follows: R² = 0.831, Q² = 0.565, RMSE = 0.552 and R²pred = 0.595. The model also demonstrated good predictability on the two extra external test sets at least in terms of ranking. The Spearman's rank correlation coefficients for external test set 1 and 2 were 0.872 and 0.673, respectively. The external test set 2 was a diverse one compared to the training set. Therefore, the model could be used for virtual screening to enrich potential sEH inhibitors among a diverse compound library. CONCLUSION: As the model was solely developed based on a set of simple fragmental descriptors, the model was explained by two local interpretation algorithms, and this could guide medicinal chemists to design new sEH inhibitors. Moreover, the most important general descriptors (fragments) suggested by the model were consistent with the available crystallographic data. The model is available as an executable binary at http://www.pharm-sbg.com and https://github.com/shamsaraj.
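The abstract judges the external test sets "in terms of ranking" using Spearman's rank correlation, which is the Pearson correlation computed on ranks rather than raw values. A minimal pure-Python sketch follows; the activity values are invented for illustration and are not taken from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted activities for five compounds.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 2.5, 2.0, 4.2, 5.0]
rho = spearman_rho(measured, predicted)
print(f"Spearman rho = {rho:.2f}")
```

A model can rank compounds well (high rho) even when its absolute predictions are off, which is why a ranking metric suits the virtual screening use the abstract describes.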
Affiliation(s)
- Jamal Shamsara
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
| |
|
99
|
Zadorozhnii PV, Kiselev VV, Kharchenko AV. In silico toxicity evaluation of Salubrinal and its analogues. Eur J Pharm Sci 2020; 155:105538. [PMID: 32889087 DOI: 10.1016/j.ejps.2020.105538] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/06/2020] [Revised: 08/14/2020] [Accepted: 08/30/2020] [Indexed: 02/06/2023]
Abstract
This paper reports on a comprehensive in silico toxicity assessment of Salubrinal and its analogues containing a cinnamic acid residue or quinoline ring using the online servers admetSAR, ADMETlab, ProTox, ADVERPred, Pred-hERG and Vienna LiverTox. Apart from rare exceptions, in all 55 studied structures, mild or practical absence of acute toxicity was predicted for rats (III or IV toxicity class). Cardiotoxic, hepatotoxic and immunotoxic effects were predicted for Salubrinal and its analogues. We constructed models of the main predicted anti-targets hERG, BSEP, MRP3, MRP4 and AhR using the principle of homologous modeling. Molecular docking studies were carried out with the obtained models. We carried out molecular docking for all targets using AutoDock Vina, implemented in the PyRx 0.8 software package. According to the results of molecular docking, the compounds analyzed are potential moderate or weak hERG blockers. Induction of cholestasis and, as a consequence, liver damage by these drugs, directly related to inhibition of BSEP, MRP3 and MRP4, most likely will not be observed. Interaction with AhR for the studied compounds is impossible for steric reasons and, as a consequence, toxic effects on the immune and other organ systems associated with the activation of the AhR signaling pathway are excluded.
Affiliation(s)
- Pavlo V Zadorozhnii
- Department of pharmacy and technology of organic substances, Ukrainian State University of Chemical Technology, Gagarin Ave., 8, Dnipro 49005, Ukraine
| | - Vadym V Kiselev
- Department of pharmacy and technology of organic substances, Ukrainian State University of Chemical Technology, Gagarin Ave., 8, Dnipro 49005, Ukraine
| | - Aleksandr V Kharchenko
- Department of pharmacy and technology of organic substances, Ukrainian State University of Chemical Technology, Gagarin Ave., 8, Dnipro 49005, Ukraine
| |
|
100
|
Achary PGR. Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review. Mini Rev Med Chem 2020; 20:1375-1388. [DOI: 10.2174/1389557520666200429102334] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/10/2019] [Revised: 11/07/2019] [Accepted: 11/08/2019] [Indexed: 12/18/2022]
Abstract
Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules are so far registered in the Chemical Abstracts Service. According to a recent study, there are at present around 10^60 molecules that are classified as new drug-like molecules. The library of such molecules is now considered ‘dark chemical space’ or ‘dark chemistry’. To explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, ‘genomics’, ‘proteomics’ and ‘in-silico simulation’, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge that must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) chemo-genomics data collection from a database or the literature, (ii) calculation of the right descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) application of the QSAR model to predict the biological property of the molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights the web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
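The four QSAR model-building steps listed in this abstract can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the SMILES strings and activity values are invented, the single descriptor is a crude count of uppercase letters standing in for a heavy-atom count (it ignores aromatic lowercase atoms), and the model is a one-variable least-squares line rather than any method the review discusses.

```python
def heavy_atom_count(smiles: str) -> int:
    """Toy descriptor: count uppercase letters in a SMILES string."""
    return sum(1 for ch in smiles if ch.isupper())


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (i) Data collection: hypothetical (SMILES, activity) pairs.
training_set = [("CCO", 5.5), ("CCCO", 6.0), ("CCCCO", 6.5), ("CCCCCO", 7.0)]

# (ii) Descriptor calculation from the molecular representation.
descriptors = [heavy_atom_count(smiles) for smiles, _ in training_set]
activities = [activity for _, activity in training_set]

# (iii) Establish a relationship (model) between activity and descriptor.
slope, intercept = fit_line(descriptors, activities)

# (iv) Apply the model to score a new, unseen molecule.
candidate = "CCCCCCCO"
predicted_activity = slope * heavy_atom_count(candidate) + intercept
print(f"predicted activity for {candidate}: {predicted_activity:.2f}")
```

In practice each step is far richer (curated assay data, hundreds of descriptors, validated models), and, as the abstract stresses, any predicted hit still needs experimental verification.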
Affiliation(s)
- Patnala Ganga Raju Achary
- Department of Chemistry, Faculty of Engineering & Technology (ITER), Siksha ‘O’ Anusandhan, Deemed to be University, Khandagiri Square, Bhubaneswar- 751030, India
| |
|