1
Tiihonen A, Cox-Vazquez SJ, Liang Q, Ragab M, Ren Z, Hartono NTP, Liu Z, Sun S, Zhou C, Incandela NC, Limwongyut J, Moreland AS, Jayavelu S, Bazan GC, Buonassisi T. Predicting Antimicrobial Activity of Conjugated Oligoelectrolyte Molecules via Machine Learning. J Am Chem Soc 2021; 143:18917-18931. [PMID: 34739239 DOI: 10.1021/jacs.1c05055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Abstract
New antibiotics are needed to battle growing antibiotic resistance, but the development process from hit, to lead, and ultimately to a useful drug takes decades. Although progress in molecular property prediction using machine-learning methods has opened up new pathways for aiding the antibiotics development process, many existing solutions rely on large data sets and finding structural similarities to existing antibiotics. Challenges remain in modeling unconventional antibiotic classes that are drawing increasing research attention. In response, we developed an antimicrobial activity prediction model for conjugated oligoelectrolyte molecules, a new class of antibiotics that lacks extensive prior structure-activity relationship studies. Our approach enables us to predict the minimum inhibitory concentration for E. coli K12, with 21 molecular descriptors selected by recursive elimination from a set of 5305 descriptors. This predictive model achieves an R² of 0.65 with no prior knowledge of the underlying mechanism. We find the molecular representation optimum for the domain is the key to good predictions of antimicrobial activity. In the case of conjugated oligoelectrolytes, a representation reflecting the three-dimensional shape of the molecules is most critical. Although it is demonstrated with a specific example of conjugated oligoelectrolytes, our proposed approach for creating the predictive model can be readily adapted to other novel antibiotic candidate domains.
Affiliation(s)
- Armi Tiihonen
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Sarah J Cox-Vazquez
- Departments of Chemistry and Chemical & Biomolecular Engineering, National University of Singapore, Singapore 119077, Singapore
- Qiaohao Liang
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Mohamed Ragab
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Zekun Ren
- Singapore MIT Alliance for Research and Technology, #05-09, Innovation Wing, 1 Create Way, Singapore 138602, Singapore
- Zhe Liu
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Shijing Sun
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Cheng Zhou
- Departments of Chemistry and Chemical & Biomolecular Engineering, National University of Singapore, Singapore 119077, Singapore
- Nathan C Incandela
- Center for Polymers and Organic Solids, Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, United States
- Jakkarin Limwongyut
- Center for Polymers and Organic Solids, Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, United States
- Alex S Moreland
- Center for Polymers and Organic Solids, Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, United States
- Senthilnath Jayavelu
- Institute for Infocomm Research, Artificial Intelligence, Analytics and Informatics, Agency for Science, Technology and Research, Singapore 138632, Singapore
- Guillermo C Bazan
- Departments of Chemistry and Chemical & Biomolecular Engineering, National University of Singapore, Singapore 119077, Singapore; Center for Polymers and Organic Solids, Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, United States
- Tonio Buonassisi
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; Singapore MIT Alliance for Research and Technology, #05-09, Innovation Wing, 1 Create Way, Singapore 138602, Singapore
2
Sabando MV, Ponzoni I, Milios EE, Soto AJ. Using molecular embeddings in QSAR modeling: does it make a difference? Brief Bioinform 2021; 23:6366344. [PMID: 34498670 DOI: 10.1093/bib/bbab365] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/17/2021] [Revised: 07/29/2021] [Accepted: 08/18/2021] [Indexed: 11/13/2022]
Abstract
With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinders the process of choosing a suitable representation for Quantitative Structure-Activity Relationship (QSAR) modeling. A reason behind this issue is the difficulty of conducting a fair and thorough comparison of the different existing embedding approaches, which requires numerous experiments on various datasets and training scenarios. To close this gap, we reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques recently proposed in the literature. We compared these five methods concerning their performance in QSAR scenarios using different classification and regression datasets. We also compared these representations to traditional molecular representations, namely molecular descriptors and fingerprints. As opposed to the expected outcome, our experimental setup consisting of over 25 000 trained models and statistical tests revealed that the predictive performance using molecular embeddings did not significantly surpass that of traditional representations. Although supervised embeddings yielded competitive results compared with those using traditional molecular representations, unsupervised embeddings tended to perform worse than traditional representations. Our results highlight the need for conducting a careful comparison and analysis of the different embedding techniques prior to using them in drug design tasks and motivate a discussion about the potential of molecular embeddings in computer-aided drug design.
Affiliation(s)
- Ignacio Ponzoni
- Institute for Computer Science and Engineering, UNS-CONICET, Bahía Blanca, Argentina; Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca, Argentina
- Axel J Soto
- Institute for Computer Science and Engineering, UNS-CONICET, Bahía Blanca, Argentina; Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca, Argentina
3
Chen Z, Li D, Wan H, Liu M, Liu J. Unsupervised machine learning methods for polymer nanocomposites data via molecular dynamics simulation. Molecular Simulation 2020. [DOI: 10.1080/08927022.2020.1851028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/11/2023]
Affiliation(s)
- Zhudan Chen
- Institute of Automation, Beijing University of Chemical Technology, Beijing, People’s Republic of China
- Dazi Li
- Institute of Automation, Beijing University of Chemical Technology, Beijing, People’s Republic of China
- Haixiao Wan
- Key Laboratory of Beijing City on Preparation and Processing of Novel Polymer Materials, Beijing University of Chemical Technology, Beijing, People’s Republic of China
- Minghui Liu
- Key Laboratory of Beijing City on Preparation and Processing of Novel Polymer Materials, Beijing University of Chemical Technology, Beijing, People’s Republic of China
- Jun Liu
- Key Laboratory of Beijing City on Preparation and Processing of Novel Polymer Materials, Beijing University of Chemical Technology, Beijing, People’s Republic of China
4
Laghuvarapu S, Pathak Y, Priyakumar UD. BAND NN: A Deep Learning Framework for Energy Prediction and Geometry Optimization of Organic Small Molecules. J Comput Chem 2019; 41:790-799. [DOI: 10.1002/jcc.26128] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/09/2019] [Revised: 11/13/2019] [Accepted: 11/21/2019] [Indexed: 12/26/2022]
Affiliation(s)
- Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- Yashaswi Pathak
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- U. Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
5
Barnard AS, Motevalli B, Parker AJ, Fischer JM, Feigl CA, Opletal G. Nanoinformatics, and the big challenges for the science of small things. Nanoscale 2019; 11:19190-19201. [PMID: 31397835 DOI: 10.1039/c9nr05912a] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/10/2023]
Abstract
The combination of computational chemistry and computational materials science with machine learning and artificial intelligence provides a powerful way of relating structural features of nanomaterials with functional properties. However, combining these fundamentally different scientific approaches is not as straightforward as it seems. Machine learning methods were developed for large data sets with small numbers of consistent features. Typically nanomaterials data sets are small, with high dimensionality and high variance in the feature space, and suffer from numerous destructive biases. None of the established data science or machine learning methods in widespread use today were devised with (nano)materials data sets in mind, but there are ways to overcome these challenges and use them reliably. In this review we will discuss domain-specific constraints on data-driven nanomaterials design, and explore the differences between nanomaterials simulation and nanoinformatics that can be leveraged for greater impact.
Affiliation(s)
- A S Barnard
- CSIRO Data61, Docklands, Victoria, Australia.
- B Motevalli
- CSIRO Data61, Docklands, Victoria, Australia.
- A J Parker
- CSIRO Data61, Docklands, Victoria, Australia.
- J M Fischer
- CSIRO Data61, Docklands, Victoria, Australia.
- C A Feigl
- CSIRO Data61, Docklands, Victoria, Australia.
- G Opletal
- CSIRO Data61, Docklands, Victoria, Australia.
6
Timoshenko J, Frenkel AI. “Inverting” X-ray Absorption Spectra of Catalysts by Machine Learning in Search for Activity Descriptors. ACS Catal 2019. [DOI: 10.1021/acscatal.9b03599] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Indexed: 12/26/2022]
Affiliation(s)
- Janis Timoshenko
- Department of Interface Science, Fritz-Haber-Institute of the Max Planck Society, 14195 Berlin, Germany
- Anatoly I. Frenkel
- Department of Materials Science and Chemical Engineering, Stony Brook University, Stony Brook, New York 11794, United States
- Chemistry Division, Brookhaven National Laboratory, Upton, New York 11973, United States
7
Convolutional Neural Networks for Crystal Material Property Prediction Using Hybrid Orbital-Field Matrix and Magpie Descriptors. Crystals 2019. [DOI: 10.3390/cryst9040191] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/17/2022]
Abstract
Computational prediction of crystal materials properties can help to do large-scale in silico screening. Recent studies of material informatics have focused on expert design of multidimensional interpretable material descriptors/features. However, successes of deep learning such as Convolutional Neural Networks (CNN) in image recognition and speech recognition have demonstrated their automated feature extraction capability to effectively capture the characteristics of the data and achieve superior prediction performance. Here, we propose CNN-OFM-Magpie, a CNN model with OFM (Orbital-field Matrix) and Magpie descriptors to predict the formation energy of 4030 crystal materials by exploiting the complementarity of two-dimensional OFM features and Magpie features. Experiments showed that our method achieves better performance than conventional regression algorithms such as support vector machines and Random Forest. It is also better than CNN models using only the OFM features, the Magpie features, or the basic one-hot encodings. This demonstrates the advantages of CNN and feature fusion for materials property prediction. Finally, we visualized the two-dimensional OFM descriptors and analyzed the features extracted by the CNN to obtain greater understanding of the CNN-OFM model.
8
Hughes ZE, Thacker JCR, Wilson AL, Popelier PLA. Description of Potential Energy Surfaces of Molecules Using FFLUX Machine Learning Models. J Chem Theory Comput 2018; 15:116-126. [DOI: 10.1021/acs.jctc.8b00806] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
Affiliation(s)
- Zak E. Hughes
- Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, U.K.
- School of Chemistry, The University of Manchester, Manchester M13 9PL, U.K.
- Joseph C. R. Thacker
- Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, U.K.
- School of Chemistry, The University of Manchester, Manchester M13 9PL, U.K.
- Alex L. Wilson
- Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, U.K.
- School of Chemistry, The University of Manchester, Manchester M13 9PL, U.K.
- Paul L. A. Popelier
- Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, U.K.
- School of Chemistry, The University of Manchester, Manchester M13 9PL, U.K.
9
Yan T, Sun B, Barnard AS. Predicting archetypal nanoparticle shapes using a combination of thermodynamic theory and machine learning. Nanoscale 2018; 10:21818-21826. [PMID: 30452032 DOI: 10.1039/c8nr07341d] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Machine learning is a useful way of identifying representative or pure nanoparticle shapes as part of a larger ensemble, but its predictive capabilities can be limited when a large dataset of candidate structures must already exist. Ideally one would like to use machine learning to define the ideal dataset for future, more computationally intensive, studies before a significant amount of resources are consumed. In this work we combine an established analytical phenomenological model and statistical machine learning to predict the archetypes and prototypes of a diverse ensemble of 2380 platinum nanoparticle morphologies developed with less than twenty input electronic structure simulations. By parameterising a size- and shape-dependent thermodynamic model, probabilities are assigned to seventeen different shapes between three and thirty nanometres, which together with structural features such as nanoparticle diameter, surface area, sphericity and facet configuration form the basis for archetypal analysis and K-means clustering. Using this approach we rapidly identify six "pure" archetypes and twelve "representative" prototypes that can be used in future computational studies of properties such as catalysis.
Affiliation(s)
- Tao Yan
- Molecular and Materials Modelling, Data61 CSIRO, Door 34 Goods Shed, Village St, Docklands, VIC 3008, Australia.