1
Zhang X, Sheng Y, Liu X, Yang J, Goddard WA III, Ye C, Zhang W. Polymer-Unit Graph: Advancing Interpretability in Graph Neural Network Machine Learning for Organic Polymer Semiconductor Materials. J Chem Theory Comput 2024; 20:2908-2920. [PMID: 38551455 DOI: 10.1021/acs.jctc.3c01385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
The graph representation of complex materials plays a crucial role in the field of inorganic and organic materials investigations for developing data-centric materials science, such as those using graph neural networks (GNNs). However, the currently prevalent GNN models are primarily employed for investigating periodic crystals and organic small molecule data, yet they still encounter challenges in terms of interpretability and computational efficiency when applied to polymer monomers and organic macromolecules data. There is still a lack of graph representation of organic polymers and macromolecules specifically tailored for GNN models to explore the structural characteristics. The Polymer-unit Graph, a novel coarse-grained graph representation method introduced in this study, is dedicated to expressing and analyzing polymers and macromolecules. By incorporating the Polymer-unit Graph into the GNN models and analyzing the organic semiconductor (OSC) materials database, it becomes possible to uncover intricate structure-property relationships involving branched-chain engineering, fluoridation substitution, and donor-acceptor combination effects on the elementary structure of OSC polymers. Furthermore, the Polymer-unit Graph enables visualizing the relationship between target properties and polymer units while reducing training time by an impressive 98% and minimizing molecular graph representation models. In conclusion, the Polymer-unit Graph successfully integrates the concept of Polymer-unit into the field of GNNs, enabling more accurate analysis and understanding of organic polymers and macromolecules.
Affiliation(s)
- Xinyue Zhang
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Ye Sheng
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Xiumin Liu
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
  - Key Laboratory of Soft Chemistry and Functional Materials of MOE, School of Chemistry and Chemical Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Jiong Yang
  - Materials Genome Institute, Shanghai University, Shanghai 200444, PR China
- William A Goddard III
  - Materials and Process Simulation Center (MSC), California Institute of Technology, Pasadena, California 91125, United States
- Caichao Ye
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
  - Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, PR China
- Wenqing Zhang
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
2
Gao P, Zhang Q, Keely D, Cleveland DW, Ye Y, Zheng W, Shen M, Yu H. Molecular Graph-Based Deep Learning Algorithm Facilitates an Imaging-Based Strategy for Rapid Discovery of Small Molecules Modulating Biomolecular Condensates. J Med Chem 2023; 66:15084-15093. [PMID: 37937963 PMCID: PMC10810226 DOI: 10.1021/acs.jmedchem.3c00490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2023]
Abstract
Biomolecular condensates are proposed to cause diseases, such as cancer and neurodegeneration, by concentrating proteins at abnormal subcellular loci. Imaging-based compound screens have been used to identify small molecules that reverse or promote biomolecular condensates. However, limitations of conventional imaging-based methods restrict the screening scale. Here, we used a graph convolutional network (GCN)-based computational approach and identified small molecule candidates that reduce the nuclear liquid-liquid phase separation of TAR DNA-binding protein 43 (TDP-43), an essential protein that undergoes phase transition in neurodegenerative diseases. We demonstrated that the GCN-based deep learning algorithm is suitable for spatial information extraction from the molecular graph. Thus, this is a promising method to identify small molecule candidates with novel scaffolds. Furthermore, we validated that these candidates do not affect the normal splicing function of TDP-43. Taken together, a combination of an imaging-based screen and a GCN-based deep learning method dramatically improves the speed and accuracy of the compound screen for biomolecular condensates.
Affiliation(s)
- Peng Gao
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Qi Zhang
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Devin Keely
  - Center for Alzheimer’s and Neurodegenerative Diseases, Department of Molecular Biology, Peter O’Donnell Jr. Brain Institute, UT Southwestern Medical Center, TX, 75287, USA
- Don W. Cleveland
  - Department of Cellular and Molecular Medicine, UC San Diego, CA, 92093, USA
- Yihong Ye
  - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), MD 20850, USA
- Wei Zheng
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Min Shen
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Haiyang Yu
  - Center for Alzheimer’s and Neurodegenerative Diseases, Department of Molecular Biology, Peter O’Donnell Jr. Brain Institute, UT Southwestern Medical Center, TX, 75287, USA
3
Ondar EE, Polynski MV, Ananikov VP. Predicting ¹⁹⁵Pt NMR Chemical Shifts in Water-Soluble Inorganic/Organometallic Complexes with a Fast and Simple Protocol Combining Semiempirical Modeling and Machine Learning. Chemphyschem 2023:e202200940. [PMID: 36806426 DOI: 10.1002/cphc.202200940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 02/20/2023] [Accepted: 02/20/2023] [Indexed: 02/23/2023]
Abstract
Water-soluble Pt complexes are the key components in medicinal chemistry and catalysis. The well-known cisplatin family of anticancer drugs and industrial hydrosilylation catalysts are two leading examples. On the molecular level, the activity mechanisms of such complexes mostly involve changes in the Pt coordination sphere. Using ¹⁹⁵Pt NMR spectroscopy for operando monitoring would be a valuable tool for uncovering the activity mechanisms; however, reliable approaches for the rapid correlation of Pt complex structure with ¹⁹⁵Pt chemical shifts are very challenging and not available for everyday research practice. While NMR shielding is a response property, molecular 3D structure determines NMR spectra, as widely known, which allows us to build up 3D structure to ¹⁹⁵Pt chemical shift correlations. Accordingly, we present a new workflow for the determination of lowest-energy configurational/conformational isomers based on the GFN2-xTB semiempirical method and prediction of corresponding chemical shifts with a Machine Learning (ML) model tuned for Pt complexes. The workflow was designed for the prediction of ¹⁹⁵Pt chemical shifts of water-soluble Pt(II) and Pt(IV) anionic, neutral, and cationic complexes with halide, NO₂⁻, (di)amino, and (di)carboxylate ligands with chemical shift values ranging from -6293 to 7090 ppm. The model offered an accuracy (normalized root-mean-square deviation/RMSD) of 1.08%/145.02 ppm on the held-out test set.
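The two accuracy figures quoted in this abstract can be related with a short calculation. The sketch below assumes the normalized RMSD is the RMSD divided by the quoted chemical-shift range; the abstract does not state the normalization, so this is an assumption, and the function name is hypothetical. Under that assumption the numbers are mutually consistent.

```python
# Values quoted in the abstract above.
shift_min_ppm = -6293.0
shift_max_ppm = 7090.0
rmsd_ppm = 145.02


def normalized_rmsd_percent(rmsd, lo, hi):
    """Return RMSD as a percentage of the value range.

    Range normalization is an assumption; the abstract does not say
    which normalization the authors used.
    """
    return 100.0 * rmsd / (hi - lo)


shift_range_ppm = shift_max_ppm - shift_min_ppm
nrmsd = normalized_rmsd_percent(rmsd_ppm, shift_min_ppm, shift_max_ppm)
print(f"range = {shift_range_ppm:.0f} ppm, NRMSD = {nrmsd:.2f} %")
# prints: range = 13383 ppm, NRMSD = 1.08 %
```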
Affiliation(s)
- Evgeniia E Ondar
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospect 47, Moscow, 119991, Russia
- Mikhail V Polynski
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospect 47, Moscow, 119991, Russia
  - Scientific Technological Center of Organic and Pharmaceutical Chemistry, National Academy of Sciences, 26 Azatutyan Ave, 0014, Yerevan, Armenia
- Valentine P Ananikov
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospect 47, Moscow, 119991, Russia
4
Materials informatics approach using domain modelling for exploring structure-property relationships of polymers. Sci Rep 2022; 12:10558. [PMID: 35732681 PMCID: PMC9217937 DOI: 10.1038/s41598-022-14394-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 06/06/2022] [Indexed: 11/23/2022]
Abstract
In the development of polymer materials, it is an important issue to explore the complex relationships between domain structure and physical properties. In the domain structure analysis of polymer materials, ¹H-static solid-state NMR (ssNMR) spectra can provide information on mobile, rigid, and intermediate domains. But estimation of domain structure from its analysis is difficult due to the wide overlap of spectra from multiple domains. Therefore, we have developed a materials informatics approach that combines the domain modeling (http://dmar.riken.jp/matrigica/) and the integrated analysis of meta-information (the elements, functional groups, additives, and physical properties) in polymer materials. Firstly, the ¹H-static ssNMR data of 120 polymer materials were subjected to a short-time Fourier transform to obtain frequency, intensity, and T2 relaxation time for domains with different mobility. The average T2 relaxation time of each domain is 0.96 ms for Mobile, 0.55 ms for Intermediate (Mobile), 0.32 ms for Intermediate (Rigid), and 0.11 ms for Rigid. Secondly, the estimated domain proportions were integrated with meta-information such as elements, functional groups, and thermophysical properties and were analyzed using a self-organization map and market basket analysis. This proposed method can contribute to exploring structure–property relationships of polymer materials with multiple domains.
5
Gao P, Xu M, Zhang Q, Chen CZ, Guo H, Ye Y, Zheng W, Shen M. Graph Convolutional Network-Based Screening Strategy for Rapid Identification of SARS-CoV-2 Cell-Entry Inhibitors. J Chem Inf Model 2022; 62:1988-1997. [PMID: 35404596 PMCID: PMC9016773 DOI: 10.1021/acs.jcim.2c00222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Indexed: 11/29/2022]
Abstract
The cell entry of SARS-CoV-2 has emerged as an attractive drug development target. We previously reported that the entry of SARS-CoV-2 depends on the cell surface heparan sulfate proteoglycan (HSPG) and the cortex actin, which can be targeted by therapeutic agents identified by conventional drug repurposing screens. However, this drug identification strategy requires laborious library screening, which is time-consuming, and often only a limited number of compounds can be screened. As an alternative approach, we developed and trained a graph convolutional network (GCN)-based classification model using information extracted from experimentally identified HSPG and actin inhibitors. This method allowed us to virtually screen 170,000 compounds, resulting in ∼2000 potential hits. A hit confirmation assay with the uptake of a fluorescently labeled HSPG cargo further shortlisted 256 active compounds. Among them, 16 compounds had modest to strong inhibitory activities against the entry of SARS-CoV-2 pseudotyped particles into Vero E6 cells. These results establish a GCN-based virtual screen workflow for rapid identification of new small molecule inhibitors against validated drug targets.
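The screening funnel described in this abstract can be summarized as stage-to-stage pass rates. The sketch below uses only the counts quoted above and treats the approximate figure of ∼2000 as exactly 2000, which is an assumption; the stage labels and the helper name are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract above; "~2000" is treated as 2000 (assumption).
stages = [
    ("virtually screened", 170_000),
    ("predicted hits (approx.)", 2_000),
    ("confirmed in HSPG cargo uptake assay", 256),
    ("inhibit pseudotyped particle entry", 16),
]


def pass_rates(funnel):
    """Return (stage label, percent of the previous stage retained) pairs."""
    return [
        (later[0], 100.0 * later[1] / earlier[1])
        for earlier, later in zip(funnel, funnel[1:])
    ]


rates = pass_rates(stages)
for label, percent in rates:
    print(f"{label}: {percent:.2f} % of previous stage")

# Fraction of the full library that ends up in the final active set.
overall_percent = 100.0 * stages[-1][1] / stages[0][1]
print(f"overall: {overall_percent:.4f} % of the screened library")
```

Read this way, roughly 1.2% of the library passes the virtual screen, 12.8% of those predicted hits are confirmed in the uptake assay, and 6.25% of the confirmed compounds inhibit pseudotyped particle entry.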
Affiliation(s)
- Peng Gao
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Miao Xu
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Qi Zhang
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
  - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Bethesda, Maryland 20892, United States
- Catherine Z Chen
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Hui Guo
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Yihong Ye
  - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Bethesda, Maryland 20892, United States
- Wei Zheng
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Min Shen
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
6
Accurate predictions of drugs aqueous solubility via deep learning tools. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2021.131562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]