1
He Z, Yang W, Yang F, Zhang J, Ma L. Innovative medicinal chemistry strategies for enhancing drug solubility. Eur J Med Chem 2024; 279:116842. [PMID: 39260319 DOI: 10.1016/j.ejmech.2024.116842] [Received: 06/24/2024] [Revised: 08/25/2024] [Accepted: 08/26/2024] [Indexed: 09/13/2024]
Abstract
Drug candidates with poor solubility have been recognized as the cause of many drug development failures, owing to the fact that low solubility is unfavorable for physicochemical, pharmacokinetic (PK) and pharmacodynamic (PD) properties. Given the imperative role of solubility during drug development, we herein summarize various strategies for solubility optimizations from a medicinal chemistry perspective, including introduction of polar group, salt formation, structural simplification, disruption of molecular planarity and symmetry, optimizations on the solvent exposed region as well as prodrug design. In addition, methods for solubility assessment and prediction are reviewed. Besides, we have deeply discussed the strategies for solubility improvement. This paper is expected to be beneficial for the development of drug-like molecules with good solubility.
Affiliation(s)
- Zhangxu He
- Pharmacy College, Henan University of Chinese Medicine, 450046, Zhengzhou, China
- Weiguang Yang
- Children's Hospital Affiliated of Zhengzhou University, Henan Children's Hospital, Zhengzhou Children's Hospital, Henan, Zhengzhou, 450000, China
- Feifei Yang
- Pharmacy College, Henan University of Chinese Medicine, 450046, Zhengzhou, China
- Jingyu Zhang
- Pharmacy College, Henan University of Chinese Medicine, 450046, Zhengzhou, China.
- Liying Ma
- State Key Laboratory of Esophageal Cancer Prevention and Treatment, Key Laboratory of Advanced Drug Preparation Technologies, Ministry of Education, School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, 450001, China; China Meheco Topfond Pharmaceutical Co., Zhumadian, 463000, China.
2
Li C, Luo Y, Xie Y, Zhang Z, Liu Y, Zou L, Xiao F. Structural and functional prediction, evaluation, and validation in the post-sequencing era. Comput Struct Biotechnol J 2024; 23:446-451. [PMID: 38223342 PMCID: PMC10787220 DOI: 10.1016/j.csbj.2023.12.031] [Received: 10/20/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/16/2024] [Open Access]
Abstract
The surge of genome sequencing data has underlined substantial genetic variants of uncertain significance (VUS). The decryption of VUS discovered by sequencing poses a major challenge in the post-sequencing era. Although experimental assays have progressed in classifying VUS, only a tiny fraction of the human genes have been explored experimentally. Thus, it is urgently needed to generate state-of-the-art functional predictors of VUS in silico. Artificial intelligence (AI) is an invaluable tool to assist in the identification of VUS with high efficiency and accuracy. An increasing number of studies indicate that AI has brought an exciting acceleration in the interpretation of VUS, and our group has already used AI to develop protein structure-based prediction models. In this review, we provide an overview of the previous research on AI-based prediction of missense variants, and elucidate the challenges and opportunities for protein structure-based variant prediction in the post-sequencing era.
Affiliation(s)
- Chang Li
- Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Yixuan Luo
- Beijing Normal University, Beijing, China
- Yibo Xie
- Information Center, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Zaifeng Zhang
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Ye Liu
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Lihui Zou
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Fei Xiao
- Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Beijing Normal University, Beijing, China
3
Ahmad W, Chong KT, Tayara H. GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction. J Chem Inf Model 2024; 64:7833-7843. [PMID: 39387596 DOI: 10.1021/acs.jcim.4c00792] [Indexed: 10/15/2024]
Abstract
Aqueous solubility is a critical physicochemical property of drug discovery. Solubility is a key issue in pharmaceutical development because it can limit a drug's absorption capacity. Accurate solubility prediction is crucial for pharmacological, environmental, and drug development studies. This research introduces a novel method for solubility prediction by combining gated graph neural networks (GGNNs) and graph attention neural networks (GATs) with Smiles2Seq encoding. Our methodology involves converting chemical compounds into graph structures with nodes representing atoms and edges indicating chemical bonds. These graphs are then processed by using a specialized graph neural network (GNN) architecture. Incorporating attention mechanisms into GNN allows for capturing subtle structural dependencies, fostering improved solubility predictions. Furthermore, we utilized the Smiles2Seq encoding technique to bridge the semantic gap between molecular structures and their textual representations. Smiles2Seq seamlessly converts chemical notations into numeric sequences, facilitating the efficient transfer of information into our model. We demonstrate the efficacy of our approach through comprehensive experiments on benchmark solubility data sets, showcasing superior predictive performance compared to traditional methods. Our model outperforms existing solubility prediction models and provides interpretable insights into the molecular features driving solubility behavior. This research signifies an important advancement in solubility prediction, offering potent tools for drug discovery, formulation development, and environmental assessments. The fusion of GGNN and Smiles2Seq encoding establishes a robust framework for accurately forecasting solubility across various chemical compounds, fostering innovation in various domains reliant on solubility data.
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea
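The abstract of Ahmad et al. describes converting SMILES strings into numeric sequences before they are passed to the model. The sketch below illustrates that general idea only; it is not the authors' Smiles2Seq implementation. The regular expression, the `<pad>`/`<unk>` tokens, and the function names `tokenize`, `build_vocab`, and `encode` are assumptions introduced for illustration.

```python
import re

# Assumed token pattern: bracket atoms, the two-letter halogens Br and Cl,
# two-digit ring closures (%NN), then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Assign an integer id to every token seen in the training strings."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in smiles_list:
        for token in tokenize(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and right-pad with <pad>."""
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokenize(smiles)]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "CCl"])
    print(encode("CCO", vocab, 5))
    print(encode("CCl", vocab, 5))
```

Treating `Cl` and bracket atoms as single tokens keeps each atom as one symbol, so a character-level split does not read chlorine as carbon followed by an unknown letter.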
4
Liu H, Chen P, Zhang C, Huang X. Interpretable and Physicochemical-Intuitive Deep Learning Approach for the Design of Thermal Resistance of Energetic Compounds. J Phys Chem A 2024; 128:9045-9054. [PMID: 39380131 DOI: 10.1021/acs.jpca.4c04849] [Indexed: 10/10/2024]
Abstract
Thermal resistance of energetic materials is critical due to its impact on safety and sustainability. However, developing predictive models remains challenging because of data scarcity and limited insights into quantitative structure-property relationships. In this work, a deep learning framework, named EM-thermo, was proposed to address these challenges. A data set comprising 5029 CHNO compounds, including 976 energetic compounds, was constructed to facilitate this study. EM-thermo employs molecular graphs and direct message-passing neural networks to capture structural features and predict thermal resistance. Using transfer learning, the model achieves an accuracy of approximately 97% for predicting the thermal-resistance property (decomposition temperatures above 573.15 K) in energetic compounds. The involvement of molecular descriptors improved model prediction. These findings suggest that EM-thermo is effective for correlating thermal resistance from the atom and covalent bond level, offering a promising tool for advancing molecular design and discovery in the field of energetic compounds.
Affiliation(s)
- Haitao Liu
- Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), Mianyang 621900, PR China
- School of National Defense & Nuclear Science and Technology, Southwest University of Science and Technology, Mianyang 621010, PR China
- Peng Chen
- Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), Mianyang 621900, PR China
- School of National Defense & Nuclear Science and Technology, Southwest University of Science and Technology, Mianyang 621010, PR China
- Chaoyang Zhang
- Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), Mianyang 621900, PR China
- Beijing Computational Science Research Center, Beijing 100193, PR China
- Xin Huang
- Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), Mianyang 621900, PR China
5
Rittig JG, Mitsos A. Thermodynamics-consistent graph neural networks. Chem Sci 2024:d4sc04554h. [PMID: 39430937 PMCID: PMC11485056 DOI: 10.1039/d4sc04554h] [Received: 07/09/2024] [Accepted: 10/07/2024] [Indexed: 10/22/2024] [Open Access]
Abstract
We propose excess Gibbs free energy graph neural networks (GE-GNNs) for predicting composition-dependent activity coefficients of binary mixtures. The GE-GNN architecture ensures thermodynamic consistency by predicting the molar excess Gibbs free energy and using thermodynamic relations to obtain activity coefficients. As these are differential, automatic differentiation is applied to learn the activity coefficients in an end-to-end manner. Since the architecture is based on fundamental thermodynamics, we do not require additional loss terms to learn thermodynamic consistency. As the output is a fundamental property, we neither impose thermodynamic modeling limitations and assumptions. We demonstrate high accuracy and thermodynamic consistency of the activity coefficient predictions.
Affiliation(s)
- Jan G Rittig
- Process Systems Engineering (AVT.SVT), RWTH Aachen University Forckenbeckstraße 51 52074 Aachen Germany
- Alexander Mitsos
- Process Systems Engineering (AVT.SVT), RWTH Aachen University Forckenbeckstraße 51 52074 Aachen Germany
- JARA-ENERGY Templergraben 55 52056 Aachen Germany
- Institute of Climate and Energy Systems ICE-1: Energy Systems Engineering, Forschungszentrum Jülich GmbH Wilhelm-Johnen-Straße 52425 Jülich Germany
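The abstract of Rittig and Mitsos states that activity coefficients are obtained from the molar excess Gibbs free energy through thermodynamic relations that involve derivatives. For a binary mixture these are the standard relations ln γ1 = g + (1 − x1)·dg/dx1 and ln γ2 = g − x1·dg/dx1, where g = g^E/(RT). The sketch below applies them to a two-parameter Margules expression, which stands in for the network output; the Margules form, the parameter values, and the central finite difference (used here in place of automatic differentiation) are all assumptions for illustration, not the authors' model.

```python
def excess_gibbs(x1, a12=1.2, a21=0.8):
    """Dimensionless excess Gibbs energy g^E/(RT) of a binary mixture.

    Two-parameter Margules form, a stand-in for a learned model.
    """
    x2 = 1.0 - x1
    return x1 * x2 * (a21 * x1 + a12 * x2)


def ln_activity_coefficients(g, x1, h=1e-6):
    """Return (ln gamma_1, ln gamma_2) derived from g = g^E/(RT).

    The derivative dg/dx1 is taken by central finite difference here;
    an autodiff framework would supply it exactly.
    """
    dg_dx1 = (g(x1 + h) - g(x1 - h)) / (2.0 * h)
    g0 = g(x1)
    ln_gamma1 = g0 + (1.0 - x1) * dg_dx1
    ln_gamma2 = g0 - x1 * dg_dx1
    return ln_gamma1, ln_gamma2


if __name__ == "__main__":
    x1 = 0.3
    ln_g1, ln_g2 = ln_activity_coefficients(excess_gibbs, x1)
    # Consistency check: the mole-fraction-weighted sum recovers g^E/(RT).
    print(ln_g1, ln_g2, x1 * ln_g1 + (1.0 - x1) * ln_g2, excess_gibbs(x1))
```

Because both coefficients derive from one scalar function, the identity x1·ln γ1 + x2·ln γ2 = g^E/(RT) holds by construction, which is the sense in which no extra consistency loss term is needed.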
6
Spiekermann KA, Dong X, Menon A, Green WH, Pfeifle M, Sandfort F, Welz O, Bergeler M. Accurately Predicting Barrier Heights for Radical Reactions in Solution Using Deep Graph Networks. J Phys Chem A 2024; 128:8384-8403. [PMID: 39298746 DOI: 10.1021/acs.jpca.4c04121] [Indexed: 09/22/2024]
Abstract
Quantitative estimates of reaction barriers and solvent effects are essential for developing kinetic mechanisms and predicting reaction outcomes. Here, we create a new data set of 5,600 unique elementary radical reactions calculated using the M06-2X/def2-QZVP//B3LYP-D3(BJ)/def2-TZVP level of theory. A conformer search is done for each species using TPSS/def2-TZVP. Gibbs free energies of activation and of reaction for these radical reactions in 40 common solvents are obtained using COSMO-RS for solvation effects. These balanced reactions involve the elements H, C, N, O, and S, contain up to 19 heavy atoms, and have atom-mapped SMILES. All transition states are verified by an intrinsic reaction coordinate calculation. We next train a deep graph network to directly estimate the Gibbs free energy of activation and of reaction in both gas and solution phases using only the atom-mapped SMILES of the reactant and product and the SMILES of the solvent. This simple input representation avoids computationally expensive optimizations for the reactant, transition state, and product structures during inference, making our model well-suited for high-throughput predictive chemistry and quickly providing information for (retro-)synthesis planning tools. To properly measure model performance, we report results on both interpolative and extrapolative data splits and also compare to several baseline models. During training and testing, the data set is augmented by including the reverse direction of each reaction and variants with different resonance structures. After data augmentation, we have around 2 million entries to train the model, which achieves a testing set mean absolute error of 1.16 kcal mol⁻¹ for the Gibbs free energy of activation in solution. We anticipate this model will accelerate predictions for high-throughput screening to quickly identify relevant reactions in solution, and our data set will serve as a benchmark for future studies.
Affiliation(s)
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xiaorui Dong
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Mark Pfeifle
- BASF Digital Solutions GmbH, Ludwigshafen am Rhein 67061, Germany
- Frederik Sandfort
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Oliver Welz
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Maike Bergeler
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
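The abstract of Spiekermann et al. mentions augmenting the data set with the reverse direction of each reaction. The labels of a reversed entry follow from the forward ones: the reverse reaction free energy is the negative of the forward value, and the reverse activation free energy equals the forward activation free energy minus the forward reaction free energy. The sketch below shows that bookkeeping only; the `ReactionEntry` class, its field names, and the example SMILES and energies are hypothetical and are not taken from the authors' data set or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionEntry:
    rxn_smiles: str  # "reactants>>products"
    dg_act: float    # Gibbs free energy of activation, kcal/mol
    dg_rxn: float    # Gibbs free energy of reaction, kcal/mol


def reverse_entry(entry):
    """Build the reverse-direction entry from a forward entry."""
    reactants, products = entry.rxn_smiles.split(">>")
    return ReactionEntry(
        rxn_smiles=f"{products}>>{reactants}",
        dg_act=entry.dg_act - entry.dg_rxn,  # same transition state, new reference
        dg_rxn=-entry.dg_rxn,
    )


def augment(entries):
    """Return each forward entry followed by its reverse."""
    augmented = []
    for entry in entries:
        augmented.append(entry)
        augmented.append(reverse_entry(entry))
    return augmented


if __name__ == "__main__":
    # Illustrative hydrogen abstraction with made-up energies.
    forward = ReactionEntry("[CH3].CO>>C.[CH2]O", dg_act=20.0, dg_rxn=-5.0)
    for entry in augment([forward]):
        print(entry)
```

Reversing an entry twice returns the original, which gives a cheap sanity check on the augmented table.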
7
Bharadwaj S, Deepika K, Kumar A, Jaiswal S, Miglani S, Singh D, Fartyal P, Kumar R, Singh S, Singh MP, Gaidhane AM, Kumar B, Jha V. Exploring the Artificial Intelligence and Its Impact in Pharmaceutical Sciences: Insights Toward the Horizons Where Technology Meets Tradition. Chem Biol Drug Des 2024; 104:e14639. [PMID: 39396920 DOI: 10.1111/cbdd.14639] [Received: 07/27/2024] [Revised: 09/03/2024] [Accepted: 09/24/2024] [Indexed: 10/15/2024]
Abstract
The technological revolutions in computers and the advancement of high-throughput screening technologies have driven the application of artificial intelligence (AI) for faster discovery of drug molecules with more efficiency, and cost-friendly finding of hit or lead molecules. The ability of software and network frameworks to interpret molecular structures' representations and establish relationships/correlations has enabled various research teams to develop numerous AI platforms for identifying new lead molecules or discovering new targets for already established drug molecules. The prediction of biological activity, ADME properties, and toxicity parameters in early stages have reduced the chances of failure and associated costs in later clinical stages, which was observed at a high rate in the tedious, expensive, and laborious drug discovery process. This review focuses on the different AI and machine learning (ML) techniques with their applications mainly focused on the pharmaceutical industry. The applications of AI frameworks in the identification of molecular target, hit identification/hit-to-lead optimization, analyzing drug-receptor interactions, drug repurposing, polypharmacology, synthetic accessibility, clinical trial design, and pharmaceutical developments are discussed in detail. We have also compiled the details of various startups in AI in this field. This review will provide a comprehensive analysis and outline various state-of-the-art AI/ML techniques to the readers with their framework applications. This review also highlights the challenges in this field, which need to be addressed for further success in pharmaceutical applications.
Affiliation(s)
- Shruti Bharadwaj
- Center for SeNSE, Indian Institute of Technology Delhi (IIT), New Delhi, India
- Kumari Deepika
- Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India
- Asim Kumar
- Amity Institute of Pharmacy (AIP), Amity University Haryana, Manesar, India
- Shivani Jaiswal
- Institute of Pharmaceutical Research, GLA University, Mathura, India
- Shaweta Miglani
- Department of Education, Central University of Punjab, Bathinda, India
- Damini Singh
- IES Institute of Pharmacy, IES University, Bhopal, Madhya Pradesh, India
- Prachi Fartyal
- Department of Mathematics, Govt PG College Bajpur (US Nagar), Bazpur, Uttarakhand, India
- Roshan Kumar
- Department of Microbiology, Graphic Era (Deemed to be University), Dehradun, India
- Department of Microbiology, Central University of Punjab, VPO-Ghudda, Punjab, India
- Shareen Singh
- Centre for Research Impact & Outcome, Chitkara College of Pharmacy, Chitkara University, Rajpura, Punjab, India
- Mahendra Pratap Singh
- Center for Global Health Research, Saveetha Medical College and Hospital, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, India
- Abhay M Gaidhane
- Jawaharlal Nehru Medical College, and Global Health Academy, School of Epidemiology and Public Health, Datta Meghe Institute of Higher Education, Wardha, India
- Bhupinder Kumar
- Department of Pharmaceutical Science, Hemvati Nandan Bahuguna Garhwal (A Central) University, Srinagar, Uttarakhand, India
- Vibhu Jha
- Institute of Cancer Therapeutics, School of Pharmacy and Medical Sciences, Faculty of Life Sciences, University of Bradford, Bradford, UK
8
Feldmann CW, Sieg J, Mathea M. Analysis of uncertainty of neural fingerprint-based models. Faraday Discuss 2024. [PMID: 39320108 DOI: 10.1039/d4fd00095a] [Indexed: 09/26/2024]
Abstract
Machine learning has gained popularity for predicting molecular properties based on molecular structure. This study explores the uncertainty estimates of neural fingerprint-based models by comparing pure graph neural networks (GNN) to classical machine learning algorithms combined with neural fingerprints. We investigate the advantage of extracting the neural fingerprint from the GNN and integrating it into a method known for producing better-calibrated probability estimates. Comparisons are made using three classical machine learning methods and the Chemprop model, considering different molecular representations and calibration techniques. We utilize 19 datasets from Toxcast, reflecting real-world scenarios with balanced accuracies ranging from 0.6 to 0.8. Results demonstrate that neural fingerprints combined with classical machine learning methods exhibit a slight decrease in prediction performance compared to the native Chemprop model. However, these models provide significantly improved uncertainty estimates. Notably, uncertainty estimates of neural fingerprint-based methods remain relatively robust for molecules dissimilar to the training set. This suggests that methods like random forest with neural fingerprints can deliver strong prediction performance and reliable uncertainty estimates. When considering both performance and uncertainty, the calibrated Chemprop model and the combination of neural fingerprints with random forest or support vector classifier (SVC) yield comparable results. Surprisingly, the SVC method shows promising performance when combined with neural or count fingerprints. These findings are particularly relevant in real-world industrial projects where accurate predictions and reliable uncertainty estimates are crucial.
9
Beccaria R, Lazzeri A, Tiana G. Predicting the Binding of Small Molecules to Proteins through Invariant Representation of the Molecular Structure. J Chem Inf Model 2024; 64:6758-6767. [PMID: 39197011 DOI: 10.1021/acs.jcim.4c00752] [Indexed: 08/30/2024]
Abstract
We present a computational scheme for predicting the ligands that bind to a pocket of a known structure. It is based on the generation of a general abstract representation of the molecules, which is invariant to rotations, translations, and permutations of atoms, and has some degree of isometry with the space of conformations. We use these representations to train a nondeep machine learning algorithm to classify the binding between pockets and molecule pairs and show that this approach has a better generalization capability than existing methods.
Affiliation(s)
- R Beccaria
- Department of Physics, University of Milano, via Celoria 16, 20133 Milano, Italy
- Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Faculty of Physics, Heidelberg University, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany
- A Lazzeri
- Department of Physics, University of Milano, via Celoria 16, 20133 Milano, Italy
- G Tiana
- Department of Physics, University of Milano, via Celoria 16, 20133 Milano, Italy
- INFN, via Celoria 16, 20133 Milano, Italy
10
Risheh A, Rebel A, Nerenberg PS, Forouzesh N. Calculation of protein-ligand binding entropies using a rule-based molecular fingerprint. Biophys J 2024; 123:2839-2848. [PMID: 38481102 PMCID: PMC11393669 DOI: 10.1016/j.bpj.2024.03.017] [Received: 09/12/2023] [Revised: 12/21/2023] [Accepted: 03/08/2024] [Indexed: 03/28/2024] [Open Access]
Abstract
The use of fast in silico prediction methods for protein-ligand binding free energies holds significant promise for the initial phases of drug development. Numerous traditional physics-based models (e.g., implicit solvent models), however, tend to either neglect or heavily approximate entropic contributions to binding due to their computational complexity. Consequently, such methods often yield imprecise assessments of binding strength. Machine learning models provide accurate predictions and can often outperform physics-based models. They, however, are often prone to overfitting, and the interpretation of their results can be difficult. Physics-guided machine learning models combine the consistency of physics-based models with the accuracy of modern data-driven algorithms. This work integrates physics-based model conformational entropies into a graph convolutional network. We introduce a new neural network architecture (a rule-based graph convolutional network) that generates molecular fingerprints according to predefined rules specifically optimized for binding free energy calculations. Our results on 100 small host-guest systems demonstrate significant improvements in convergence and preventing overfitting. We additionally demonstrate the transferability of our proposed hybrid model by training it on the aforementioned host-guest systems and then testing it on six unrelated protein-ligand systems. Our new model shows little difference in training set accuracy compared to a previous model but an order-of-magnitude improvement in test set accuracy. Finally, we show how the results of our hybrid model can be interpreted in a straightforward fashion.
Affiliation(s)
- Ali Risheh
- Department of Computer Science, California State University, Los Angeles, California
- Alles Rebel
- Department of Computer Science, California State University, Los Angeles, California
- Paul S Nerenberg
- Kravis Department of Integrated Sciences, Claremont McKenna College, Claremont, California
- Negin Forouzesh
- Department of Computer Science, California State University, Los Angeles, California.
11
Meza-González B, Ramírez-Palma DI, Carpio-Martínez P, Vázquez-Cuevas D, Martínez-Mayorga K, Cortés-Guzmán F. Quantum Topological Atomic Properties of 44K molecules. Sci Data 2024; 11:945. [PMID: 39209874 PMCID: PMC11362522 DOI: 10.1038/s41597-024-03723-0] [Received: 02/13/2024] [Accepted: 07/29/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
We present a data set of quantum topological properties of atoms of 44K randomly selected molecules from the GDB-9 data set. These atomic properties were obtained as defined by the quantum theory of atoms in molecules (QTAIM) within an atomic basin, a region of real space bounded by zero-flux surfaces in the electron density gradient vector field. The wave function files were generated through DFT static calculations (B3LYP/6-31G), and the atomic properties were calculated using QTAIM. The calculated atomic properties include the energy of the atomic basin, the electronic population, the magnitude of the total dipole moment, and the magnitude of the total quadrupole moment. The atomic properties allow one to understand the chemical structure, reactivity, and molecular recognition. They can be incorporated into force fields for molecular dynamics or for predicting reactive sites. We believe that this data set could facilitate new studies in chemical informatics, machine learning applied to chemistry, and computational molecular design.
Affiliation(s)
- Brandon Meza-González
- Facultad de Química, Universidad Nacional Autónoma de México, Ciudad de México, Mexico City, Mexico
- David I Ramírez-Palma
- Instituto de Química, Unidad Mérida, Universidad Nacional Autónoma de México, Mérida, Yucatán, Mexico
- Pablo Carpio-Martínez
- Centro Conjunto de Investigación en Química Sustentable UAEM-UNAM, Carretera Toluca-Atlacomulco, km. 14.5, Toluca, Estado de México, C.P. 50200, Mexico
- David Vázquez-Cuevas
- Instituto de Química, Unidad Mérida, Universidad Nacional Autónoma de México, Mérida, Yucatán, Mexico
- Karina Martínez-Mayorga
- Instituto de Química, Unidad Mérida, Universidad Nacional Autónoma de México, Mérida, Yucatán, Mexico
- Fernando Cortés-Guzmán
- Facultad de Química, Universidad Nacional Autónoma de México, Ciudad de México, Mexico City, Mexico.
12
Zhang WY, Zheng XL, Coghi PS, Chen JH, Dong BJ, Fan XX. Revolutionizing adjuvant development: harnessing AI for next-generation cancer vaccines. Front Immunol 2024; 15:1438030. [PMID: 39206192 PMCID: PMC11349682 DOI: 10.3389/fimmu.2024.1438030] [Received: 05/24/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
With the COVID-19 pandemic, the importance of vaccines has been widely recognized and has led to increased research and development efforts. Vaccines also play a crucial role in cancer treatment by activating the immune system to target and destroy cancer cells. However, enhancing the efficacy of cancer vaccines remains a challenge. Adjuvants, which enhance the immune response to antigens and improve vaccine effectiveness, have faced limitations in recent years, resulting in few novel adjuvants being identified. The advancement of artificial intelligence (AI) technology in drug development has provided a foundation for adjuvant screening and application, leading to a diversification of adjuvants. This article reviews the significant role of tumor vaccines in basic research and clinical treatment and explores the use of AI technology to screen novel adjuvants from databases. The findings of this review offer valuable insights for the development of new adjuvants for next-generation vaccines.
Affiliation(s)
- Wan-Ying Zhang
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, Macao SAR, China
- Xiao-Li Zheng
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, Macao SAR, China
- Paolo Saul Coghi
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, Macao SAR, China
- Jun-Hui Chen
- Intervention and Cell Therapy Center, Peking University Shenzhen Hospital, Shenzhen, China
- Bing-Jun Dong
- Gynecology Department, Zhuhai Hospital of Integrated Traditional Chinese and Western Medicine, Zhuhai, China
- Xing-Xing Fan
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, Macao SAR, China
13
Aksamit N, Tchagang A, Li Y, Ombuki-Berman B. Hybrid fragment-SMILES tokenization for ADMET prediction in drug discovery. BMC Bioinformatics 2024; 25:255. [PMID: 39090573 PMCID: PMC11295479 DOI: 10.1186/s12859-024-05861-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 07/10/2024] [Indexed: 08/04/2024] Open
Abstract
BACKGROUND Drug discovery and development is the extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches provide an opportunity to improve each step of the drug discovery and development process, and the first question they raise is how a molecule can be represented informatively enough for in silico methods to perform well. RESULTS This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, utilizing a Transformer-based model. We investigate the efficacy of hybrid tokenization in improving the performance of ADMET prediction tasks. Our approach leverages MTL-BERT, an encoder-only Transformer model that achieves state-of-the-art ADMET predictions, and contrasts the standard SMILES tokenization with our hybrid method across a spectrum of fragment library cutoffs. CONCLUSION The findings reveal that while an excess of fragments can impede performance, using hybrid tokenization with high-frequency fragments enhances results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features within the training of Transformer models for ADMET property prediction.
Affiliation(s)
- Nicholas Aksamit
- Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
- Alain Tchagang
- Digital Technologies Research Centre, National Research Council Canada, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
- Yifeng Li
- Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada.
- Department of Biological Sciences, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada.
- Beatrice Ombuki-Berman
- Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada.
14
Li X, Zhao X, Yu X, Zhao J, Fang X. Construction of a multi-tissue compound-target interaction network of Qingfei Paidu decoction in COVID-19 treatment based on deep learning and transcriptomic analysis. J Bioinform Comput Biol 2024; 22:2450016. [PMID: 39036847 DOI: 10.1142/s0219720024500161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
The Qingfei Paidu decoction (QFPDD) is a widely acclaimed therapeutic formula employed nationwide for the clinical management of coronavirus disease 2019 (COVID-19). QFPDD exerts a synergistic therapeutic effect, characterized by its multi-component, multi-target, and multi-pathway action. However, the intricate interactions among the ingredients and targets within QFPDD and their systematic effects in multiple tissues remain undetermined. To address this, we qualitatively characterized the chemical components of QFPDD. We integrated multi-tissue transcriptomic analysis with GraphDTA, a deep learning model, to screen for potential compound-target interactions of QFPDD in multiple tissues. We predicted 13 key active compounds, 127 potential targets and 27 pathways associated with QFPDD across six different tissues. Notably, oleanolic acid-AXL exhibited leading affinity in the heart, blood, and liver. Molecular docking and molecular dynamics simulation confirmed their strong binding affinity. The robust interaction between oleanolic acid and the AXL receptor suggests that AXL is a promising target for developing clinical intervention strategies. Through the construction of a multi-tissue compound-target interaction network, our study further elucidated the mechanisms through which QFPDD effectively combats COVID-19 in multiple tissues. Our work also establishes a framework for future investigations into the systemic effects of other Traditional Chinese Medicine (TCM) formulas in disease treatment.
Affiliation(s)
- Xia Li
- Third Clinical College, Shanxi Provincial Integrated TCM and WM Hospital, Shanxi University of Chinese Medicine, Jinzhong, Shanxi, P. R. China
- Xuetong Zhao
- National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, P. R. China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Xinjian Yu
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, TX 77030, USA
- Jianping Zhao
- Third Clinical College, Shanxi Provincial Integrated TCM and WM Hospital, Shanxi University of Chinese Medicine, Jinzhong, Shanxi, P. R. China
- Xiangdong Fang
- National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
15
Li J, Yanagisawa K, Akiyama Y. CycPeptMP: enhancing membrane permeability prediction of cyclic peptides with multi-level molecular features and data augmentation. Brief Bioinform 2024; 25:bbae417. [PMID: 39210505 PMCID: PMC11361855 DOI: 10.1093/bib/bbae417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 07/23/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024] Open
Abstract
Cyclic peptides are versatile therapeutic agents that boast high binding affinity, minimal toxicity, and the potential to engage challenging protein targets. However, the pharmaceutical utility of cyclic peptides is limited by their low membrane permeability, an essential indicator of oral bioavailability and intracellular targeting. Current machine learning-based models of cyclic peptide permeability show variable performance owing to the limitations of experimental data. Furthermore, these methods use whole-molecule features that have traditionally been used for small-molecule prediction and ignore the unique structural properties of cyclic peptides. This study presents CycPeptMP, an accurate and efficient method to predict cyclic peptide membrane permeability. We designed features for cyclic peptides at the atom, monomer, and peptide levels and integrated these into a fusion model using deep learning. Additionally, we applied various data augmentation techniques to enhance model training efficiency using the latest data. The fusion model exhibited excellent prediction performance for the logarithm of permeability, with a mean absolute error of 0.355 and a correlation coefficient of 0.883. Ablation studies demonstrated that all feature levels contributed to predicting membrane permeability and confirmed that augmentation improves prediction accuracy. A comparison with a molecular dynamics-based method showed that CycPeptMP accurately predicted peptide permeability, which is otherwise difficult to predict using simulations.
Affiliation(s)
- Jianan Li
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 1528550, Japan
- Keisuke Yanagisawa
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 1528550, Japan
- Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Tokyo 1528550, Japan
- Yutaka Akiyama
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 1528550, Japan
- Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Tokyo 1528550, Japan
16
Ramani V, Karmakar T. Graph Neural Networks for Predicting Solubility in Diverse Solvents Using MolMerger Incorporating Solute-Solvent Interactions. J Chem Theory Comput 2024. [PMID: 39041858 DOI: 10.1021/acs.jctc.4c00382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024]
Abstract
The prediction of solubility is a complex and challenging physicochemical problem that has tremendous implications for the chemical and pharmaceutical industry. Recent advancements in machine learning methods have provided great scope for reliably predicting the solubility of a large number of molecular systems. However, most of these methods rely on physical properties obtained from experiments and expensive quantum chemical calculations. Here, we developed a method that utilizes a graph representation of solute-solvent interactions using "MolMerger," which captures the strongest polar interactions between molecules using Gasteiger charges and creates a graph incorporating the true nature of the system. Using these graphs as input, a neural network learns the correlation between the structural properties of a molecule, in the form of node embeddings, and its physicochemical properties as the output. This approach has been used to predict the log solubility values of various organic molecules and pharmaceuticals in diverse sets of solvents.
Affiliation(s)
- Vansh Ramani
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
- Tarak Karmakar
- Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
17
Catacutan DB, Alexander J, Arnold A, Stokes JM. Machine learning in preclinical drug discovery. Nat Chem Biol 2024:10.1038/s41589-024-01679-1. [PMID: 39030362 DOI: 10.1038/s41589-024-01679-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 06/13/2024] [Indexed: 07/21/2024]
Abstract
Drug-discovery and drug-development endeavors are laborious, costly and time consuming. These programs can take upward of 12 years and cost US $2.5 billion, with a failure rate of more than 90%. Machine learning (ML) presents an opportunity to improve the drug-discovery process. Indeed, with the growing abundance of public and private large-scale biological and chemical datasets, ML techniques are becoming well positioned as useful tools that can augment the traditional drug-development process. In this Perspective, we discuss the integration of algorithmic methods throughout the preclinical phases of drug discovery. Specifically, we highlight an array of ML-based efforts, across diverse disease areas, to accelerate initial hit discovery, mechanism-of-action (MOA) elucidation and chemical property optimization. With advances in the application of ML across diverse therapeutic areas, we posit that fully ML-integrated drug-discovery pipelines will define the future of drug-development programs.
Affiliation(s)
- Denise B Catacutan
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jeremie Alexander
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Autumn Arnold
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M Stokes
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada.
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada.
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada.
18
Liu Y(L), Moretti R, Wang Y, Dong H, Yan B, Bodenheimer B, Derr T, Meiler J. Advancements in Ligand-Based Virtual Screening through the Synergistic Integration of Graph Neural Networks and Expert-Crafted Descriptors. bioRxiv 2024:2023.04.17.537185. [PMID: 37131837 PMCID: PMC10153143 DOI: 10.1101/2023.04.17.537185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The fusion of traditional chemical descriptors with Graph Neural Networks (GNNs) offers a compelling strategy for enhancing ligand-based virtual screening methodologies. A comprehensive evaluation revealed that the benefits derived from this integrative strategy vary significantly among different GNNs. Specifically, while GCN and SchNet demonstrate pronounced improvements by incorporating descriptors, SphereNet exhibits only marginal enhancement. Intriguingly, despite SphereNet's modest gain, all three models (GCN, SchNet, and SphereNet) achieve comparable performance levels when leveraging this combination strategy. This observation underscores a pivotal insight: sophisticated GNN architectures may be substituted with simpler counterparts without sacrificing efficacy, provided that they are augmented with descriptors. Furthermore, our analysis reveals the robustness of a set of expert-crafted descriptors in scaffold-split scenarios, where they frequently outperform the combined GNN-descriptor models. Given the critical importance of scaffold splitting in accurately mimicking real-world drug discovery contexts, this finding accentuates an imperative for GNN researchers to innovate models that can adeptly navigate and predict within such frameworks. Our work not only validates the potential of integrating descriptors with GNNs in advancing ligand-based virtual screening but also illuminates pathways for future enhancements in model development and application. Our implementation can be found at https://github.com/meilerlab/gnn-descriptor.
Affiliation(s)
- Yunchao (Lance) Liu
- Department of Computer Science, Vanderbilt University, 2201 West End Ave Nashville, Tennessee 37235, USA
- Rocco Moretti
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, 2201 West End Ave Nashville, Tennessee 37235, USA
- Yu Wang
- Department of Computer Science, Vanderbilt University, 2201 West End Ave Nashville, Tennessee 37235, USA
- Ha Dong
- Department of Neural Science, Amherst College, 220 South Pleasant Street Amherst, Massachusetts 01002, USA
- Bailu Yan
- Department of Biostatistics, Vanderbilt University, 2201 West End Ave Nashville, Tennessee 37235, USA
- Bobby Bodenheimer
- Department of Computer Science, Electrical Engineering and Computer Engineering, Vanderbilt University, 2201 West End Ave Nashville, Tennessee 37235, USA
- Tyler Derr
- Department of Computer Science, Data Science Institute, Vanderbilt University, 2201 West End Ave Nashville, Tennessee 37235, USA
- Jens Meiler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, 2201 West End Ave Nashville, Tennessee 37235, USA, Institute of Drug Discovery, Leipzig University Medical School, Härtelstraße 16-18, Leipzig, 04103, Germany, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Humboldtstraße 25, Leipzig, 04105, Germany
19
Carnino JM, Pellegrini WR, Willis M, Cohen MB, Paz-Lansberg M, Davis EM, Grillone GA, Levi JR. Assessing ChatGPT's Responses to Otolaryngology Patient Questions. Ann Otol Rhinol Laryngol 2024; 133:658-664. [PMID: 38676440 DOI: 10.1177/00034894241249621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
OBJECTIVE This study aims to evaluate ChatGPT's performance in addressing real-world otolaryngology patient questions, focusing on accuracy, comprehensiveness, and patient safety, to assess its suitability for integration into healthcare. METHODS A cross-sectional study was conducted using patient questions from the public online forum Reddit's r/AskDocs, where medical advice is sought from healthcare professionals. Patient questions were input into ChatGPT (GPT-3.5), and responses were reviewed by 5 board-certified otolaryngologists. The evaluation criteria included difficulty, accuracy, comprehensiveness, and bedside manner/empathy. Statistical analysis explored the relationship between patient question characteristics and ChatGPT response scores. Potentially dangerous responses were also identified. RESULTS Patient questions averaged 224.93 words, while ChatGPT responses were longer at 414.93 words. The accuracy scores for ChatGPT responses were 3.76/5, comprehensiveness scores were 3.59/5, and bedside manner/empathy scores were 4.28/5. Longer patient questions did not correlate with higher response ratings. However, longer ChatGPT responses scored higher in bedside manner/empathy. Higher question difficulty correlated with lower comprehensiveness. Five responses were flagged as potentially dangerous. CONCLUSION While ChatGPT exhibits promise in addressing otolaryngology patient questions, this study demonstrates its limitations, particularly in accuracy and comprehensiveness. The identification of potentially dangerous responses underscores the need for a cautious approach to AI in medical advice. Responsible integration of AI into healthcare necessitates thorough assessments of model performance and ethical considerations for patient safety.
Affiliation(s)
- Jonathan M Carnino
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- William R Pellegrini
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Megan Willis
- Department of Biostatistics, Boston University, Boston, MA, USA
- Michael B Cohen
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Marianella Paz-Lansberg
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Elizabeth M Davis
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Gregory A Grillone
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Jessica R Levi
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
20
Tang X, Tran A, Tan J, Gerstein MB. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics 2024; 40:i357-i368. [PMID: 38940177 PMCID: PMC11256921 DOI: 10.1093/bioinformatics/btae260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION The current paradigm of deep learning models for the joint representation of molecules and text primarily relies on 1D or 2D molecular formats, neglecting significant 3D structural information that offers valuable physical insight. This narrow focus inhibits the models' versatility and adaptability across a wide range of modalities. Conversely, the limited research focusing on explicit 3D representation tends to overlook textual data within the biomedical domain. RESULTS We present a unified pre-trained language model, MolLM, that concurrently captures 2D and 3D molecular information alongside biomedical text. MolLM consists of a text Transformer encoder and a molecular Transformer encoder, designed to encode both 2D and 3D molecular structures. To support MolLM's self-supervised pre-training, we constructed 160K molecule-text pairings. Employing contrastive learning as a supervisory signal for learning, MolLM demonstrates robust molecular representation capabilities across four downstream tasks, including cross-modal molecule and text matching, property prediction, captioning, and text-prompted molecular editing. Through ablation, we demonstrate that the inclusion of explicit 3D representations improves performance in these downstream tasks. AVAILABILITY AND IMPLEMENTATION Our code, data, pre-trained model weights, and examples of using our model are all available at https://github.com/gersteinlab/MolLM. In particular, we provide Jupyter Notebooks offering step-by-step guidance on how to use MolLM to extract embeddings for both molecules and text.
Affiliation(s)
- Xiangru Tang
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Andrew Tran
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Jeffrey Tan
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Mark B Gerstein
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
- Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
21
Xiang W, Zhong F, Ni L, Zheng M, Li X, Shi Q, Wang D. Gram matrix: an efficient representation of molecular conformation and learning objective for molecular pretraining. Brief Bioinform 2024; 25:bbae340. [PMID: 38990515 PMCID: PMC11238115 DOI: 10.1093/bib/bbae340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 06/05/2024] [Accepted: 06/28/2024] [Indexed: 07/12/2024] Open
Abstract
Accurate prediction of molecular properties is fundamental in drug discovery and development, providing crucial guidance for effective drug design. A critical factor in achieving accurate molecular property prediction lies in the appropriate representation of molecular structures. Presently, prevalent deep learning-based molecular representations rely on 2D structure information as the primary molecular representation, often overlooking essential three-dimensional (3D) conformational information due to the inherent limitations of 2D structures in conveying atomic spatial relationships. In this study, we propose employing the Gram matrix as a condensed representation of 3D molecular structures and as an efficient pretraining objective. Subsequently, we leverage this matrix to construct a novel molecular representation model, Pre-GTM, which inherently encapsulates 3D information. The model accurately predicts the 3D structure of a molecule by estimating the Gram matrix. Our findings demonstrate that the Pre-GTM model outperforms the baseline Graphormer model and other pretrained models in the QM9 and MoleculeNet quantitative property prediction tasks. The integration of the Gram matrix as a condensed representation of 3D molecular structure, incorporated into the Pre-GTM model, opens up promising avenues for its potential application across various domains of molecular research, including drug design, materials science, and chemical engineering.
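To make the representation concrete, the following is a minimal sketch of a Gram matrix built from 3D coordinates. It assumes the common definition G[i][j] = x_i · x_j over centroid-centered atomic positions; the abstract does not spell out the exact construction used in Pre-GTM, so the function names (`center`, `gram_matrix`, `distances_from_gram`, `move`) and the water-like geometry are purely illustrative and are not the authors' implementation.

```python
from math import cos, sin, sqrt

def center(coords):
    """Translate coordinates so their centroid sits at the origin."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - centroid[k] for k in range(3)] for p in coords]

def gram_matrix(coords):
    """Return G with G[i][j] = x_i . x_j for centered coordinates x (assumed definition)."""
    x = center(coords)
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in x] for xi in x]

def distances_from_gram(g):
    """Recover pairwise distances via d_ij^2 = G_ii + G_jj - 2 G_ij."""
    n = len(g)
    return [[sqrt(max(g[i][i] + g[j][j] - 2.0 * g[i][j], 0.0)) for j in range(n)]
            for i in range(n)]

def move(coords, angle, shift):
    """Rotate about the z axis by `angle` (radians), then translate by `shift`."""
    c, s = cos(angle), sin(angle)
    return [[c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]]
            for x, y, z in coords]

# Illustrative water-like geometry in angstroms (O, H, H); not taken from the paper.
water = [[0.000, 0.000, 0.117], [0.000, 0.757, -0.467], [0.000, -0.757, -0.467]]

gram = gram_matrix(water)
gram_moved = gram_matrix(move(water, 0.7, (1.5, -2.0, 0.3)))
dist = distances_from_gram(gram)
```

Under these assumptions, `gram` and `gram_moved` agree up to floating-point rounding, because centering removes the translation and dot products are unchanged by rotation. The interatomic distances in `dist` are recovered from the matrix alone, which is one way to see why such a matrix can act as a compact stand-in for a conformation.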
Affiliation(s)
- Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
- Lin Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Qian Shi
- Lingang Laboratory, Shanghai 200031, China
22
Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw 2024; 173:106207. [PMID: 38442651 DOI: 10.1016/j.neunet.2024.106207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 01/23/2024] [Accepted: 02/21/2024] [Indexed: 03/07/2024]
Abstract
Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes in the graph should remain relatively close, thereby preserving the structural information between the nodes in the graph. However, this is sub-optimal because: (i) traditional methods have limited model capacity, which limits learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; (iii) representation learning and downstream tasks depend on each other and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we comprehensively review current deep graph representation learning algorithms by proposing a new taxonomy of the existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey also presents practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions that deserve further investigation.
Affiliation(s)
- Wei Ju
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zheng Fang
- School of Intelligence Science and Technology, Peking University, Beijing, 100871, China
- Yiyang Gu
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zequn Liu
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Qingqing Long
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100086, China
- Ziyue Qiao
- Artificial Intelligence Thrust, The Hong Kong University of Science and Technology, Guangzhou, 511453, China
- Yifang Qin
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jianhao Shen
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Fang Sun
- Department of Computer Science, University of California, Los Angeles, 90095, USA
- Zhiping Xiao
- Department of Computer Science, University of California, Los Angeles, 90095, USA
- Junwei Yang
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jingyang Yuan
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yusheng Zhao
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yifan Wang
- School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China
- Xiao Luo
- Department of Computer Science, University of California, Los Angeles, 90095, USA.
- Ming Zhang
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China.
23
Zheng EJ, Valeri JA, Andrews IW, Krishnan A, Bandyopadhyay P, Anahtar MN, Herneisen A, Schulte F, Linnehan B, Wong F, Stokes JM, Renner LD, Lourido S, Collins JJ. Discovery of antibiotics that selectively kill metabolically dormant bacteria. Cell Chem Biol 2024; 31:712-728.e9. [PMID: 38029756 PMCID: PMC11031330 DOI: 10.1016/j.chembiol.2023.10.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2022] [Revised: 08/13/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
There is a need to discover and develop non-toxic antibiotics that are effective against metabolically dormant bacteria, which underlie chronic infections and promote antibiotic resistance. Traditional antibiotic discovery has historically favored compounds effective against actively metabolizing cells, a property that is not predictive of efficacy in metabolically inactive contexts. Here, we combine a stationary-phase screening method with deep learning-powered virtual screens and toxicity filtering to discover compounds with lethality against metabolically dormant bacteria and favorable toxicity profiles. The most potent and structurally distinct compound without any obvious mechanistic liability was semapimod, an anti-inflammatory drug effective against stationary-phase E. coli and A. baumannii. Integrating microbiological assays, biochemical measurements, and single-cell microscopy, we show that semapimod selectively disrupts and permeabilizes the bacterial outer membrane by binding lipopolysaccharide. This work illustrates the value of harnessing non-traditional screening methods and deep learning models to identify non-toxic antibacterial compounds that are effective in infection-relevant contexts.
Affiliation(s)
- Erica J Zheng
- Program in Chemical Biology, Harvard University, Cambridge, MA 02138, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Jacqueline A Valeri
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Medical Engineering & Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Ian W Andrews
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Medical Engineering & Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Aarti Krishnan
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Medical Engineering & Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
| | - Parijat Bandyopadhyay
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Medical Engineering & Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Melis N Anahtar
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Medical Engineering & Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Alice Herneisen
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA; Department of Biology, MIT, Cambridge, MA 02139, USA
| | - Fabian Schulte
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
| | - Brooke Linnehan
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
| | - Felix Wong
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Medical Engineering & Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Jonathan M Stokes
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario L8N 3Z5, Canada
| | - Lars D Renner
- Leibniz Institute of Polymer Research and the Max Bergmann Center of Biomaterials, 01062 Dresden, Germany
| | - Sebastian Lourido
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA; Department of Biology, MIT, Cambridge, MA 02139, USA
| | - James J Collins
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Medical Engineering & Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02139, USA.
24
Zhang Z, Bian Y, Xie A, Han P, Zhou S. Can Pretrained Models Really Learn Better Molecular Representations for AI-Aided Drug Discovery? J Chem Inf Model 2024; 64:2921-2930. [PMID: 38145387 PMCID: PMC11005046 DOI: 10.1021/acs.jcim.3c01707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/29/2023] [Accepted: 11/29/2023] [Indexed: 12/26/2023]
Abstract
Self-supervised pretrained models are becoming increasingly popular in AI-aided drug discovery, leading to more and more pretrained models with the promise that they can extract better feature representations for molecules. Yet, the quality of learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure-Activity Relationship analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by the pretrained model and visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, and therefore, the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pretrained models are analyzed. The results indicate that the state-of-the-art pretrained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints, while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pretrained models, and our findings can guide the community to develop better pretraining techniques to regularize the occurrence of ACs and SH.
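The idea of a generalized activity cliff in the representation-property context can be illustrated with a small sketch. This is not the RePRA method or either of its two scores, which the abstract does not specify; it only flags pairs of molecules whose learned representations are similar while their property values differ strongly. The function names, the use of cosine similarity, the thresholds `sim_min` and `prop_gap_min`, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_cliff_pairs(reps, props, sim_min=0.9, prop_gap_min=1.0):
    """Return index pairs with similar representations but distant properties.

    Both thresholds are hypothetical; they are not taken from the paper.
    """
    pairs = []
    for i, j in combinations(range(len(reps)), 2):
        similar = cosine_similarity(reps[i], reps[j]) >= sim_min
        distant = abs(props[i] - props[j]) >= prop_gap_min
        if similar and distant:
            pairs.append((i, j))
    return pairs


# Toy representations and property values, invented for illustration only.
reps = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
props = [5.0, 7.5, 5.2]
cliffs = find_cliff_pairs(reps, props)
print(cliffs)
```

Under the same assumptions, the converse test (dissimilar representations with close property values) would correspond to the generalized scaffold-hopping case.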
Affiliation(s)
- Ziqiao Zhang
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
| | | | - Ailin Xie
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
| | - Pengju Han
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
| | - Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
25
Mehta MJ, Kim HJ, Lim SB, Naito M, Miyata K. Recent Progress in the Endosomal Escape Mechanism and Chemical Structures of Polycations for Nucleic Acid Delivery. Macromol Biosci 2024; 24:e2300366. [PMID: 38226723 DOI: 10.1002/mabi.202300366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 12/22/2023] [Indexed: 01/17/2024]
Abstract
Nucleic acid-based therapies are seeing a spiralling surge. Stimuli-responsive polymers, especially pH-responsive ones, are gaining widespread attention because of their ability to efficiently deliver nucleic acids. These polymers can be synthesized and modified according to target requirements, such as delivery sites and the nature of nucleic acids. In this regard, the endosomal escape mechanism of polymer-nucleic acid complexes (polyplexes) remains a topic of considerable interest owing to various plausible escape mechanisms. This review describes current progress in the endosomal escape mechanism of polyplexes and state-of-the-art chemical designs for pH-responsive polymers. The importance of the acid dissociation constant (i.e., pKa) in designing the new generation of pH-responsive polymers is also discussed, along with assays to monitor and quantify the endosomal escape behavior. Further, the use of machine learning in pKa prediction and polymer design to find novel chemical structures for pH responsiveness is addressed. This review will facilitate the design of new pH-responsive polymers for advanced and efficient nucleic acid delivery.
Affiliation(s)
- Mohit J Mehta
- Department of Biological Sciences and Bioengineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
| | - Hyun Jin Kim
- Department of Biological Sciences and Bioengineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
- Department of Biological Engineering, College of Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
| | - Sung Been Lim
- Department of Biological Sciences and Bioengineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
| | - Mitsuru Naito
- Department of Materials Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
| | - Kanjiro Miyata
- Department of Materials Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
26
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024] Open
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceiving when used prospectively. This study investigates the origins of these discrepancies, following three directions: a historical perspective, an analysis of the aqueous solubility dataverse and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aiming at producing useful models for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
Affiliation(s)
- P Llompart
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- IDD/CADD, Sanofi, Vitry-Sur-Seine, France
| | | | - S Baybekov
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
| | - D Horvath
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
| | - G Marcou
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France.
| | - A Varnek
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
27
Jung SG, Jung G, Cole JM. Automatic Prediction of Peak Optical Absorption Wavelengths in Molecules Using Convolutional Neural Networks. J Chem Inf Model 2024; 64:1486-1501. [PMID: 38422386 PMCID: PMC10934802 DOI: 10.1021/acs.jcim.3c01792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/02/2024]
Abstract
Molecular design depends heavily on optical properties for applications such as solar cells and polymer-based batteries. Accurate prediction of these properties is essential, and multiple predictive methods exist, from ab initio to data-driven techniques. Although theoretical methods, such as time-dependent density functional theory (TD-DFT) calculations, have well-established physical relevance and are among the most popular methods in computational physics and chemistry, they exhibit errors that are inherent in their approximate nature. These high-throughput electronic structure calculations also incur a substantial computational cost. With the emergence of big-data initiatives, cost-effective, data-driven methods have gained traction, although their usability is highly contingent on the degree of data quality and sparsity. In this study, we present a workflow that employs deep residual convolutional neural networks (DR-CNN) and gradient boosting feature selection to predict peak optical absorption wavelengths (λmax) exclusively from SMILES representations of dye molecules and solvents; one would normally measure λmax using UV-vis absorption spectroscopy. We use a multifidelity modeling approach, integrating 34,893 DFT calculations and 26,395 experimentally derived λmax data, to deliver more accurate predictions via a Bayesian-optimized gradient boosting machine. Our approach is benchmarked against the state of the art that is reported in the scientific literature; results demonstrate that learnt representations via a DR-CNN workflow that is integrated with other machine learning methods can accelerate the design of molecules for specific optical characteristics.
Affiliation(s)
- Son Gyo Jung
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
- Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
| | - Guwon Jung
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
- Scientific Computing Department, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
| | - Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
- Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
28
Shi W, Lin K, Zhao Y, Li Z, Zhou T. Toward a comprehensive understanding of alicyclic compounds: Bio-effects perspective and deep learning approach. Sci Total Environ 2024; 912:168927. [PMID: 38042202 DOI: 10.1016/j.scitotenv.2023.168927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/17/2023] [Accepted: 11/25/2023] [Indexed: 12/04/2023]
Abstract
The escalating use of alicyclic compounds in modern industrial production has led to a rapid increase of these substances in the environment, posing significant health hazards. Addressing this challenge necessitates a comprehensive understanding of these compounds, which can be achieved through a deep learning approach. Graph neural networks (GNNs), known for their extraordinary ability to process graph data with rich relationships, have been employed in various molecular prediction tasks. In this study, alicyclic molecules screened from PCBA, ToxCast and Tox21 are assembled into two datasets, one for general bioactivity prediction and one for biological target activity prediction. GNN-based models are trained on the two datasets, with Attentive FP and PAGTN achieving the best performance, respectively. In addition, alicyclic carbon atoms make the greatest contribution to biological activity, which indicates that the alicyclic structures have a significant impact on the carbon atoms' contribution. Moreover, a large number of active molecules are found in other public datasets, indicating that alicyclic compounds deserve more attention in POPs control. This study uncovered deeper structure-activity relationships within these compounds, offering new perspectives and methodologies for academic research in the field.
Affiliation(s)
- Wenjie Shi
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China.
| | - Kunsen Lin
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China.
| | - Youcai Zhao
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China
| | - Zongsheng Li
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China
| | - Tao Zhou
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China.
29
Zhang R, Mahjour B, Outlaw A, McGrath A, Hopper T, Kelley B, Walters WP, Cernak T. Exploring the combinatorial explosion of amine-acid reaction space via graph editing. Commun Chem 2024; 7:22. [PMID: 38310120 PMCID: PMC10838272 DOI: 10.1038/s42004-024-01101-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Accepted: 01/08/2024] [Indexed: 02/05/2024] Open
Abstract
Amines and carboxylic acids are abundant chemical feedstocks that are nearly exclusively united via the amide coupling reaction. The disproportionate use of the amide coupling leaves a large section of unexplored reaction space between amines and acids: two of the most common chemical building blocks. Herein we conduct a thorough exploration of amine-acid reaction space via systematic enumeration of reactions involving a simple amine-carboxylic acid pair. This approach to chemical space exploration investigates the coarse and fine modulation of physicochemical properties and molecular shapes. With the invention of reaction methods becoming increasingly automated and bringing conceptual reactions into reality, our map provides an entirely new axis of chemical space exploration for rational property design.
Affiliation(s)
- Rui Zhang
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Babak Mahjour
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Andrew Outlaw
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Andrew McGrath
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | | | | | | | - Tim Cernak
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA.
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA.
30
Hadiby S, Ben Ali YM. Integrating pharmacophore model and deep learning for activity prediction of molecules with BRCA1 gene. J Bioinform Comput Biol 2024; 22:2450003. [PMID: 38567386 DOI: 10.1142/s0219720024500033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
In this paper, we propose a novel approach for predicting the activity/inactivity of molecules with the BRCA1 gene by combining pharmacophore modeling and deep learning techniques. Initially, we generated 3D pharmacophore fingerprints using a pharmacophore model, which captures the essential features and spatial arrangements critical for biological activity. These fingerprints served as informative representations of the molecular structures. Next, we employed deep learning algorithms to train a predictive model using the generated pharmacophore fingerprints. The deep learning model was designed to learn complex patterns and relationships between the pharmacophore features and the corresponding activity/inactivity labels of the molecules. By utilizing this integrated approach, we aimed to enhance the accuracy and efficiency of activity prediction. To validate the effectiveness of our approach, we conducted experiments using a dataset of known molecules with BRCA1 gene activity/inactivity from diverse sources. Our results demonstrated promising predictive performance, indicating the successful integration of pharmacophore modeling and deep learning. Furthermore, we utilized the trained model to predict the activity/inactivity of unknown molecules extracted from the ChEMBL database. The predictions obtained from the ChEMBL database were assessed and compared against experimentally determined values to evaluate the reliability and generalizability of our model. Overall, our proposed approach showcased significant potential in accurately predicting the activity/inactivity of molecules with the BRCA1 gene, thus enabling the identification of potential candidates for further investigation in drug discovery and development processes.
Affiliation(s)
- Seloua Hadiby
- Department of Computer Science, Computer Research Laboratory, Badji Mokhtar University, Annaba, Algeria
| | - Yamina Mohamed Ben Ali
- Department of Computer Science, Computer Research Laboratory, Badji Mokhtar University, Annaba, Algeria
31
Song Z, Chen J, Cheng J, Chen G, Qi Z. Computer-Aided Molecular Design of Ionic Liquids as Advanced Process Media: A Review from Fundamentals to Applications. Chem Rev 2024; 124:248-317. [PMID: 38108629 DOI: 10.1021/acs.chemrev.3c00223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The unique physicochemical properties, flexible structural tunability, and giant chemical space of ionic liquids (ILs) give them a great opportunity to match different target properties and to work as advanced process media. The crux of the matter is how to efficiently and reliably tailor suitable ILs toward a specific application. In this regard, the computer-aided molecular design (CAMD) approach has been widely adapted to cover this family of high-profile chemicals, that is, to perform computer-aided IL design (CAILD). This review discusses the past developments that have contributed to the state of the art of CAILD and provides a perspective on how future work could accelerate the practical application of ILs. In the broad context of CAILD, key aspects related to the forward structure-property modeling and reverse molecular design of ILs are overviewed. For the forward task, diverse IL molecular representations, modeling algorithms, and representative models of the physical and thermodynamic properties of ILs, among others, are introduced. For the reverse task, representative works formulating different molecular design scenarios are summarized. Beyond the substantial progress made, some future perspectives to move CAILD a step forward are finally provided.
Affiliation(s)
- Zhen Song
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Jiahui Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Jie Cheng
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guzhong Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Zhiwen Qi
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
32
Zhu J, Che C, Jiang H, Xu J, Yin J, Zhong Z. SSF-DDI: a deep learning method utilizing drug sequence and substructure features for drug-drug interaction prediction. BMC Bioinformatics 2024; 25:39. [PMID: 38262923 PMCID: PMC10810255 DOI: 10.1186/s12859-024-05654-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 01/12/2024] [Indexed: 01/25/2024] Open
Abstract
BACKGROUND Drug-drug interactions (DDI) are prevalent in combination therapy, making it important to identify and predict potential DDI. While various artificial intelligence methods can predict and identify potential DDI, they often overlook the sequence information of drug molecules and fail to comprehensively consider the contribution of molecular substructures to DDI. RESULTS In this paper, we propose a novel model for DDI prediction based on sequence and substructure features (SSF-DDI) to address these issues. Our model integrates drug sequence features and structural features from the drug molecule graph, providing enhanced information for DDI prediction and enabling a more comprehensive and accurate representation of drug molecules. CONCLUSION The results of experiments and case studies have demonstrated that SSF-DDI significantly outperforms state-of-the-art DDI prediction models across multiple real datasets and settings. SSF-DDI performs better in predicting DDI involving unknown drugs, resulting in a 5.67% improvement in accuracy compared to state-of-the-art methods.
Affiliation(s)
- Jing Zhu
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, 116000, China
| | - Chao Che
- School of Software Engineering, Dalian University, Dalian, 116000, China
| | - Hao Jiang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, 116000, China
| | - Jian Xu
- General Surgery, Affiliated Zhongshan Hospital of Dalian University, Dalian, 116000, China
| | - Jiajun Yin
- General Surgery, Affiliated Zhongshan Hospital of Dalian University, Dalian, 116000, China
| | - Zhaoqian Zhong
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, 116000, China.
33
Maziarka Ł, Majchrowski D, Danel T, Gaiński P, Tabor J, Podolak I, Morkisz P, Jastrzębski S. Relative molecule self-attention transformer. J Cheminform 2024; 16:3. [PMID: 38173009 PMCID: PMC10765783 DOI: 10.1186/s13321-023-00789-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 11/28/2023] [Indexed: 01/05/2024] Open
Abstract
The prediction of molecular properties is a crucial aspect in drug discovery that can save a lot of money and time during the drug design process. The use of machine learning methods to predict molecular properties has become increasingly popular in recent years. Despite advancements in the field, several challenges remain that need to be addressed, like finding an optimal pre-training procedure to improve performance on small datasets, which are common in drug discovery. In our paper, we tackle these problems by introducing Relative Molecule Self-Attention Transformer for molecular representation learning. It is a novel architecture that uses relative self-attention and 3D molecular representation to capture the interactions between atoms and bonds that enrich the backbone model with domain-specific inductive biases. Furthermore, our two-step pretraining procedure allows us to tune only a few hyperparameter values to achieve good performance comparable with state-of-the-art models on a wide selection of downstream tasks.
Affiliation(s)
- Łukasz Maziarka
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Cracow, Poland.
| | | | - Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Cracow, Poland
| | - Piotr Gaiński
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Cracow, Poland
- Ardigen, Podole 76, 30-394, Cracow, Poland
| | - Jacek Tabor
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Cracow, Poland
| | - Igor Podolak
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Cracow, Poland
| | - Paweł Morkisz
- NVIDIA, 2788 San Tomas Expy, Santa Clara, CA, 95051, USA
34
Ghahremanpour MM, Saar A, Tirado-Rives J, Jorgensen WL. Ensemble Geometric Deep Learning of Aqueous Solubility. J Chem Inf Model 2023; 63:7338-7349. [PMID: 37990484 DOI: 10.1021/acs.jcim.3c01536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Geometric deep learning is one of the main workhorses for harnessing the power of big data to predict molecular properties such as aqueous solubility, which is key to the pharmacokinetic improvement of drug candidates. Two ensembles of graph neural network architectures were built, one based on spectral convolution and the other on spatial convolution. The pretrained models, denoted respectively as SolNet-GCN and SolNet-GAT, significantly outperformed the existing neural networks benchmarked on a validation set of 207 molecules. The SolNet-GCN model demonstrated the best performance on both the training and validation sets, with RMSE values of 0.53 and 0.72 log molar units and Pearson r² values of 0.95 and 0.75, respectively. Further, the ranking power of the SolNet models agreed well with a QM-based thermodynamic cycle approach at the PBE-vdW level of theory on a series of benzophenylurea derivatives and a series of benzodiazepine derivatives. Nevertheless, testing the resultant models on a set of inhibitors of the macrophage migration inhibitory factor (MIF) illustrated that the inclusion of atomic attributes to discriminate atoms with a higher tendency to form intermolecular hydrogen bonds in the crystalline state and to identify planar or nonplanar substructures can be beneficial for the prediction of aqueous solubility.
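The two metrics quoted in this abstract, RMSE and the squared Pearson correlation, can be computed as in the following minimal sketch. It is not the authors' evaluation code; the function names and the four log-solubility values are invented for illustration and have no connection to the SolNet results.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


# Hypothetical log-solubility values (log molar units), for illustration only.
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

error = rmse(measured, predicted)
r2 = pearson_r2(measured, predicted)
print(f"RMSE = {error:.3f} log units, Pearson r^2 = {r2:.3f}")
```

Because solubility is modeled on a log scale, an RMSE in log molar units corresponds to a multiplicative error in the molar solubility itself.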
Affiliation(s)
| | - Anastasia Saar
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
| | - Julian Tirado-Rives
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
| | - William L Jorgensen
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
| |
|
35
|
McGibbon M, Shave S, Dong J, Gao Y, Houston DR, Xie J, Yang Y, Schwaller P, Blay V. From intuition to AI: evolution of small molecule representations in drug discovery. Brief Bioinform 2023; 25:bbad422. [PMID: 38033290 PMCID: PMC10689004 DOI: 10.1093/bib/bbad422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/13/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Steven Shave
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
| | - Yumiao Gao
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jiancong Xie
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Vincent Blay
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| |
|
36
|
Wang Z, Feng Z, Li Y, Li B, Wang Y, Sha C, He M, Li X. BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation. Brief Bioinform 2023; 25:bbad400. [PMID: 38033291 PMCID: PMC10783874 DOI: 10.1093/bib/bbad400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/02/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023] Open
Abstract
Although substantial efforts have been made using graph neural networks (GNNs) for artificial intelligence (AI)-driven drug discovery, effective molecular representation learning remains an open challenge, especially in the case of insufficient labeled molecules. Recent studies suggest that big GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, which are time-consuming, computationally expensive and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored complementary and asymmetric graph autoencoders to reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the performance of molecular representation. BatmanNet achieves state-of-the-art results for multiple drug discovery tasks, including molecular properties prediction, drug-drug interaction and drug-target interaction, on 13 benchmark datasets, demonstrating its great potential and superiority in molecular representation learning.
Affiliation(s)
- Zhen Wang
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, Hunan, China
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
| | - Zheng Feng
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, 32611, FL, USA
| | - Yanjun Li
- Department of Medicinal Chemistry, College of Pharmacy, University of Florida, Gainesville, 32610, FL, USA
- Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, 32610, FL, USA
| | - Bowen Li
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
| | - Yongrui Wang
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
| | - Chulin Sha
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
| | - Min He
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, Hunan, China
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
| | - Xiaolin Li
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- ElasticMind Inc, Hangzhou, 310018, Zhejiang, China
| |
|
37
|
Shen C, Luo J, Xia K. Molecular geometric deep learning. Cell Rep Methods 2023; 3:100621. [PMID: 37875121 PMCID: PMC10694498 DOI: 10.1016/j.crmeth.2023.100621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 06/16/2023] [Accepted: 09/28/2023] [Indexed: 10/26/2023]
Abstract
Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs for representing molecular topology at the atomic level and totally ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning model to predict the properties of molecules that aims to comprehensively consider the information of covalent and non-covalent interactions of molecules. The essential idea is to incorporate a more general molecular representation into geometric deep learning (GDL) models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve a better performance than state-of-the-art (SOTA) methods. Extensive tests have demonstrated the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.
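As a rough illustration of a molecular graph that carries both covalent and non-covalent edges, the sketch below adds an edge between any two non-bonded atoms that lie within a distance cutoff. The cutoff rule, the function name `build_edges`, and the coordinates are assumptions made for this example; the abstract does not specify how Mol-GDL constructs its graphs.

```python
import math
from itertools import combinations

def build_edges(coords, covalent_bonds, cutoff=4.0):
    """Return (covalent, non_covalent) edge sets over atom indices.

    Non-covalent edges are an assumed stand-in: any non-bonded pair of
    atoms whose distance is at most `cutoff` (in the units of `coords`).
    """
    covalent = {tuple(sorted(bond)) for bond in covalent_bonds}
    non_covalent = set()
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in covalent:
            continue  # already represented by a covalent edge
        if math.dist(coords[i], coords[j]) <= cutoff:
            non_covalent.add((i, j))
    return covalent, non_covalent

# Hypothetical four-atom chain with made-up coordinates (angstroms).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
bonds = [(0, 1), (1, 2), (2, 3)]
covalent, non_covalent = build_edges(coords, bonds)
print(sorted(covalent), sorted(non_covalent))
```

A graph neural network could then pass messages over both edge sets, which is the general idea the abstract describes as going beyond covalent-bond-only graphs.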
Affiliation(s)
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China.
| | - Kelin Xia
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.
| |
|
38
|
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
39
|
Wilson AN, St John PC, Marin DH, Hoyt CB, Rognerud EG, Nimlos MR, Cywar RM, Rorrer NA, Shebek KM, Broadbelt LJ, Beckham GT, Crowley MF. PolyID: Artificial Intelligence for Discovering Performance-Advantaged and Sustainable Polymers. Macromolecules 2023; 56:8547-8557. [PMID: 38024155 PMCID: PMC10653284 DOI: 10.1021/acs.macromol.3c00994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 09/30/2023] [Indexed: 12/01/2023]
Abstract
A necessary transformation for a sustainable economy is the transition from fossil-derived plastics to polymers derived from biomass and waste resources. While renewable feedstocks can enhance material performance through unique chemical moieties, probing the vast material design space by experiment alone is not practically feasible. Here, we develop a machine-learning-based tool, PolyID, to reduce the design space of renewable feedstocks to enable efficient discovery of performance-advantaged, biobased polymers. PolyID is a multioutput, graph neural network specifically designed to increase accuracy and to enable quantitative structure-property relationship (QSPR) analysis for polymers. It includes a novel domain-of-validity method that was developed and applied to demonstrate how gaps in training data can be filled to improve accuracy. The model was benchmarked with both a 20% held-out subset of the original training data and 22 experimentally synthesized polymers. A mean absolute error for the glass transition temperatures of 19.8 and 26.4 °C was achieved for the test and experimental data sets, respectively. Predictions were made on polymers composed of monomers from four databases that contain biologically accessible small molecules: MetaCyc, MINEs, KEGG, and BiGG. From 1.4 × 10⁶ accessible biobased polymers, we identified five poly(ethylene terephthalate) (PET) analogues with predicted improvements to thermal and transport performance. Experimental validation for one of the PET analogues demonstrated a glass transition temperature between 85 and 112 °C, which is higher than PET and within the predicted range of the PolyID tool. In addition to accurate predictions, we show how the model's predictions are explainable through analysis of individual bond importance for a biobased nylon. Overall, PolyID can aid the biobased polymer practitioner to navigate the vast number of renewable polymers to discover sustainable materials with enhanced performance.
Affiliation(s)
- A. Nolan Wilson
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Peter C. St John
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Daniela H. Marin
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Caroline B. Hoyt
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Erik G. Rognerud
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Mark R. Nimlos
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Robin M. Cywar
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Nicholas A. Rorrer
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Kevin M. Shebek
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Department of Chemical and Biological Engineering and Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
- Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois 60208, United States
| | - Linda J. Broadbelt
- Department of Chemical and Biological Engineering and Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
| | - Gregg T. Beckham
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| | - Michael F. Crowley
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
| |
|
40
|
Rebello NJ, Lin TS, Nazeer H, Olsen BD. BigSMARTS: A Topologically Aware Query Language and Substructure Search Algorithm for Polymer Chemical Structures. J Chem Inf Model 2023; 63:6555-6568. [PMID: 37874026 DOI: 10.1021/acs.jcim.3c00978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Molecular search is important in chemistry, biology, and informatics for identifying molecular structures within large data sets, improving knowledge discovery and innovation, and making chemical data FAIR (findable, accessible, interoperable, reusable). Search algorithms for polymers are significantly less developed than those for small molecules because polymer search relies on searching by polymer name, which can be challenging because polymer naming is overly broad (e.g., polyethylene), complicated for complex chemical structures, and often does not correspond to official IUPAC conventions. Chemical structure search in polymers is limited to substructures, such as monomers, without awareness of connectivity or topology. This work introduces a novel query language and graph traversal search algorithm for polymers that provides the first search method able to fully capture all of the chemical structures present in polymers. The BigSMARTS query language, an extension of the small-molecule SMARTS language, allows users to write queries that localize monomer and functional group searches to different parts of the polymer, like the middle block of a triblock, the side chain of a graft, and the backbone of a repeat unit. The substructure search algorithm is based on the traversal of graph representations of the generating functions for the stochastic graphs of polymers. Operationally, the algorithm first identifies cycles representing the monomers and then the end groups and finally performs a depth-first search to match entire subgraphs. To validate the algorithm, hundreds of queries were searched against hundreds of target chemistries and topologies from the literature, with approximately 440,000 query-target pairs. This tool provides a detailed algorithm that can be implemented in search engines to provide search results with full matching of the monomer connectivity and polymer topology.
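The final step named in this abstract, a depth-first search that matches subgraphs, can be illustrated on plain labeled graphs. The sketch below is a generic backtracking matcher written for this listing; it is not the BigSMARTS algorithm, it ignores polymer topology and bond types, and the function name and data layout are assumptions.

```python
def find_match(query_labels, query_adj, target_labels, target_adj):
    """Depth-first backtracking search for a subgraph match.

    Graphs are dicts: node -> label, and node -> set of neighbor nodes.
    Returns a dict mapping query nodes to target nodes, or None.
    Only query edges are required to exist in the target (non-induced match).
    """
    order = list(query_labels)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        q = order[len(mapping)]
        for t in target_labels:
            if t in mapping.values() or target_labels[t] != query_labels[q]:
                continue
            # Every already-mapped neighbor of q must be adjacent to t.
            if all(mapping[n] in target_adj[t] for n in query_adj[q] if n in mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack and try the next candidate
        return None

    return extend({})

# Hypothetical target chain C-C-O and query fragment C-O.
target_labels = {0: "C", 1: "C", 2: "O"}
target_adj = {0: {1}, 1: {0, 2}, 2: {1}}
query_labels = {"a": "C", "b": "O"}
query_adj = {"a": {"b"}, "b": {"a"}}

match = find_match(query_labels, query_adj, target_labels, target_adj)
print(match)
```

In this toy case the first carbon is tried and rejected because it has no oxygen neighbor, so the search backtracks to the second carbon, which shows the depth-first behavior in miniature.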
Affiliation(s)
- Nathan J Rebello
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Tzyy-Shyang Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Heeba Nazeer
- Department of Computer Science, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States
| | - Bradley D Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
|
41
|
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
| | - Katherine R Duncan
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
| | - Somayah S Elsayed
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - Neha Garg
- School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| | - Nathaniel I Martin
- Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Barbara R Terlouw
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Friederike Biermann
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | | | - Marina Gorostiola González
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- ONCODE institute, Leiden, The Netherlands
| | - Eric J N Helfrich
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Florian Huber
- Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
| | - Stefan Leopold-Messer
- Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
| | - Tristan de Rond
- School of Chemical Sciences, University of Auckland, Auckland, New Zealand
| | - Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
- Pharmaceuticals R&D, Bayer AG, Berlin, Germany
| | - Marcy J Balunas
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Mehdi A Beniddir
- Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
| | - Doris A van Bergeijk
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - Laura M Carroll
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
| | - Chase M Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Chao Du
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | | | - Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
| | | | - Willem Jespers
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
| | - Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
| | | | - Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
| | - Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
| | - Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
| | - Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
| | - Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
| | - Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | | | - Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
| | - Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
| | | | - Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
| | - Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
| | - Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for infection research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
| | - Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
| | - Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands.
| | - Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Department of Pharmacy, Saarland University, Saarbrücken, Germany.
- German Center for infection research (DZIF), Braunschweig, Germany.
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany.
| | - Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland.
| | - Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
- Institute of Biology, Leiden University, Leiden, The Netherlands.
| |
|
42
|
Shilpa S, Kashyap G, Sunoj RB. Recent Applications of Machine Learning in Molecular Property and Chemical Reaction Outcome Predictions. J Phys Chem A 2023; 127:8253-8271. [PMID: 37769193 DOI: 10.1021/acs.jpca.3c04779] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 09/30/2023]
Abstract
Burgeoning developments in machine learning (ML) and its rapidly growing adaptations in chemistry are noteworthy. Motivated by the successful deployments of ML in the realm of molecular property prediction (MPP) and chemical reaction prediction (CRP), herein we highlight some of its most recent applications in predictive chemistry. We present a nonmathematical and concise overview of the progression of ML implementations, ranging from an ensemble-based random forest model to advanced graph neural network algorithms. Similarly, the prospects of various feature engineering and feature learning approaches that work in conjunction with ML models are described. Highly accurate predictions reported in MPP tasks (e.g., lipophilicity, solubility, distribution coefficient), using methods such as D-MPNN, MolCLR, SMILES-BERT, and MolBERT, offer promising avenues in molecular design and drug discovery. Whereas MPP pertains to a given molecule, ML applications in chemical reactions present a different level of challenge, primarily arising from the simultaneous involvement of multiple molecules and their diverse roles in a reaction setting. The reported RMSEs in MPP tasks range from 0.287 to 2.20, while those for yield predictions are well over 4.9 in the lower end, reaching thresholds of >10.0 in several examples. Our Review concludes with a set of persisting challenges in dealing with reaction data sets and an overall optimistic outlook on benefits of ML-driven workflows for various MPP as well as CRP tasks.
Affiliation(s)
- Shilpa Shilpa
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Gargee Kashyap
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| |
|
43
|
Wojtuch A, Danel T, Podlewska S, Maziarka Ł. Extended study on atomic featurization in graph neural networks for molecular property prediction. J Cheminform 2023; 15:81. [PMID: 37726841 PMCID: PMC10507875 DOI: 10.1186/s13321-023-00751-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023] Open
Abstract
Graph neural networks have recently become a standard method for analyzing chemical compounds. In the field of molecular property prediction, the emphasis is now on designing new model architectures, and the importance of atom featurization is oftentimes belittled. When contrasting two graph neural networks, the use of different representations possibly leads to incorrect attribution of the results solely to the network architecture. To better understand this issue, we compare multiple atom representations by evaluating them on the prediction of free energy, solubility, and metabolic stability using graph convolutional networks. We discover that the choice of atom representation has a significant impact on model performance and that the optimal subset of features is task-specific. Additional experiments involving more sophisticated architectures, including graph transformers, support these findings. Moreover, we demonstrate that some commonly used atom features, such as the number of neighbors or the number of hydrogens, can be easily predicted using only information about bonds and atom type, yet their explicit inclusion in the representation has a positive impact on model performance. Finally, we explain the predictions of the best-performing models to better understand how they utilize the available atomic features.
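To make the idea of alternative atom representations concrete, here is a minimal sketch, not the authors' code. It builds per-atom feature vectors from atom symbols and a bond list, with the neighbor count as an optional explicit feature. The vocabulary `ATOM_TYPES`, the function name `atom_features`, and the `include_degree` switch are all hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical atom-type vocabulary, used only for this illustration.
ATOM_TYPES = ["C", "N", "O"]

def atom_features(symbols: List[str],
                  bonds: List[Tuple[int, int]],
                  include_degree: bool = True) -> List[List[float]]:
    """One-hot atom type, optionally followed by the number of bonded neighbors."""
    # The neighbor count is derivable from the bond list alone,
    # which is why it counts as a "redundant" feature.
    degree = [0] * len(symbols)
    for u, v in bonds:
        degree[u] += 1
        degree[v] += 1

    features = []
    for i, symbol in enumerate(symbols):
        row = [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]
        if include_degree:
            row.append(float(degree[i]))
        features.append(row)
    return features

# Heavy atoms of ethanol: C-C-O
ethanol = atom_features(["C", "C", "O"], [(0, 1), (1, 2)])
```

Toggling `include_degree` gives two representations of the same molecule, which is the kind of controlled comparison the abstract describes.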
Affiliation(s)
- Agnieszka Wojtuch
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland.
| | - Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
| | - Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343, Kraków, Poland
| | - Łukasz Maziarka
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
| |
|
44
|
Alnammi M, Liu S, Ericksen SS, Ananiev GE, Voter AF, Guo S, Keck JL, Hoffmann FM, Wildman SA, Gitter A. Evaluating Scalable Supervised Learning for Synthesize-on-Demand Chemical Libraries. J Chem Inf Model 2023; 63:5513-5528. [PMID: 37625010 PMCID: PMC10538940 DOI: 10.1021/acs.jcim.3c00912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Indexed: 08/27/2023]
Abstract
Traditional small-molecule drug discovery is a time-consuming and costly endeavor. High-throughput chemical screening can only assess a tiny fraction of drug-like chemical space. The strong predictive power of modern machine-learning methods for virtual chemical screening enables training models on known active and inactive compounds and extrapolating to much larger chemical libraries. However, there has been limited experimental validation of these methods in practical applications on large commercially available or synthesize-on-demand chemical libraries. Through a prospective evaluation with the bacterial protein-protein interaction PriA-SSB, we demonstrate that ligand-based virtual screening can identify many active compounds in large commercial libraries. We use cross-validation to compare different types of supervised learning models and select a random forest (RF) classifier as the best model for this target. When predicting the activity of more than 8 million compounds from Aldrich Market Select, the RF substantially outperforms a naïve baseline based on chemical structure similarity. 48% of the RF's 701 selected compounds are active. The RF model easily scales to score one billion compounds from the synthesize-on-demand Enamine REAL database. We tested 68 chemically diverse top predictions from Enamine REAL and observed 31 hits (46%), including one with an IC50 value of 1.3 μM.
Affiliation(s)
- Moayad Alnammi
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Information and Computer Science, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia
| | - Shengchao Liu
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
| | - Spencer S. Ericksen
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Gene E. Ananiev
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Andrew F. Voter
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
| | - Song Guo
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - James L. Keck
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
| | - F. Michael Hoffmann
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- McArdle Laboratory for Cancer Research, University of Wisconsin−Madison, Madison, Wisconsin 53705, United States
| | - Scott A. Wildman
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Anthony Gitter
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| |
|
45
|
Che L, Jin Y, Shi Y, Yu X, Sun H, Liu H, Li X. A drug molecular classification model based on graph structure generation. J Biomed Inform 2023; 145:104447. [PMID: 37481052 DOI: 10.1016/j.jbi.2023.104447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 07/14/2023] [Accepted: 07/16/2023] [Indexed: 07/24/2023]
Abstract
Molecular property prediction based on artificial intelligence technology has significant prospects in speeding up drug discovery and reducing drug discovery costs. Among them, molecular property prediction based on graph neural networks (GNNs) has received extensive attention in recent years. However, the existing graph neural networks still face the following challenges in node representation learning. First, the number of nodes increases exponentially with the expansion of the perception field, which limits the exploration ability of the model in the depth direction. Secondly, the large number of nodes in the perception field brings noise, which is not conducive to the model's representation learning of the key structures. Therefore, a graph neural network model based on structure generation is proposed in this paper. The model adopts the depth-first strategy to generate the key structures of the graph, to solve the problem of insufficient exploration ability of the graph neural network in the depth direction. A tendentious node selection method is designed to gradually select nodes and edges to generate the key structures of the graph, to solve the noise problem caused by the excessive number of nodes. In addition, the model skillfully realizes forward propagation and iterative optimization of structure generation by using an attention mechanism and random bias. Experimental results on public data sets show that the proposed model achieves better classification results than the existing best models.
Affiliation(s)
- Lixuan Che
- College of Culture and Creativity, Weifang Vocational College, Weifang, China.
| | - Yide Jin
- Department of Statistics, University of Minnesota, Minneapolis, MN, USA.
| | - Yuliang Shi
- School of Software, Shandong University, Jinan, China; Dareway Software Co., Ltd, Jinan, China.
| | - Xiaojing Yu
- Department of Dermatology, Qilu Hospital, Shandong University, Jinan, China.
| | - Hongfeng Sun
- School of Data and Computer Science, Shandong Women's University, Jinan, China.
| | - Hui Liu
- School of Data and Computer Science, Shandong Women's University, Jinan, China.
| | - Xinyu Li
- Department of Dermatology, Qilu Hospital, Shandong University, Jinan, China.
| |
|
46
|
Lamanna G, Delre P, Marcou G, Saviano M, Varnek A, Horvath D, Mangiatordi GF. GENERA: A Combined Genetic/Deep-Learning Algorithm for Multiobjective Target-Oriented De Novo Design. J Chem Inf Model 2023; 63:5107-5119. [PMID: 37556857 PMCID: PMC10466378 DOI: 10.1021/acs.jcim.3c00963] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/26/2023] [Indexed: 08/11/2023]
Abstract
This study introduces a new de novo design algorithm called GENERA that combines the capabilities of a deep-learning algorithm for automated drug-like analogue design, called DeLA-Drug, with a genetic algorithm for generating molecules with desired target-oriented properties. Specifically, GENERA was applied to the angiotensin-converting enzyme 2 (ACE2) target, which is implicated in many pathological conditions, including COVID-19. The ability of GENERA to de novo design promising candidates for a specific target was assessed using two docking programs, PLANTS and GLIDE. A fitness function based on the Pareto dominance resulting from computed PLANTS and GLIDE scores was applied to demonstrate the algorithm's ability to perform multiobjective optimizations effectively. GENERA can quickly generate focused libraries that produce better scores compared to a starting set of known ACE2 binders. This study is the first to utilize a DL-based algorithm designed for analogue generation as a mutational operator within a GA framework, representing an innovative approach to target-oriented de novo design.
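The Pareto-dominance idea behind such a fitness function can be sketched as follows. This is an illustrative example under stated assumptions, not GENERA's implementation: each candidate carries a hypothetical pair of docking scores `(plants_score, glide_score)`, and lower (more negative) values are assumed to be better for both. The function names `dominates` and `pareto_front` are invented for this sketch.

```python
from typing import List, Tuple

# Assumed layout: (plants_score, glide_score); lower is assumed better for both.
Scores = Tuple[float, float]

def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse on every score and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points: List[Scores]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]

# Three made-up candidates: the third is worse than the first on both scores.
candidates = [(-90.0, -8.0), (-85.0, -9.0), (-80.0, -7.0)]
front = pareto_front(candidates)
```

Here the first two candidates trade off against each other, so both stay on the front, while the third is dominated and would be ranked lower by a dominance-based fitness.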
Affiliation(s)
- Giuseppe Lamanna
- Chemistry Department, University of Bari “Aldo Moro”, Via E. Orabona, 4, I-70125 Bari, Italy
- CNR − Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Pietro Delre
- CNR − Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Gilles Marcou
- Laboratoire de Chémoinformatique UMR7140, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Michele Saviano
- CNR − Institute of Crystallography, Via Vivaldi 43, 81100 Caserta, Italy
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique UMR7140, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Dragos Horvath
- Laboratoire de Chémoinformatique UMR7140, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | | |
|
47
|
Mohiuddin A, Mondal S. Advancement of Computational Design Drug Delivery System in COVID-19: Current Updates and Future Crosstalk- A Critical update. Infect Disord Drug Targets 2023; 23:IDDT-EPUB-133706. [PMID: 37584349 PMCID: PMC11348471 DOI: 10.2174/1871526523666230816151614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Revised: 06/22/2023] [Accepted: 07/17/2023] [Indexed: 08/17/2023]
Abstract
Considerable progress has been made in developing vaccines to combat the coronavirus disease 2019 (COVID-19) pandemic. Still, the emergence of variants, particularly the recent Delta variant, has posed significant health challenges. Therefore, developing strong treatment strategies, such as an anti-COVID-19 drug development plan, may help deal with the pandemic more effectively. During the COVID-19 pandemic, some drug design techniques were effectively used to develop and validate relevant critical medications. Extensive research, both experimental and computational, has been dedicated to comprehending and characterizing the devastating COVID-19 disease. The urgency of the situation has led to the publication of over 130,000 COVID-19-related research papers in peer-reviewed journals and preprint servers. A significant focus of these efforts has been the identification of novel drug candidates and the repurposing of existing drugs to combat the virus. Many projects have utilized computational or computer-aided approaches to facilitate their studies. In this overview, we will explore the key computational methods and their applications in the discovery of small-molecule therapeutics for COVID-19, as reported in the research literature. We believe that the true effectiveness of computational tools lies in their ability to provide actionable and experimentally testable hypotheses, which in turn facilitate the discovery of new drugs and combinations thereof. Additionally, we recognize that open science and the rapid sharing of research findings are vital in expediting the development of much-needed therapeutics for COVID-19.
Affiliation(s)
- Abu Mohiuddin
- Department of Pharmaceutical Science, GITAM School of Pharmacy, GITAM (Deemed to be University), Visakhapatnam-530045, A.P., India
| | - Sumanta Mondal
- Department of Pharmaceutical Science, GITAM School of Pharmacy, GITAM (Deemed to be University), Visakhapatnam-530045, A.P., India
| |
|
48
|
Srivathsa AV, Sadashivappa NM, Hegde AK, Radha S, Mahesh AR, Ammunje DN, Sen D, Theivendren P, Govindaraj S, Kunjiappan S, Pavadai P. A Review on Artificial Intelligence Approaches and Rational Approaches in Drug Discovery. Curr Pharm Des 2023; 29:1180-1192. [PMID: 37132148 DOI: 10.2174/1381612829666230428110542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/06/2023] [Accepted: 02/27/2023] [Indexed: 05/04/2023]
Abstract
Artificial intelligence (AI) speeds up the drug development process and reduces its cost, which is of enormous importance in outbreaks such as COVID-19. It uses a set of machine learning algorithms that collect available data from resources, categorise and process them, and develop novel learning methodologies. Virtual screening is a successful application of AI, used to screen huge drug-like databases and filter them down to a small number of compounds. At the core of AI are neural networks, which use architectures such as the convolutional neural network (CNN), recurrent neural network (RNN) or generative adversarial network (GAN). Applications range from small-molecule drug discovery to the development of vaccines. In the present review article, we discuss various techniques of drug design, both structure- and ligand-based, as well as pharmacokinetics and toxicity prediction using AI. Rapid discovery is the need of the hour, and AI is a targeted approach to achieve this.
Affiliation(s)
- Anjana Vidya Srivathsa
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
| | - Nandini Markuli Sadashivappa
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
| | - Apeksha Krishnamurthy Hegde
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
| | - Srimathi Radha
- Department of Pharmaceutical Chemistry, SRM College of Pharmacy, Faculty of Medicine and Health Sciences, SRM Institute of Science and Technology, Chengalpattu District, Kattankulathur, Tamil Nadu, 603203, India
| | - Agasa Ramu Mahesh
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
| | - Damodar Nayak Ammunje
- Department of Pharmacology, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
| | - Debanjan Sen
- Department of Pharmaceutical Chemistry, BCDA College of Pharmacy & Technology, Hridaypur, Kolkata, 700127, West Bengal, India
| | - Panneerselvam Theivendren
- Department of Pharmaceutical Chemistry, Swamy Vivekanandha College of Pharmacy, Elayampalayam, Tiruchengode, 637205, India
| | - Saravanan Govindaraj
- Department of Pharmaceutical Chemistry, MNR College of Pharmacy, Fasalwadi, Sangareddy, 502 001, India
| | - Selvaraj Kunjiappan
- Department of Biotechnology, Kalasalingam Academy of Research and Education, Krishnankoil, 626126, India
| | - Parasuraman Pavadai
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
| |
|
49
|
Hajiabolhassan H, Taheri Z, Hojatnia A, Yeganeh YT. FunQG: Molecular Representation Learning via Quotient Graphs. J Chem Inf Model 2023. [PMID: 37186874 DOI: 10.1021/acs.jcim.3c00445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
To accurately predict molecular properties, it is important to learn expressive molecular representations. Graph neural networks (GNNs) have made significant advances in this area, but they often face limitations like neighbors-explosion, under-reaching, oversmoothing, and oversquashing. Additionally, GNNs tend to have high computational costs due to their large number of parameters. These limitations emerge or increase when dealing with larger graphs or deeper GNN models. One potential solution is to simplify the molecular graph into a smaller, richer, and more informative one on which GNNs are easier to train. Our proposed molecular graph coarsening framework, called FunQG, uses functional groups as building blocks to determine a molecule's properties, based on a graph-theoretic concept called the quotient graph. We show through experiments that the resulting informative graphs are much smaller than the original molecular graphs and are thus more suitable for training GNNs. We apply FunQG to popular molecular property prediction benchmarks and compare the performance of popular baseline GNNs on the resulting data sets to that of state-of-the-art baselines on the original data sets. Our experiments demonstrate that FunQG yields notable results on various data sets while dramatically reducing the number of parameters and computational costs. By utilizing functional groups, we can achieve an interpretable framework that indicates their significant role in determining the properties of molecular quotient graphs. Consequently, FunQG is a straightforward, computationally efficient, and generalizable solution for addressing the molecular representation learning problem.
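The graph-theoretic notion of a quotient graph can be illustrated with a short sketch. It is not the FunQG code: the atom-to-group assignment `block_of` is given by hand here as an assumption, whereas a real pipeline would need a functional-group detection step. The function name `quotient_graph` is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

def quotient_graph(edges: Iterable[Tuple[int, int]],
                   block_of: Dict[int, str]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collapse each block of atoms into one node.

    Two blocks are connected if at least one bond joins their members;
    bonds inside a block disappear.
    """
    nodes = set(block_of.values())
    q_edges = set()
    for u, v in edges:
        bu, bv = block_of[u], block_of[v]
        if bu != bv:
            # Store undirected edges in a canonical (sorted) order.
            q_edges.add((min(bu, bv), max(bu, bv)))
    return nodes, q_edges

# Heavy atoms of acetic acid: 0 = methyl C, 1 = carbonyl C, 2 = carbonyl O, 3 = hydroxyl O
bonds = [(0, 1), (1, 2), (1, 3)]
groups = {0: "methyl", 1: "carboxyl", 2: "carboxyl", 3: "carboxyl"}  # hand-assigned
q_nodes, q_edges = quotient_graph(bonds, groups)
```

The four-atom graph shrinks to two nodes joined by one edge, which is the size reduction the abstract refers to.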
Affiliation(s)
- Hossein Hajiabolhassan
- Department of Mathematics and Information Technology, Chair of Information Technology, Montanuniversität Leoben, Franz-Josef-Strasse 18, A-8700 Leoben, Austria
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
| | - Zahra Taheri
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
| | - Ali Hojatnia
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
| | - Yavar Taheri Yeganeh
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
| |
|
50
|
Qian X, Dai X, Luo L, Lin M, Xu Y, Zhao Y, Huang D, Qiu H, Liang L, Liu H, Liu Y, Gu L, Lu T, Chen Y, Zhang Y. An Interpretable Multitask Framework BiLAT Enables Accurate Prediction of Cyclin-Dependent Protein Kinase Inhibitors. J Chem Inf Model 2023. [PMID: 37171216 DOI: 10.1021/acs.jcim.3c00473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
The cyclin-dependent protein kinases (CDKs) are protein-serine/threonine kinases with crucial effects on the regulation of cell cycle and transcription. CDKs can be a hallmark of cancer since their excessive expression could lead to impaired cell proliferation. However, the selectivity profile of most developed CDK inhibitors is insufficient, which has hindered their therapeutic use. In this study, we propose a multitask deep learning framework called BiLAT based on SMILES representation for the prediction of the inhibitory activity of molecules on eight CDK subtypes (CDK1, 2, 4-9). The framework is mainly composed of an improved bidirectional long short-term memory module BiLSTM and the encoder layer of the Transformer framework. Additionally, the data enhancement method of SMILES enumeration is applied to improve the performance of the model. Compared with baseline predictive models based on three conventional machine learning methods and two multitask deep learning algorithms, BiLAT achieves the best performance with the highest average AUC, ACC, F1-score, and MCC values of 0.938, 0.894, 0.911, and 0.715 for the test set. Moreover, we constructed a targeted external data set CDK-Dec for the CDK family, which mainly contains bait values screened by 3D similarity with active compounds. This dataset was utilized in the subsequent evaluation of our model. It is worth mentioning that the BiLAT model is interpretable and can be used by chemists to design and synthesize compounds with improved activity. To further verify the generalization ability of the multitask BiLAT model, we also conducted another evaluation on three public datasets (Tox21, ClinTox, and SIDER). Compared with several currently popular models, BiLAT shows the best performance on two datasets. These results indicate that BiLAT is an effective tool for accelerating drug discovery.
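For readers unfamiliar with the reported metrics, the sketch below shows how ACC, F1-score, and MCC follow from a binary confusion matrix. It uses the standard textbook definitions; the counts in the example are made up and are not taken from the paper, and `classification_metrics` is a hypothetical helper name. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math
from typing import Dict

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, F1-score and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total

    # F1 is the harmonic mean of precision and recall, written here in count form.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    # MCC uses all four cells, so it stays informative on imbalanced data.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return {"ACC": acc, "F1": f1, "MCC": mcc}

# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Because MCC penalizes errors in both classes, it is typically lower than ACC or F1 on the same predictions, which is consistent with the pattern of values quoted in the abstract.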
Affiliation(s)
- Xu Qian
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Xiaowen Dai
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Lin Luo
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Mingde Lin
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yuan Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yang Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Dingfang Huang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Haodi Qiu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Li Liang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yingbo Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Lingxi Gu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
| | - Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| |
|