1
Baruah O, Parasar U, Borphukan A, Phukan B, Bharali P, Nagamani S, Mahanta HJ. Integrating (deep) machine learning and cheminformatics for predicting human intestinal absorption of small molecules. Comput Biol Chem 2024; 113:108270. [PMID: 39481232 DOI: 10.1016/j.compbiolchem.2024.108270] [Received: 08/05/2024] [Revised: 09/30/2024] [Accepted: 10/23/2024] [Indexed: 11/02/2024]
Abstract
The oral route is the most preferred route for drug delivery, which is why oral drugs represent the largest share of the pharmaceutical market. Human intestinal absorption (HIA) is closely related to oral bioavailability, making it an important factor in predicting drug absorption. In this study, we focus on predicting drug permeability in terms of HIA as a marker of oral bioavailability. A set of 2648 compounds was collected from both early and recent studies and curated into a robust dataset. Five machine learning (ML) algorithms were trained on molecular descriptors of these compounds, selected after rigorous feature engineering. Additionally, two deep learning models, based on a graph convolutional neural network (GCNN) and a graph attention network (GAT), were developed on the same set of compounds to exploit automatically extracted features. Of the five ML models, random forest and LightGBM achieved accuracies of 87.71% and 86.04%, respectively, on the test set and 81.43% and 77.30% on the external validation set. The GCNN- and GAT-based models achieved accuracies of 77.69% and 78.58%, respectively, on the test set and 79.29% and 79.42% on the external validation set. We believe these models can give promising results when deployed to screen oral drug candidates, and we have therefore deposited the dataset and models on GitHub (https://github.com/hridoy69/HIA).
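Accuracy figures such as those above are the fraction of compounds whose predicted class matches the reference label, computed separately on the test set and on the external validation set. A minimal sketch, assuming a binary encoding (1 = well absorbed, 0 = poorly absorbed) and made-up labels; neither the encoding nor the data come from the paper:

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted class matches the reference label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 1 = well absorbed, 0 = poorly absorbed (assumed encoding).
test_labels = [1, 1, 0, 1, 0, 1, 1, 0]
test_preds = [1, 0, 0, 1, 0, 1, 1, 1]
print(f"test accuracy: {accuracy(test_labels, test_preds):.2%}")  # test accuracy: 75.00%
```

The same function would simply be called a second time on the external validation labels; keeping the two sets separate is what lets the gap between test and external accuracy reveal how well a model generalizes.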
Affiliation(s)
- Orchid Baruah
  - Department of Information Technology, The Assam Kaziranga University, Jorhat, Assam 785006, India
- Upashya Parasar
  - Department of Information Technology, The Assam Kaziranga University, Jorhat, Assam 785006, India
- Anirban Borphukan
  - Department of Information Technology, The Assam Kaziranga University, Jorhat, Assam 785006, India
- Bikram Phukan
  - Advanced Computation and Data Sciences Division, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India
- Pankaj Bharali
  - Centre for Infectious Diseases, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Selvaraman Nagamani
  - Advanced Computation and Data Sciences Division, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India

2
Li X, Feng X, Zhou J, Luo Y, Chen X, Zhao J, Chen H, Xiong G, Luo G. A muti-modal feature fusion method based on deep learning for predicting immunotherapy response. J Theor Biol 2024; 586:111816. [PMID: 38589007 DOI: 10.1016/j.jtbi.2024.111816] [Received: 10/21/2023] [Revised: 03/28/2024] [Accepted: 04/03/2024] [Indexed: 04/10/2024]
Abstract
Immune checkpoint therapy (ICT) has greatly improved the survival of cancer patients in the past few years, but only a small proportion of patients respond to it. To predict ICT response, we developed a multi-modal feature fusion model based on deep learning (MFMDL). The model uses graph neural networks to map gene-gene relationships in gene networks to low-dimensional vector spaces, and then fuses these embeddings with biological pathway features and immune cell infiltration features to make robust predictions of ICT response. We validated the predictive performance of MFMDL on five datasets spanning multiple cancer types, including melanoma, lung cancer, and gastric cancer. MFMDL outperformed traditional ICT biomarkers, such as ICT targets or tumor microenvironment-associated markers. In addition, ablation experiments demonstrated that fusing features from different modalities is necessary and improves the model's prediction accuracy.
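The abstract does not specify how MFMDL fuses its modalities. One common, simple option is to concatenate per-modality feature vectors before a prediction head, and an ablation can then be run by zeroing out one modality. The sketch below assumes exactly that; the modality names, feature values, and the linear scoring head are hypothetical illustrations, not the MFMDL architecture:

```python
from typing import Dict, List, Sequence


def fuse(modalities: Dict[str, Sequence[float]], drop: Sequence[str] = ()) -> List[float]:
    """Concatenate per-modality feature vectors in a fixed (alphabetical) order.

    Modalities listed in `drop` are replaced by zeros of the same length,
    so an ablated input keeps the dimensionality the model expects.
    """
    fused: List[float] = []
    for name in sorted(modalities):
        vec = list(modalities[name])
        fused.extend([0.0] * len(vec) if name in drop else vec)
    return fused


def linear_score(weights: Sequence[float], features: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical stand-in for a trained prediction head."""
    return bias + sum(w * x for w, x in zip(weights, features))


patient = {
    "gene_network_embedding": [0.2, -0.1],  # e.g. output of a GNN encoder
    "immune_infiltration": [0.3, 0.7],
    "pathway_activity": [0.5],
}
weights = [1.0, 1.0, 0.5, 0.5, 2.0]
full = linear_score(weights, fuse(patient))
ablated = linear_score(weights, fuse(patient, drop=["pathway_activity"]))
print(round(full, 3), round(ablated, 3))  # 1.6 0.6
```

Zeroing rather than deleting a modality is a design choice: it lets the same trained head score the ablated input, although retraining without the modality is the more rigorous ablation.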
Affiliation(s)
- Xiong Li
  - School of Software, East China Jiaotong University, Nanchang 330013, China
- Xuan Feng
  - School of Software, East China Jiaotong University, Nanchang 330013, China
- Juan Zhou
  - School of Software, East China Jiaotong University, Nanchang 330013, China
- Yuchao Luo
  - School of Software, East China Jiaotong University, Nanchang 330013, China
- Xiao Chen
  - School of Software, East China Jiaotong University, Nanchang 330013, China
- Jiapeng Zhao
  - School of Software, East China Jiaotong University, Nanchang 330013, China
- Haowen Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Guoming Xiong
  - School of Software, East China Jiaotong University, Nanchang 330013, China
- Guoliang Luo
  - School of Software, East China Jiaotong University, Nanchang 330013, China

3
Wang D, Wang Y, Evans L, Tiwary P. From Latent Dynamics to Meaningful Representations. J Chem Theory Comput 2024; 20:3503-3513. [PMID: 38649368 DOI: 10.1021/acs.jctc.4c00249] [Indexed: 04/25/2024]
Abstract
While representation learning has been central to the rise of machine learning and artificial intelligence, a key problem remains in making the learned representations meaningful. The typical approach is to regularize the learned representation through prior probability distributions; however, such priors are usually unavailable or ad hoc. To address this, recent efforts have shifted toward leveraging insights from physical principles to guide the learning process. In this spirit, we propose a purely dynamics-constrained representation learning framework. Instead of relying on predefined probabilities, we restrict the latent representation to follow overdamped Langevin dynamics with a learnable transition density, a prior driven by statistical mechanics. We show that this is a more natural constraint for representation learning in stochastic dynamical systems, with the crucial ability to uniquely identify the ground truth representation. We validate our framework on different systems, including a real-world fluorescent DNA movie data set, and show that our algorithm can uniquely identify orthogonal, isometric, and meaningful latent representations.
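For reference, overdamped Langevin dynamics for a coordinate x in a potential U(x) can be written as γ dx = -∇U(x) dt + sqrt(2γ k_B T) dW, where γ is a friction coefficient and W is a Wiener process. The sketch below integrates this equation with the Euler-Maruyama scheme for a fixed, known harmonic potential, an assumption chosen so the result can be checked (the stationary variance is k_B T / k). It only illustrates the dynamics the latent variables are constrained to follow; it does not implement the paper's learnable transition density or its representation learning:

```python
import math
import random
from typing import Callable, List


def overdamped_langevin(
    grad_u: Callable[[float], float],
    x0: float,
    n_steps: int,
    dt: float = 0.01,
    gamma: float = 1.0,
    kT: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Euler-Maruyama integration of gamma*dx = -grad_u(x)*dt + sqrt(2*gamma*kT)*dW."""
    rng = random.Random(seed)
    noise = math.sqrt(2.0 * kT * dt / gamma)  # std. dev. of the random kick per step
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        x = x - grad_u(x) * dt / gamma + noise * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory


# Harmonic potential U(x) = k x^2 / 2 with k = 1, so grad U(x) = x.
traj = overdamped_langevin(lambda x: x, x0=0.0, n_steps=200_000)
samples = traj[10_000:]  # discard burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} variance={var:.3f} (theory: 0 and kT/k = 1)")
```

The sampled variance should land close to 1, with a small upward bias from the finite time step.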
Affiliation(s)
- Dedi Wang
  - Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Yihang Wang
  - Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Luke Evans
  - Department of Mathematics, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
  - Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States

4
Singh S, Zeh G, Freiherr J, Bauer T, Türkmen I, Grasskamp AT. Classification of substances by health hazard using deep neural networks and molecular electron densities. J Cheminform 2024; 16:45. [PMID: 38627862 PMCID: PMC11302296 DOI: 10.1186/s13321-024-00835-y] [Received: 12/07/2023] [Accepted: 03/23/2024] [Indexed: 08/09/2024]
Abstract
In this paper, we present a method that leverages 3D electron density information to train a deep neural network pipeline that segments regions of high, medium, and low electronegativity and classifies substances as health-hazardous or non-hazardous. We show that this can be applied to use cases such as cosmetics and food products. For this purpose, we first generate 3D electron density cubes using semiempirical molecular calculations for a custom European Chemicals Agency (ECHA) subset consisting of substances labelled as hazardous or non-hazardous for cosmetic usage. Using these cubes together with their 3-class electronegativity maps, we train a modified 3D-UNet to segment reactive sites in molecules and classify substances, reaching an accuracy of 78.1%. Applying the same process to a custom food dataset (CompFood) of hazardous and non-hazardous substances, compiled from the European Food Safety Authority (EFSA) OpenFoodTox, Food and Drug Administration (FDA) Generally Recognized as Safe (GRAS), and FooDB datasets, yields a classification accuracy of 64.1%. Our results show that 3D electron densities, and particularly masked electron densities (calculated as the product of the original electron densities and the regions of high and low electronegativity), can be used to classify molecules for different use cases, and can thus serve not only to guide safe-by-design product development but also to aid regulatory decisions. SCIENTIFIC CONTRIBUTION: We aim to contribute to the diverse 3D molecular representations used for training machine learning algorithms by showing that a deep learning network can be trained on a 3D electron density representation of molecules. This approach has not previously been used to train machine learning models, and it allows the true spatial domain of the molecule to be used for predicting properties such as suitability for use in cosmetics and food products and, in the future, other molecular properties.
The data and code used for training is accessible at https://github.com/s-singh-ivv/eDen-Substances .
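The masking step described in the abstract (multiplying the electron density by the regions of high and low electronegativity) amounts to a voxel-wise product with a binary mask. A minimal sketch on nested Python lists, assuming a hypothetical label encoding of 0 = low, 1 = medium, 2 = high and a toy 2x2x2 cube; a real pipeline would operate on full density cubes, typically with an array library:

```python
from typing import List, Sequence

Grid = List[List[List[float]]]
LabelGrid = List[List[List[int]]]

# Hypothetical encoding of the 3-class electronegativity map.
LOW, MEDIUM, HIGH = 0, 1, 2


def masked_density(density: Grid, labels: LabelGrid, keep: Sequence[int] = (LOW, HIGH)) -> Grid:
    """Voxel-wise product of a density cube with a 0/1 mask of the kept classes.

    Assumes `density` and `labels` have identical shapes (zip would silently truncate otherwise).
    """
    return [
        [
            [rho * (1.0 if lab in keep else 0.0) for rho, lab in zip(d_row, l_row)]
            for d_row, l_row in zip(d_plane, l_plane)
        ]
        for d_plane, l_plane in zip(density, labels)
    ]


density = [[[0.1, 0.4], [0.9, 0.2]], [[0.0, 0.3], [0.5, 0.7]]]
labels = [[[LOW, MEDIUM], [HIGH, MEDIUM]], [[LOW, LOW], [MEDIUM, HIGH]]]
print(masked_density(density, labels))
# [[[0.1, 0.0], [0.9, 0.0]], [[0.0, 0.3], [0.0, 0.7]]]
```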
Affiliation(s)
- Satnam Singh
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
  - Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Gina Zeh
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Jessica Freiherr
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
  - Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Thilo Bauer
  - Computer Chemistry Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstr. 25, 91052, Erlangen, Germany
- Isik Türkmen
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Andreas T Grasskamp
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany

5
Sidorov P, Tsuji N. A Primer on 2D Descriptors in Selectivity Modeling for Asymmetric Catalysis. Chemistry 2024; 30:e202302837. [PMID: 38010242 DOI: 10.1002/chem.202302837] [Received: 08/31/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
Machine learning has permeated all fields of research, including chemistry, and is now an integral part of the design of novel compounds with desired properties. In asymmetric catalysis, the preference still lies with models based on a physical understanding of the catalytic phenomenon and of the electronic and steric properties of catalysts. However, such models require quantum chemical calculations and are thus limited by their computational cost. Here, we highlight recent advances in modeling catalyst selectivity using the 2D structures of catalysts and substrates. Although they have a less explicit mechanistic connection to the modeled property, 2D descriptors such as topological indices, molecular fingerprints, and fragments offer the tremendous advantages of low cost and high speed of calculation. This makes them optimal for in silico screening of large amounts of data. We provide an overview of the common quantitative structure-property relationship (QSPR) workflow, model building and validation techniques, applications of these methodologies in asymmetric catalysis design, and an outlook on improving the understanding of 2D-based models.
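As one concrete instance of a 2D-descriptor model, fingerprints can be represented as sets of "on" bits, compared with the Tanimoto coefficient, and used in a similarity-weighted k-nearest-neighbour regression. The sketch below assumes hypothetical fingerprints and selectivity values (e.g. % ee); it is a generic QSPR illustration, not a method taken from the review:

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient len(a & b) / len(a | b) for binary fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(query: Fingerprint, train: List[Tuple[Fingerprint, float]], k: int = 3) -> float:
    """Similarity-weighted mean target of the k most similar training compounds."""
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in ranked]
    if sum(weights) == 0:
        return sum(y for _, y in ranked) / len(ranked)  # no overlap: plain mean
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / sum(weights)


# Hypothetical catalyst fingerprints paired with hypothetical selectivities (% ee).
train = [
    (frozenset({1, 2, 3}), 90.0),
    (frozenset({1, 2, 4}), 80.0),
    (frozenset({7, 8, 9}), 10.0),
]
query = frozenset({1, 2, 3, 4})
print(knn_predict(query, train, k=2))  # 85.0
```

Because each prediction costs only set operations, this kind of model scales to the large screening libraries the abstract mentions; the validation techniques the review covers (e.g. held-out test sets) would still be needed to judge it.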
Affiliation(s)
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
- Nobuya Tsuji
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan

6
Velloso JPL, Kovacs AS, Pires DEV, Ascher DB. AI-driven GPCR analysis, engineering, and targeting. Curr Opin Pharmacol 2024; 74:102427. [PMID: 38219398 DOI: 10.1016/j.coph.2023.102427] [Received: 10/26/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
This article investigates how recent advances in Artificial Intelligence (AI) are revolutionising the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including machine learning (ML) for GPCR classification, prediction of GPCR activation levels, modelling of GPCR 3D structures and interactions, understanding G-protein selectivity, aiding the elucidation of GPCR structures, and drug design. Despite this progress, challenges remain in predicting GPCR structures and addressing the complex nature of GPCRs, providing avenues for future research and development.
Affiliation(s)
- João P L Velloso
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Aaron S Kovacs
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia

7
Keshavarzi Arshadi A, Salem M, Karner H, Garcia K, Arab A, Yuan JS, Goodarzi H. Functional microRNA-targeting drug discovery by graph-based deep learning. PATTERNS (NEW YORK, N.Y.) 2024; 5:100909. [PMID: 38264717 PMCID: PMC10801238 DOI: 10.1016/j.patter.2023.100909] [Received: 02/23/2023] [Revised: 11/09/2023] [Accepted: 12/07/2023] [Indexed: 01/25/2024]
Abstract
MicroRNAs are recognized as key drivers in many cancers but targeting them with small molecules remains a challenge. We present RiboStrike, a deep-learning framework that identifies small molecules against specific microRNAs. To demonstrate its capabilities, we applied it to microRNA-21 (miR-21), a known driver of breast cancer. To ensure selectivity toward miR-21, we performed counter-screens against miR-122 and DICER. Auxiliary models were used to evaluate toxicity and rank the candidates. Learning from various datasets, we screened a pool of nine million molecules and identified eight, three of which showed anti-miR-21 activity in both reporter assays and RNA sequencing experiments. Target selectivity of these compounds was assessed using microRNA profiling and RNA sequencing analysis. The top candidate was tested in a xenograft mouse model of breast cancer metastasis, demonstrating a significant reduction in lung metastases. These results demonstrate RiboStrike's ability to nominate compounds that target the activity of miRNAs in cancer.
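The counter-screen and toxicity steps can be pictured as a filter-then-rank funnel over model outputs. A minimal sketch, assuming each compound already has predicted probabilities from an on-target model, counter-screen models, and a toxicity model; the field names, thresholds, ranking rule, and `top_n` value are hypothetical and are not RiboStrike's actual criteria:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    compound_id: str
    p_target: float  # predicted activity against the intended target (e.g. miR-21)
    p_off_target: float  # highest predicted activity across counter-screens (e.g. miR-122, DICER)
    p_toxic: float  # predicted probability of toxicity from an auxiliary model


def screen(preds: List[Prediction], min_target: float = 0.5, max_off: float = 0.5,
           max_tox: float = 0.5, top_n: int = 8) -> List[str]:
    """Keep on-target, selective, non-toxic predictions, then rank by selectivity margin."""
    kept = [
        p for p in preds
        if p.p_target >= min_target and p.p_off_target < max_off and p.p_toxic < max_tox
    ]
    # Larger on-/off-target margin first; ties broken by lower predicted toxicity.
    kept.sort(key=lambda p: (p.p_target - p.p_off_target, -p.p_toxic), reverse=True)
    return [p.compound_id for p in kept[:top_n]]


candidates = [
    Prediction("cpd-A", 0.92, 0.10, 0.05),
    Prediction("cpd-B", 0.88, 0.70, 0.10),  # fails the counter-screen
    Prediction("cpd-C", 0.75, 0.05, 0.60),  # predicted toxic
    Prediction("cpd-D", 0.81, 0.20, 0.15),
    Prediction("cpd-E", 0.40, 0.01, 0.01),  # weak on-target prediction
]
print(screen(candidates))  # ['cpd-A', 'cpd-D']
```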
Affiliation(s)
- Arash Keshavarzi Arshadi
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Urology, University of California, San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Milad Salem
  - Department of Computer Engineering, University of Central Florida, Orlando, FL, USA
- Heather Karner
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Urology, University of California, San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Kristle Garcia
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Urology, University of California, San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Abolfazl Arab
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Urology, University of California, San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Jiann Shiun Yuan
  - Department of Computer Engineering, University of Central Florida, Orlando, FL, USA
- Hani Goodarzi
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Urology, University of California, San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA

8
Nagayasu K. Integrative Research of Neuropharmacology and Informatics Pharmacology for Mental Disorder. Biol Pharm Bull 2024; 47:556-561. [PMID: 38432911 DOI: 10.1248/bpb.b23-00926] [Indexed: 03/05/2024]
Abstract
Mental illness poses a huge social burden, accounting for approximately 14% of all deaths. Depression, a major component of mental illness, affects approximately 300 million people worldwide, mainly in developed countries, and is not only a major social burden but also a cause of suicide. The social burden of depression is estimated to increase further in developing countries, and overcoming it is a pressing issue for all countries, including Japan. Although clinical evidence has demonstrated the efficacy of serotonergic neurotransmission enhancers in the treatment of depression, the full picture of their therapeutic effects has not yet been elucidated. In this review, we show that hyperactivity of serotonin neurons, especially those in the dorsal raphe nucleus, is commonly induced by various antidepressants within a period corresponding to the onset of their clinical efficacy. To translate the biological understanding of mental disorders, including major depressive disorder, into clinically effective therapeutics, we established methods for quantitatively predicting pharmacological activity from chemical structures alone. Our method outperformed previously reported quantitative prediction methods while covering a larger number of target proteins. This article highlights the importance of integrating neuropharmacology and informatics-based pharmacology to understand the biological basis of mental disorders and to facilitate drug development for these disorders.
Affiliation(s)
- Kazuki Nagayasu
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University

9
Cheng Z, Hwang SS, Bhave M, Rahman T, Chee Wezen X. Combination of QSAR Modeling and Hybrid-Based Consensus Scoring to Identify Dual-Targeting Inhibitors of PLK1 and p38γ. J Chem Inf Model 2023; 63:6912-6924. [PMID: 37883148 DOI: 10.1021/acs.jcim.3c01252] [Indexed: 10/27/2023]
Abstract
Polo-like kinase 1 (PLK1) and p38γ mitogen-activated protein kinase (p38γ) play important roles in cancer pathogenesis by controlling cell cycle progression and are therefore attractive cancer targets. Multitarget inhibitors may offer synergistic inhibition of distinct targets and reduce the risk of drug-drug interactions, improving the balance between therapeutic efficacy and safety. We combined deep-learning-based quantitative structure-activity relationship (QSAR) modeling and hybrid-based consensus scoring to screen for inhibitors with potential activity against the targeted proteins. Using this combined strategy, we identified a potent PLK1 inhibitor (compound 4) that inhibited PLK1 activity and liver cancer cell growth in the nanomolar range. Next, we deployed our QSAR models for both PLK1 and p38γ on the Enamine compound library to identify dual-targeting inhibitors, and the resulting hits were likewise subjected to hybrid-based consensus scoring. This identified a promising compound (compound 14) that could inhibit both PLK1 and p38γ activities. At nanomolar concentrations, compound 14 inhibited the growth of human hepatocellular carcinoma and hepatoblastoma cells in vitro. This study demonstrates the use of the combined screening strategy to identify novel potential inhibitors of existing targets.
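The abstract does not define "hybrid-based consensus scoring", so the sketch below substitutes a generic rank-averaging consensus over several scoring functions, plus a dual-target filter that keeps compounds predicted active by both target models. Scoring-function names, scores (lower assumed better, as with docking energies), probabilities, and the 0.5 threshold are all hypothetical:

```python
from typing import Dict, List

Scores = Dict[str, Dict[str, float]]  # scoring function -> compound -> score


def rank_consensus(scores: Scores) -> List[str]:
    """Order compounds by mean rank across scoring functions (lower score = better)."""
    compounds = sorted(set.intersection(*(set(s) for s in scores.values())))
    mean_rank = {c: 0.0 for c in compounds}
    for per_compound in scores.values():
        ordered = sorted(compounds, key=lambda c: (per_compound[c], c))
        for rank, c in enumerate(ordered, start=1):
            mean_rank[c] += rank / len(scores)
    return sorted(compounds, key=lambda c: (mean_rank[c], c))


def dual_target_hits(p_a: Dict[str, float], p_b: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Compounds predicted active (p >= threshold) by both target models."""
    return sorted(c for c in p_a if p_a[c] >= threshold and p_b.get(c, 0.0) >= threshold)


scores = {
    "score_fn_a": {"c1": -9.1, "c2": -7.5, "c3": -8.2},
    "score_fn_b": {"c1": -80.0, "c2": -85.0, "c3": -70.0},
}
print(rank_consensus(scores))  # ['c1', 'c2', 'c3']

p_plk1 = {"c1": 0.90, "c2": 0.60, "c3": 0.20}
p_p38g = {"c1": 0.80, "c2": 0.30, "c3": 0.95}
print(dual_target_hits(p_plk1, p_p38g))  # ['c1']
```

Averaging ranks rather than raw scores sidesteps the fact that different scoring functions use incompatible scales.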
Affiliation(s)
- Zixuan Cheng
  - School of Engineering and Science, Swinburne University of Technology Sarawak, 93350 Kuching, Malaysia
- Siaw San Hwang
  - School of Engineering and Science, Swinburne University of Technology Sarawak, 93350 Kuching, Malaysia
- Mrinal Bhave
  - Department of Chemistry and Biotechnology, Swinburne University of Technology, Melbourne 3122, Victoria, Australia
- Taufiq Rahman
  - Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, U.K
- Xavier Chee Wezen
  - School of Engineering and Science, Swinburne University of Technology Sarawak, 93350 Kuching, Malaysia

10
Lee M, Min K. AmorProt: Amino Acid Molecular Fingerprints Repurposing-Based Protein Fingerprint. Biochemistry 2023; 62:2700-2709. [PMID: 37622182 DOI: 10.1021/acs.biochem.3c00253] [Indexed: 08/26/2023]
Abstract
As protein therapeutics play an important role in almost all medical fields, numerous studies have applied artificial intelligence to proteins. Artificial intelligence enables data-driven predictions without the need for expensive experiments. Nevertheless, unlike the many molecular fingerprint algorithms that have been developed, protein fingerprint algorithms have rarely been studied. In this study, we propose the amino acid molecular fingerprints repurposing-based protein (AmorProt) fingerprint, a protein sequence representation method that effectively uses the molecular fingerprints corresponding to the 20 amino acids. The performance of tree-based machine learning and artificial neural network models was then compared on (1) amyloid classification and (2) isoelectric point regression. Finally, the applicability and advantages of the developed platform were demonstrated through a case study and the following experiments: (3) comparison of dataset dependence with feature-based methods, (4) feature importance analysis, and (5) protein space analysis. These experiments verified the significantly improved model performance and dataset-independent versatility of the AmorProt fingerprint. The results indicate that the proposed protein representation method can be applied to various protein-related fields, such as predicting fundamental protein properties or protein-ligand interactions.
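The core idea of building a protein representation from per-residue molecular fingerprints can be illustrated by summing the fingerprints of the residues in a sequence. This is a deliberately simplified sketch, not the published AmorProt construction (which may combine residue fingerprints differently); the 4-bit fingerprints are hypothetical toy vectors rather than real ones computed by a cheminformatics toolkit:

```python
from typing import Dict, List

# Hypothetical 4-bit toy fingerprints for a few amino acids (not real fingerprints).
AA_FP: Dict[str, List[int]] = {
    "A": [1, 0, 0, 0],
    "G": [0, 0, 0, 0],
    "K": [1, 1, 0, 0],
    "S": [1, 0, 1, 0],
    "W": [1, 0, 0, 1],
}


def sequence_fingerprint(seq: str, aa_fp: Dict[str, List[int]] = AA_FP) -> List[int]:
    """Sum per-residue fingerprints into a fixed-length count vector.

    Raises KeyError for residues missing from `aa_fp`.
    """
    n_bits = len(next(iter(aa_fp.values())))
    total = [0] * n_bits
    for residue in seq:
        for i, bit in enumerate(aa_fp[residue]):
            total[i] += bit
    return total


print(sequence_fingerprint("AKSW"))  # [4, 1, 1, 1]
print(sequence_fingerprint("GGGG"))  # [0, 0, 0, 0]
```

The output length depends only on the fingerprint size, not on the sequence length, which is what lets proteins of different lengths feed the same tree-based or neural model. A plain sum discards residue order, so a practical method would need some position-aware weighting.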
Affiliation(s)
- Myeonghun Lee
  - School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
- Kyoungmin Min
  - School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea

11
Xiaolin X, Xiaozhi L, Guoping H, Hongwei L, Jinkuo G, Xiyun B, Zhen T, Xiaofang M, Yanxia L, Na X, Chunyan Z, Rui G, Kuan W, Cheng Z, Cuancuan W, Mingyong L, Xinping D. Overfit deep neural network for predicting drug-target interactions. iScience 2023; 26:107646. [PMID: 37680476 PMCID: PMC10480310 DOI: 10.1016/j.isci.2023.107646] [Received: 02/12/2022] [Revised: 06/28/2023] [Accepted: 08/11/2023] [Indexed: 09/09/2023]
Abstract
Drug-target interaction (DTI) prediction is an important step in drug discovery. Because traditional biological experiments and high-throughput screening are costly and time-consuming, many deep learning models have been developed for this task. Overfitting is normally something to be avoided when training deep learning models. We propose a simple framework, called OverfitDTI, for DTI prediction, in which a deep neural network (DNN) model is deliberately overfit to sufficiently learn the features of the chemical space of drugs and the biological space of targets. The weights of the trained DNN model form an implicit representation of the nonlinear relationship between drugs and targets. Results on three public datasets showed that the overfit DNN models fit this nonlinear relationship with high accuracy. We identified fifteen compounds that interacted with TEK, a receptor tyrosine kinase contributing to vascular homeostasis, and two of the predicted compounds, AT9283 and dorsomorphin, were experimentally shown to inhibit TEK in human umbilical vein endothelial cells (HUVECs).
Affiliation(s)
- Xiao Xiaolin
  - Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Liu Xiaozhi
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- He Guoping
  - Geriatrics Department, Traditional Chinese Medicine Hospital of Binhai New Area, Tianjin, China
- Liu Hongwei
  - School of Clinical Medicine, North China University of Science and Technology, Tangshan, Hebei, China
  - Department of Anesthesiology, Tangshan Maternal and Child Health Hospital, Tangshan, Hebei, China
- Guo Jinkuo
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - College of Food Science and Engineering, Tianjin University of Science & Technology, Tianjin, China
- Bian Xiyun
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Tian Zhen
  - Deepwater Technology Research Institute, China National Offshore Oil Corporation, Tianjin, China
- Ma Xiaofang
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Li Yanxia
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Xue Na
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Zhang Chunyan
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Gao Rui
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Wang Kuan
  - Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Zhang Cheng
  - Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Wang Cuancuan
  - Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Liu Mingyong
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - Department of Urology, Tianjin Fifth Central Hospital, Tianjin, China
- Du Xinping
  - Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
  - Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
  - College of Food Science and Engineering, Tianjin University of Science & Technology, Tianjin, China
| |
Collapse
|
12
Ashraf FB, Akter S, Mumu SH, Islam MU, Uddin J. Bio-activity prediction of drug candidate compounds targeting SARS-Cov-2 using machine learning approaches. PLoS One 2023; 18:e0288053. [PMID: 37669264 PMCID: PMC10479925 DOI: 10.1371/journal.pone.0288053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 06/18/2023] [Indexed: 09/07/2023]
Abstract
The SARS-CoV-2 3CLpro protein is one of the key therapeutic targets of interest for COVID-19 due to its critical role in viral replication, the availability of various high-quality protein crystal structures, and its use as a basis for computationally screening compounds with improved inhibitory activity, bioavailability, and ADMETox properties. The ChEMBL and PubChem databases contain experimental data from screening small molecules against SARS-CoV-2 3CLpro, which expands the opportunity to learn patterns and design a computational model that can predict the potency of any drug compound against coronavirus before in-vitro and in-vivo testing. In this study, utilizing several descriptors, we evaluated 27 machine learning classifiers. We also developed a neural network model that can correctly identify bioactive and inactive chemicals with 91% accuracy on ChEMBL data and 93% accuracy on combined ChEMBL and PubChem data. The F1-score for inactive and active compounds was 93% and 94%, respectively. SHAP (SHapley Additive exPlanations) was applied to an XGB classifier to find important fingerprints from the PaDEL descriptors for this task. The results indicated that the PaDEL descriptors were effective in predicting bioactivity, that the proposed neural network design was efficient, and that the explanatory factors obtained through SHAP correctly identified the important fingerprints. In addition, we validated the effectiveness of our proposed model using a large dataset encompassing over 100,000 molecules. This research employed various molecular descriptors to discover the optimal one for this task. To evaluate the effectiveness of these possible medications against SARS-CoV-2, more in-vitro and in-vivo research is required.
Affiliation(s)
- Faisal Bin Ashraf
  - Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh
  - Department of Computer Science and Engineering, University of California, Riverside, California, United States of America
- Sanjida Akter
  - Department of Cell Molecular and Developmental Biology, University of California, Riverside, California, United States of America
- Sumona Hoque Mumu
  - School of Kinesiology, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Muhammad Usama Islam
  - School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Jasim Uddin
  - Department of Applied Computing and Engineering, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Wales, United Kingdom
13
Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. 
By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
  - Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
- Nabanita Ghosh
  - Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
- Parames C Sil
  - Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
14
Zadorozhny A, Smirnov A, Filimonov D, Lagunin A. Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors. Bioinformatics 2023; 39:btad484. [PMID: 37535750 PMCID: PMC10435372 DOI: 10.1093/bioinformatics/btad484] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/02/2022] [Revised: 05/21/2023] [Accepted: 08/02/2023] [Indexed: 08/05/2023]
Abstract
MOTIVATION Next Generation Sequencing technologies make it possible to detect rare genetic variants in individual patients. Currently, more than a dozen software tools and web services have been created to predict the pathogenicity of variants associated with amino acid changes. Despite considerable efforts in this area, there is at present no ideal method to classify pathogenic and harmless variants, and pathogenicity assessments are often contradictory. In this article, we propose to describe amino acid residue substitutions using the peptide structural formulas of proteins rather than a single-letter code. This allowed us to investigate the effectiveness of a chemoinformatics approach to assess the pathogenicity of variants associated with amino acid substitutions. RESULTS Structure-activity relationship analysis relying on protein-specific data and atom-centric substructural multilevel neighborhoods of atoms (MNA) descriptors of molecular fragments appeared to be suitable for predicting the pathogenic effect of single amino acid variants. An MNA-based Naïve Bayes classifier algorithm, together with ClinVar and humsavar data, was used to create structure-activity relationship models for 10 proteins. The performance of the models was compared with 11 different prediction tools: 8 individual (SIFT 4G, Polyphen2 HDIV, MutationAssessor, PROVEAN, FATHMM, MVP, LIST-S2, MutPred) and 3 consensus (M-CAP, MetaSVM, MetaLR). The accuracy of the MNA-based method varied across proteins (AUC: 0.631-0.993; MCC: 0.191-0.891). The results were similar in comparisons both with the other individual predictors and with third-party protein-specific predictors. For several proteins (BRCA1, BRCA2, COL1A2, and RYR1), the performance of the MNA-based method was outstanding, capturing the pathogenic effect of structural changes in amino acid substitutions. AVAILABILITY AND IMPLEMENTATION The datasets are available as supplemental data at Bioinformatics online.
A Python script to convert amino acid and nucleotide sequences from single-letter codes to SD files is available at https://github.com/SmirnygaTotoshka/SequenceToSDF. The authors provide trial licenses for MultiPASS software to interested readers upon request.
Affiliation(s)
- Anton Zadorozhny
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow 117513, Russia
- Anton Smirnov
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow 117513, Russia
- Dmitry Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119992, Russia
- Alexey Lagunin
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow 117513, Russia
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119992, Russia
15
Park H, Hong S, Lee M, Kang S, Brahma R, Cho KH, Shin JM. AiKPro: deep learning model for kinome-wide bioactivity profiling using structure-based sequence alignments and molecular 3D conformer ensemble descriptors. Sci Rep 2023; 13:10268. [PMID: 37355672 PMCID: PMC10290719 DOI: 10.1038/s41598-023-37456-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 06/22/2023] [Indexed: 06/26/2023]
Abstract
The discovery of selective and potent kinase inhibitors is crucial for the treatment of various diseases, but the process is challenging due to the high structural similarity among kinases. Efficient kinome-wide bioactivity profiling is essential for understanding kinase function and identifying selective inhibitors. In this study, we propose AiKPro, a deep learning model that combines structure-validated multiple sequence alignments and molecular 3D conformer ensemble descriptors to predict kinase-ligand binding affinities. Our deep learning model uses an attention-based mechanism to capture complex patterns in the interactions between the kinase and the ligand. To assess the performance of AiKPro, we evaluated the impact of descriptors, the predictability for untrained kinases and compounds, and kinase activity profiling based on odds ratios. Our model, AiKPro, shows good Pearson's correlation coefficients of 0.88 and 0.87 for the test set and for the untrained sets of compounds, respectively, which also demonstrates the robustness of the model. AiKPro shows good kinase-activity profiles across the kinome, potentially facilitating the discovery of novel interactions and selective inhibitors. Our approach holds potential implications for the discovery of novel, selective kinase inhibitors and guiding rational drug design.
Affiliation(s)
- Hyejin Park
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Sujeong Hong
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Myeonghun Lee
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Sungil Kang
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Rahul Brahma
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Kwang-Hwi Cho
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Jae-Min Shin
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
16
Arshadi AK, Salem M, Karner H, Garcia K, Arab A, Yuan JS, Goodarzi H. Functional microRNA-Targeting Drug Discovery by Graph-Based Deep Learning. bioRxiv 2023:2023.01.13.524005. [PMID: 36711761 PMCID: PMC9882104 DOI: 10.1101/2023.01.13.524005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
MicroRNAs are recognized as key drivers in many cancers, but targeting them with small molecules remains a challenge. We present RiboStrike, a deep learning framework that identifies small molecules against specific microRNAs. To demonstrate its capabilities, we applied it to microRNA-21 (miR-21), a known driver of breast cancer. To ensure the selected molecules only targeted miR-21 and not other microRNAs, we also performed a counter-screen against DICER, an enzyme involved in microRNA biogenesis. Additionally, we used auxiliary models to evaluate toxicity and select the best candidates. Using datasets from various sources, we screened a pool of nine million molecules and identified eight, three of which showed anti-miR-21 activity in both reporter assays and RNA sequencing experiments. One of these was also tested in mouse models of breast cancer, resulting in a significant reduction of lung metastases. These results demonstrate RiboStrike’s ability to effectively screen for microRNA-targeting compounds in cancer.
17
Simple nearest-neighbour analysis meets the accuracy of compound potency predictions using complex machine learning models. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00581-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
18
Yang J, Li Z, Wu WKK, Yu S, Xu Z, Chu Q, Zhang Q. Deep learning identifies explainable reasoning paths of mechanism of action for drug repurposing from multilayer biological network. Brief Bioinform 2022; 23:6809964. [PMID: 36347526 DOI: 10.1093/bib/bbac469] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2022] [Revised: 09/07/2022] [Accepted: 09/29/2022] [Indexed: 11/11/2022]
Abstract
The discovery and repurposing of drugs require a deep understanding of the mechanism of drug action (MODA). Existing computational methods mainly model MODA with the protein-protein interaction (PPI) network. However, the molecular interactions of drugs in the human body are far beyond PPIs. Additionally, the lack of interpretability of these models hinders their practicability. We propose an interpretable deep learning-based path-reasoning framework (iDPath) for drug discovery and repurposing by capturing MODA on by far the most comprehensive multilayer biological network consisting of the complex high-dimensional molecular interactions between genes, proteins and chemicals. Experiments show that iDPath outperforms state-of-the-art machine learning methods on a general drug repurposing task. Further investigations demonstrate that iDPath can identify explicit critical paths that are consistent with clinical evidence. To demonstrate the practical value of iDPath, we apply it to the identification of potential drugs for treating prostate cancer and hypertension. Results show that iDPath can discover new FDA-approved drugs. This research provides a novel interpretable artificial intelligence perspective on drug discovery.
Affiliation(s)
- Jiannan Yang
  - School of Data Science, City University of Hong Kong, Hong Kong SAR, China
- Zhen Li
  - Department of Radiology, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, China
- William Ka Kei Wu
  - Department of Anaesthesia and Intensive Care, Chinese University of Hong Kong, Hong Kong SAR, China
- Shi Yu
  - The USC Norris Center for Cancer Drug Development, University of Southern California, Los Angeles, CA, USA
  - Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Zhongzhi Xu
  - School of Data Science, City University of Hong Kong, Hong Kong SAR, China
- Qian Chu
  - Department of Thoracic Oncology, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, China
- Qingpeng Zhang
  - School of Data Science, City University of Hong Kong, Hong Kong SAR, China
19
Liu L, Du Y, Wang Y, Cheung WK, Zhang Y, Liu Q, Wang G. LRP2A: Layer-wise Relevance Propagation based Adversarial attacking for Graph Neural Networks. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
20
Zhang J. Atom typing using graph representation learning: How do models learn chemistry? J Chem Phys 2022; 156:204108. [DOI: 10.1063/5.0095008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Atom typing is the first step in simulating molecules with a force field. Automatic atom typing for an arbitrary molecule is often realized by rule-based algorithms, which must manually encode rules for all types defined in the force field. These algorithms are time-consuming to build and force field-specific. In this study, a force field-independent method for automatic atom typing based on graph representation learning is established. The topology adaptive graph convolution network (TAGCN) is found to be the optimal model. The model does not need manual enumeration of rules but can learn them simply through training on typed molecules prepared during the development of a force field. The test on the CHARMM general force field gives a typing correctness of 91%. A systematic typing error of TAGCN is its inability to distinguish types in rings from those in acyclic chains. It originates from the fundamental structure of graph neural networks and can be fixed in a trivial way. More importantly, analysis of the rationalization processes of these models using layer-wise relevance propagation reveals how TAGCN encodes rules learned during training. Our model is found to be able to type using the local chemical environments, in a way highly in accordance with chemists’ intuition.
Affiliation(s)
- Jun Zhang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, People’s Republic of China
21
Morger A, Garcia de Lomana M, Norinder U, Svensson F, Kirchmair J, Mathea M, Volkamer A. Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data. Sci Rep 2022; 12:7244. [PMID: 35508546 PMCID: PMC9068909 DOI: 10.1038/s41598-022-09309-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/28/2021] [Accepted: 03/17/2022] [Indexed: 11/09/2022]
Abstract
Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration sets are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
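The calibration step this abstract describes can be illustrated with a minimal sketch of label-conditional (Mondrian) inductive conformal prediction in plain Python. It is not the authors' implementation: the function names are hypothetical, and the sketch assumes the underlying model outputs a probability for each class label and uses 1 - p(label) as the nonconformity score. At significance level ε, the returned prediction set misses the true label with probability at most ε only while calibration and test data are exchangeable, which is the assumption that data drift breaks.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical helper names; a generic sketch, not the authors' code.
# Assumption: the trained model returns a probability for every class label,
# and the nonconformity score of an example for a label is 1 - p(label).


def nonconformity(prob_of_label: float) -> float:
    """Higher score means the example fits the label less well."""
    return 1.0 - prob_of_label


def calibrate(cal_probs: Sequence[Dict[int, float]],
              cal_labels: Sequence[int]) -> Dict[int, List[float]]:
    """Group calibration scores by true label (Mondrian, label-conditional CP)."""
    scores: Dict[int, List[float]] = {}
    for probs, label in zip(cal_probs, cal_labels):
        scores.setdefault(label, []).append(nonconformity(probs[label]))
    return scores


def p_value(cal_scores: List[float], test_score: float) -> float:
    """Smoothed fraction of calibration scores at least as large as the test score."""
    at_least = sum(1 for s in cal_scores if s >= test_score)
    return (at_least + 1) / (len(cal_scores) + 1)


def predict_set(test_probs: Dict[int, float],
                cal: Dict[int, List[float]],
                significance: float) -> Set[int]:
    """Keep every label whose p-value exceeds the chosen significance level.

    A set containing both labels is inconclusive; an empty set means no
    label is plausible at this significance level.
    """
    return {label for label, scores in cal.items()
            if p_value(scores, nonconformity(test_probs[label])) > significance}


# Recalibration: the trained model is untouched; only the calibration scores
# are recomputed from data assumed to resemble the drifted test set, e.g.
#   cal = calibrate(newer_probs, newer_labels)
```

Because `calibrate` only recomputes scores, recalibrating on compounds closer to the drifted test data leaves the trained model as it is; as the abstract reports, restored validity then tends to come with more inconclusive (two-label) prediction sets.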
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
- Marina Garcia de Lomana
  - BASF SE, 67056, Ludwigshafen, Germany
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Ulf Norinder
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, 751 24, Sweden
  - Department of Computer and Systems Sciences, Stockholm University, Kista, 164 07, Sweden
  - MTM Research Centre, School of Science and Technology, 701 82, Örebro, Sweden
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Johannes Kirchmair
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
22
Ligand-based approaches to activity prediction for the early stage of structure–activity–relationship progression. J Comput Aided Mol Des 2022; 36:237-252. [DOI: 10.1007/s10822-022-00449-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Accepted: 03/07/2022] [Indexed: 11/27/2022]
23
Nakarin F, Boonpalit K, Kinchagawat J, Wachiraphan P, Rungrotmongkol T, Nutanong S. Assisting Multitargeted Ligand Affinity Prediction of Receptor Tyrosine Kinases Associated Nonsmall Cell Lung Cancer Treatment with Multitasking Principal Neighborhood Aggregation. Molecules 2022; 27:molecules27041226. [PMID: 35209011 PMCID: PMC8878292 DOI: 10.3390/molecules27041226] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/24/2021] [Revised: 01/30/2022] [Accepted: 01/31/2022] [Indexed: 11/16/2022]
Abstract
A multitargeted therapeutic approach with hybrid drugs is a promising strategy to enhance anticancer efficiency and overcome drug resistance in nonsmall cell lung cancer (NSCLC) treatment. Estimating the affinities of small molecules against targets of interest is typically a preliminary step in current drug discovery in the pharmaceutical industry. In this investigation, we employed machine learning models to provide a computationally affordable means of computer-aided screening to accelerate the discovery of potential drug compounds. In particular, we introduced a quantitative structure–activity relationship (QSAR)-based multitask learning model to facilitate an in silico screening system for multitargeted drug development. Our method combines a recently developed graph-based neural network architecture, principal neighborhood aggregation (PNA), with a descriptor-based deep neural network, supporting synergistic use of molecular graph and fingerprint features. The model was built from more than ten thousand ligands with reported affinities for seven crucial receptor tyrosine kinases in NSCLC, drawn from two public data sources. As a result, our multitask model performed better than all other benchmark models and achieved satisfactory predictive ability under applicable QSAR criteria for most tasks within the model’s applicability domain. Since our model could serve as a screening tool for practical use, we have provided a freely accessible model implementation platform with a tutorial, offering a first step in the long journey of cancer drug development.
Affiliation(s)
- Fahsai Nakarin
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
  - Correspondence: Tel.: +66-33-014-444
- Kajjana Boonpalit
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Jiramet Kinchagawat
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Patcharapol Wachiraphan
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Thanyada Rungrotmongkol
  - Center of Excellence in Biocatalyst and Sustainable Biotechnology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Sarana Nutanong
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
24
Alves LA, Ferreira NCDS, Maricato V, Alberto AVP, Dias EA, Jose Aguiar Coelho N. Graph Neural Networks as a Potential Tool in Improving Virtual Screening Programs. Front Chem 2022; 9:787194. [PMID: 35127645 PMCID: PMC8811035 DOI: 10.3389/fchem.2021.787194] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/18/2021] [Accepted: 12/10/2021] [Indexed: 11/23/2022]
Abstract
Despite the increasing number of pharmaceutical companies, university laboratories and funding, less than one percent of initially researched drugs enter the commercial market. In this context, virtual screening (VS) has gained much attention due to several advantages, including time savings, reduced reagent and consumable costs, and the ability to perform selective analyses of the affinity between test molecules and pharmacological targets. Currently, VS is based mainly on algorithms that apply physical and chemical principles and quantum mechanics to estimate molecular affinities and conformations, among others. Nevertheless, VS has not delivered the expected gains in market-approved drugs: fewer than twenty drugs have reached the market through it to date. In this context, graph neural networks (GNN), a recent deep-learning subtype, may be a powerful tool to improve VS results for natural products, used either alongside standard algorithms or on their own. This review discusses the pros and cons of GNN applied to VS and the future perspectives of this learnable algorithm, which may revolutionize drug discovery if certain obstacles concerning spatial coordinates and adequate datasets, among others, can be overcome.
Affiliation(s)
- Luiz Anastacio Alves
  - Laboratory of Cellular Communication, Oswaldo Cruz Institute – Fiocruz, Rio de Janeiro, Brazil
- Victor Maricato
  - Laboratory of Cellular Communication, Oswaldo Cruz Institute – Fiocruz, Rio de Janeiro, Brazil
- Evellyn Araujo Dias
  - Laboratory of Cellular Communication, Oswaldo Cruz Institute – Fiocruz, Rio de Janeiro, Brazil
- Nt Jose Aguiar Coelho
  - National Institute of Industrial Property - INPI and Veiga de Almeida University - UVA, Rio de Janeiro, Brazil
25
Ren S, Tao Y, Yu K, Xue Y, Schwartz R, Lu X. De novo Prediction of Cell-Drug Sensitivities Using Deep Learning-based Graph Regularized Matrix Factorization. Pac Symp Biocomput 2022; 27:278-289. [PMID: 34890156 PMCID: PMC8691529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Application of artificial intelligence (AI) in precision oncology typically involves predicting whether the cancer cells of a patient (previously unseen by AI models) will respond to any of a set of existing anticancer drugs, based on the responses of previous training cell samples to those drugs. To expand the repertoire of anticancer drugs, AI has also been used to repurpose drugs that have not been tested in an anticancer setting, i.e., predicting the anticancer effects of a new drug on previously unseen cancer cells de novo. Here, we report a computational model that addresses both of the above tasks in a unified AI framework. Our model, referred to as deep learning-based graph regularized matrix factorization (DeepGRMF), integrates neural networks, graph models, and matrix-factorization techniques to utilize diverse information from drug chemical structures, their impact on cellular signaling systems, and the cellular states of cancer cells to predict cell responses to drugs. DeepGRMF learns embeddings of drugs so that drugs sharing similar structures and mechanisms of action (MOAs) are closely related in the embedding space. Similarly, DeepGRMF also learns representation embeddings of cells such that cells sharing similar cellular states and drug responses are closely related. Evaluation of DeepGRMF and competing models on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets shows its superiority in prediction performance. Finally, we show that the model is capable of predicting the effectiveness of a chemotherapy regimen on outcomes for lung cancer patients in The Cancer Genome Atlas (TCGA) dataset.
26
Varela-Rial A, Maryanow I, Majewski M, Doerr S, Schapin N, Jiménez-Luna J, De Fabritiis G. PlayMolecule Glimpse: Understanding Protein-Ligand Property Predictions with Interpretable Neural Networks. J Chem Inf Model 2022; 62:225-231. [PMID: 34978201 PMCID: PMC8790755 DOI: 10.1021/acs.jcim.1c00691] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/20/2023]
Abstract
Deep learning has been successfully applied to structure-based protein–ligand affinity prediction, yet the black-box nature of these models raises questions. In a previous study, we presented KDEEP, a convolutional neural network that predicted the binding affinity of a given protein–ligand complex with state-of-the-art performance. However, it was unclear what this model was learning. In this work, we present a new application that visualizes the contribution of each input atom to the prediction made by the convolutional neural network, aiding the interpretability of such predictions. The results suggest that KDEEP is able to learn meaningful chemical signals from the data, but they have also exposed inaccuracies in the current model, serving as a guideline for further optimization of our prediction tools.
Affiliation(s)
- Alejandro Varela-Rial
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Iain Maryanow
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Maciej Majewski
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Stefan Doerr
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Nikolai Schapin
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- José Jiménez-Luna
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
27
Nagayasu K. Serotonin transporter: Recent progress of in silico ligand prediction methods and structural biology towards structure-guided in silico design of therapeutic agents. J Pharmacol Sci 2022; 148:295-299. [DOI: 10.1016/j.jphs.2022.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/30/2021] [Revised: 12/21/2021] [Accepted: 01/06/2022] [Indexed: 02/08/2023]
28
Velloso JPL, Ascher DB, Pires DEV. pdCSM-GPCR: predicting potent GPCR ligands with graph-based signatures. Bioinformatics Advances 2021; 1:vbab031. [PMID: 34901870 PMCID: PMC8651072 DOI: 10.1093/bioadv/vbab031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/27/2021] [Revised: 09/30/2021] [Accepted: 11/02/2021] [Indexed: 01/26/2023]
Abstract
MOTIVATION G protein-coupled receptors (GPCRs) can selectively bind many types of ligands, including light-sensitive compounds, ions, hormones, pheromones and neurotransmitters, thereby modulating cell physiology. Considering their role in many essential cellular processes, they are one of the most targeted protein families, with over a third of all approved drugs modulating GPCR signalling. Despite this, the large diversity of receptors and their multipass transmembrane architectures make the identification and development of novel, specific and safe GPCR ligands a challenge. While computational approaches have the potential to assist GPCR drug development, they have shown limited performance and generalization capabilities. Here, we explored the use of graph-based signatures to develop pdCSM-GPCR, a method capable of rapidly and accurately screening potential GPCR ligands. RESULTS Bioactivity data (IC50, EC50, Ki and Kd) for individual GPCRs were curated and then used to develop predictive models for 36 major GPCR targets across 4 classes (A, B, C and F). Together, our models constitute the most comprehensive computational resource for GPCR bioactivity prediction to date. Across stratified 10-fold cross-validation and blind tests, our approach achieved Pearson's correlations of up to 0.89, significantly outperforming previous methods. Interpreting our results, we identified common important features of potent GPCR ligands, which tend to have bicyclic rings, leading to higher levels of aromaticity. We believe pdCSM-GPCR will be an invaluable tool to assist screening efforts, enriching compound libraries and ranking candidates for further experimental validation. AVAILABILITY AND IMPLEMENTATION pdCSM-GPCR predictive models and datasets used have been made available via a freely accessible and easy-to-use web server at http://biosig.unimelb.edu.au/pdcsm_gpcr/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics Advances online.
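The evaluation reported above, Pearson's correlation between predicted and measured bioactivities under stratified k-fold cross-validation, can be sketched in plain Python. This is not pdCSM-GPCR code: the function names are hypothetical, and stratifying a continuous target by sorting on it and dealing samples round-robin into folds is one common convention, assumed here rather than taken from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def stratified_folds(y, k=10):
    """Assign a fold id in [0, k) to each sample so every fold spans the range of y.

    Samples are sorted by target value and dealt round-robin into the k folds.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    for rank, i in enumerate(order):
        folds[i] = rank % k
    return folds
```

In each round, a model would be trained on the samples whose fold id differs from the held-out fold and scored with `pearson_r` on the held-out samples.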
Affiliation(s)
- João Paulo L Velloso
- Fundação Oswaldo Cruz, Instituto René Rachou, Belo Horizonte 30190-009, Brazil
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Melbourne 3052, Australia
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne 3053, Australia
29
Sahin K, Saripinar E, Durdagi S. Combined 4D-QSAR and target-based approaches for the determination of bioactive Isatin derivatives. SAR and QSAR in Environmental Research 2021; 32:769-792. [PMID: 34530651 DOI: 10.1080/1062936x.2021.1971760] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/10/2021] [Accepted: 08/19/2021] [Indexed: 06/13/2023]
Abstract
The hybrid Electron-Conformational Genetic Algorithm (EC-GA) method was used to identify pharmacophore groups and to estimate the anticancer activity of isatin derivatives with the robust 4D-QSAR software EMRE. To build the model, each compound is represented by a set of conformers rather than a single conformation. The Electron Conformational Matrix of Congruity (ECMC) is constructed with the EMRE software, and the Electron Conformational Submatrix of Activity (ECSA) is calculated by comparing these matrices. A genetic algorithm was used to select the variables most important for predicting theoretical activity. The model with the best seven parameters produced satisfactory results. The E statistics technique was applied to the generated EC-GA model to evaluate the individual contribution of each descriptor to biological activity. The r2 and q2 values for the training set compounds were 0.95 and 0.93, respectively. Because no previous 4D-QSAR studies on isatin derivatives have been conducted, this study is important for the development of new isatin derivatives. In this study, 27 isatin derivatives whose activities were estimated with the hybrid EC-GA method were also investigated through molecular docking and molecular dynamics simulations for their BCL-2 inhibitory activity.
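For reference, r² and q² in QSAR work are conventionally defined as below, where $y_i$ is the measured activity of training compound $i$, $\bar{y}$ the training-set mean, $\hat{y}_i$ the fitted activity, and $\hat{y}_{(i)}$ the activity predicted by a model refit without compound $i$. The abstract does not state how q² was cross-validated, so the leave-one-out form shown here is an assumption.

```latex
r^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2},
\qquad
q^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
```

Because each $\hat{y}_{(i)}$ comes from a model that never saw compound $i$, q² is typically somewhat lower than r², consistent with the reported 0.93 versus 0.95.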
Affiliation(s)
- K Sahin
- Computational Biology and Molecular Simulations Laboratory, Department of Biophysics, School of Medicine, Bahcesehir University, Istanbul, Turkey
- E Saripinar
- Faculty of Science, Department of Chemistry, Erciyes University, Kayseri, Turkey
- S Durdagi
- Computational Biology and Molecular Simulations Laboratory, Department of Biophysics, School of Medicine, Bahcesehir University, Istanbul, Turkey
30
Yin J, Li F, Li Z, Yu L, Zhu F, Zeng S. Feature, Function, and Information of Drug Transporter Related Databases. Drug Metab Dispos 2021; 50:76-85. [PMID: 34426411 DOI: 10.1124/dmd.121.000419] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/10/2021] [Accepted: 08/20/2021] [Indexed: 11/22/2022]
Abstract
With the rapid progress in pharmaceutical experiments and clinical investigations, extensive knowledge of drug transporters (DTs) has accumulated, which is valuable for understanding drug metabolism and disposition. However, such data are largely dispersed in the literature, which hampers their utility and significantly limits their use in comprehensive analysis. A variety of databases have therefore been constructed to provide DT-related data, and they are reviewed in this study. First, several knowledge bases providing data on clinically important drugs and their corresponding transporters are discussed; these constitute the most important resources of DT-centered data. Second, some databases describing general transporters and their functional families are reviewed. Third, various databases offering transporter information as part of their broader data collections are described. Finally, customized database functions that are available to facilitate DT-related research are discussed. This review provides an overview of the whole collection of DT-related databases, which might facilitate research on precision medicine and rational drug use. SIGNIFICANCE STATEMENT A collection of well-established databases related to DTs was comprehensively reviewed and organized according to their importance in drug ADME research. These databases could collectively contribute to research on rational drug use.
Affiliation(s)
- Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, China
- Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, China
- Zhaorong Li
- Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, China
- Su Zeng
- College of Pharmaceutical Sciences, Zhejiang University, China
31
Sun M, Xing J, Wang H, Chen B, Zhou J. MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph. KDD: Proceedings. International Conference on Knowledge Discovery & Data Mining 2021; 2021:3585-3594. [PMID: 35571558 PMCID: PMC9105980 DOI: 10.1145/3447548.3467186] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 10/20/2022]
Abstract
Recent years have seen rapid growth in the use of graph neural networks (GNNs) in the biomedical domain to tackle drug-related problems. However, like other deep architectures, GNNs are data hungry. Because obtaining labels in the real world is often expensive, pretraining GNNs in an unsupervised manner has been actively explored. Among these approaches, graph contrastive learning, which maximizes the mutual information between paired graph augmentations, has been shown to be effective on various downstream tasks. However, the current graph contrastive learning framework has two limitations. First, the augmentations are designed for general graphs and thus may not be suitable or powerful enough for certain domains. Second, the contrastive scheme only learns representations that are invariant to local perturbations and thus does not consider the global structure of the dataset, which may also be useful for downstream tasks. In this paper, we study graph contrastive learning designed specifically for the biomedical domain, where molecular graphs are present. We propose a novel framework called MoCL, which utilizes domain knowledge at both the local and global levels to assist representation learning. The local-level domain knowledge guides the augmentation process such that variation is introduced without changing graph semantics. The global-level knowledge encodes the similarity information between graphs in the entire dataset and helps to learn representations with richer semantics. The entire model is learned through a double contrast objective. We evaluate MoCL on various molecular datasets under both linear and semi-supervised settings, and the results show that MoCL achieves state-of-the-art performance.
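As a generic illustration of the contrastive idea only (not MoCL's knowledge-guided augmentations or its global-level term), the sketch below computes a simplified InfoNCE-style loss over a batch in which `z1[i]` and `z2[i]` are embeddings of two augmentations of the same molecular graph, and the other graphs' embeddings in `z2` act as negatives. The function names, the use of cosine similarity, and the temperature `tau` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(z1, z2, tau=0.5):
    """Mean loss over anchors z1[i]; the positive is z2[i], negatives are z2[k], k != i.

    loss_i = -log( exp(s_ii / tau) / sum_k exp(s_ik / tau) ), with s_ik = cos(z1[i], z2[k]).
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[k]) / tau for k in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - logits[i]
    return total / n
```

Minimizing this loss pulls each positive pair together and pushes the other graphs in the batch apart, so the learned representations become invariant to whatever augmentations are chosen; per the abstract, MoCL's contribution is choosing those augmentations with chemical domain knowledge and adding a dataset-level similarity term.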
Affiliation(s)
- Mengying Sun
- Michigan State University, East Lansing, Michigan, USA
- Jing Xing
- Michigan State University, Grand Rapids, Michigan, USA
- Huijun Wang
- Agios Pharmaceuticals, Cambridge, Massachusetts, USA
- Bin Chen
- Michigan State University, Grand Rapids, Michigan, USA
- Jiayu Zhou
- Michigan State University, East Lansing, Michigan, USA