1
Gomatam A, Hirlekar BU, Singh KD, Murty US, Dixit VA. Improved QSAR models for PARP-1 inhibition using data balancing, interpretable machine learning, and matched molecular pair analysis. Mol Divers 2024; 28:2135-2152. [PMID: 38374474 DOI: 10.1007/s11030-024-10809-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 01/07/2024] [Indexed: 02/21/2024]
Abstract
The poly(ADP-ribose) polymerase-1 (PARP-1) enzyme is an important target in the treatment of breast cancer. Currently, treatment options include the drugs Olaparib, Niraparib, Rucaparib, and Talazoparib; however, these drugs can cause severe side effects including hematological toxicity and cardiotoxicity. Although in silico models for the prediction of PARP-1 activity have been developed, the drawbacks of these models include low specificity, a narrow applicability domain, and a lack of interpretability. To address these issues, a comprehensive machine learning (ML)-based quantitative structure-activity relationship (QSAR) approach for the informed prediction of PARP-1 activity is presented. Classification models built with the Synthetic Minority Oversampling Technique (SMOTE) for data balancing and based on the K-nearest neighbor algorithm were robust and predictive (accuracy 0.86, sensitivity 0.88, specificity 0.80). Regression models were built on structurally congeneric datasets, with the models for the phthalazinone class and fused cyclic compounds giving the best performance. In accordance with the Organization for Economic Cooperation and Development (OECD) guidelines, a mechanistic interpretation is proposed using Shapley Additive Explanations (SHAP) to identify the important topological features that differentiate PARP-1 actives from inactives. Moreover, an analysis of the PARP-1 dataset revealed the prevalence of activity cliffs, which may negatively affect the model's predictive performance. Finally, a set of chemical transformation rules was extracted using matched molecular pair analysis (MMPA), which provided mechanistic insights and can guide medicinal chemists in the design of novel PARP-1 inhibitors.
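The data-balancing step named in the abstract can be illustrated with a minimal, standard-library sketch of SMOTE-style oversampling: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_like`, the toy descriptor vectors, and the choice of which class is the minority are all assumptions made for illustration; this is not the authors' pipeline, and a real study would more likely use an established implementation.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic

# Hypothetical 2-D descriptor vectors: 3 minority-class vs 6 majority-class compounds.
minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority_class = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5), (6.5, 6.5)]

n_missing = len(majority_class) - len(minority_class)
balanced_minority = minority_class + smote_like(minority_class, n_missing, k=2)
print(len(balanced_minority), len(majority_class))
```

Because the synthetic points are interpolations, they always stay inside the region spanned by the existing minority samples, which is why oversampling of this kind should be applied to the training split only.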
Affiliation(s)
- Anish Gomatam
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Bhakti Umesh Hirlekar
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Krishan Dev Singh
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Upadhyayula Suryanarayana Murty
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Vaibhav A Dixit
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
2
Hao Y, Li B, Huang D, Wu S, Wang T, Fu L, Liu X. Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery. Int J Mol Sci 2024; 25:8239. [PMID: 39125808 PMCID: PMC11312053 DOI: 10.3390/ijms25158239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 07/26/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and the false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of candidate interacting compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue for virtual screening, especially for the discovery of MTDLs.
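The idea of an ensemble trained on resampled bags of positive, negative, and unlabeled data can be sketched roughly as follows. The abstract does not give the details of NAPU-bagging, so everything here is an assumption: the bag composition (all positives against known negatives plus a random draw of unlabeled points treated provisionally as negatives), the bag size, and the base learner, which is a simple nearest-centroid classifier standing in for the SVM used in the paper. All names and data are hypothetical.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def train_centroid_classifier(pos, neg):
    """Stand-in base learner (not an SVM): votes 1 when the query is closer
    to the positive centroid than to the negative centroid."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: 1 if math.dist(x, cp) < math.dist(x, cn) else 0

def pu_bagging_scores(positives, negatives, unlabeled, queries,
                      n_bags=25, bag_size=3, seed=0):
    """Average the votes of classifiers trained on resampled bags."""
    rng = random.Random(seed)
    votes = [0] * len(queries)
    for _ in range(n_bags):
        # Assumed bag layout: known negatives plus a random sample of
        # unlabeled points that are provisionally labeled negative.
        bag_neg = negatives + rng.sample(unlabeled, bag_size)
        clf = train_centroid_classifier(positives, bag_neg)
        for i, q in enumerate(queries):
            votes[i] += clf(q)
    return [v / n_bags for v in votes]

# Hypothetical 2-D feature vectors.
positives = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
negatives = [(0.0, 0.0), (1.0, 0.0)]
unlabeled = [(0.0, 1.0), (1.0, 1.0), (5.5, 5.5), (0.5, 0.5), (6.0, 6.0)]
queries = [(5.2, 5.1), (0.2, 0.1)]

scores = pu_bagging_scores(positives, negatives, unlabeled, queries)
print(scores)
```

Averaging over many bags is what dampens the effect of unlabeled positives that happen to be drawn as provisional negatives in any single bag.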
Affiliation(s)
- Yang Hao
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Bo Li
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Daiyun Huang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - School of Life Sciences, Fudan University, Shanghai 200092, China
- Sijin Wu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Tianjun Wang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Lei Fu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Xin Liu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
3
Pu C, Gu L, Hu Y, Han W, Xu X, Liu H, Chen Y, Zhang Y. Prediction of Human Liver Microsome Clearance with Chirality-Focused Graph Neural Networks. J Chem Inf Model 2024; 64:5427-5438. [PMID: 38976447 DOI: 10.1021/acs.jcim.4c00243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
In drug candidate design, clearance is one of the most crucial pharmacokinetic parameters to consider. Recent advancements in machine learning techniques coupled with the growing accumulation of drug data have paved the way for the construction of computational models to predict drug clearance. However, concerns persist regarding the reliability of data collected from public sources, and a majority of current in silico quantitative structure-property relationship models tend to neglect the influence of molecular chirality. In this study, we meticulously examined human liver microsome (HLM) data from public databases and constructed two distinct data sets with varying HLM data quantity and quality. Two baseline models (RF and DNN) and three chirality-focused GNNs (DMPNN, TetraDMPNN, and ChIRo) were proposed, and their performance on HLM data was evaluated and compared with each other. The TetraDMPNN model, which leverages chirality from 2D structure, exhibited the best performance with a test R² of 0.639 and a test root-mean-squared error of 0.429. The applicability domain of the model was also defined by using a molecular similarity-based method. Our research indicates that graph neural networks capable of capturing molecular chirality have significant potential for practical application and can deliver superior performance.
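A similarity-based applicability domain can be sketched in a few lines. The abstract does not say which similarity measure or cutoff was used, so the Tanimoto coefficient on fingerprint bit sets, the nearest-neighbour rule, and the threshold of 0.5 below are all assumptions chosen for illustration; the fingerprints are hypothetical sets of on-bit indices.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def in_domain(query, training_set, threshold=0.5):
    """Treat a query as inside the domain when its most similar
    training compound reaches the (assumed) similarity threshold."""
    return max(tanimoto(query, fp) for fp in training_set) >= threshold

# Hypothetical fingerprints.
training_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
close_query = {1, 2, 3, 9}    # shares 3 of 5 distinct bits with the first training compound
far_query = {10, 11, 12}      # shares no bits with any training compound

print(in_domain(close_query, training_fps))
print(in_domain(far_query, training_fps))
```

Predictions for compounds that fall outside the domain would then be flagged as less reliable rather than discarded outright.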
Affiliation(s)
- Chengtao Pu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lingxi Gu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yuxuan Hu
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Weijie Han
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xiaohe Xu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
4
Wang L, Zhou Z, Yang X, Shi S, Zeng X, Cao D. The present state and challenges of active learning in drug discovery. Drug Discov Today 2024; 29:103985. [PMID: 38642700 DOI: 10.1016/j.drudis.2024.103985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/22/2024]
Abstract
Active learning (AL) is an iterative feedback process that efficiently identifies valuable data within vast chemical space, even with limited labeled data. This characteristic makes it a valuable approach to the ongoing challenges faced in drug discovery, such as the ever-expanding search space and the limited availability of labeled data. Consequently, AL is gaining prominence in the field of drug development. In this paper, we comprehensively review the application of AL at all stages of drug discovery, including compound-target interaction prediction, virtual screening, molecular generation and optimization, as well as molecular property prediction. Additionally, we discuss the challenges and prospects associated with the current applications of AL in drug discovery.
Affiliation(s)
- Lei Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Zhenran Zhou
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Xixi Yang
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Shaohua Shi
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
5
Boldini D, Friedrich L, Kuhn D, Sieber SA. Machine Learning Assisted Hit Prioritization for High Throughput Screening in Drug Discovery. ACS Central Science 2024; 10:823-832. [PMID: 38680560 PMCID: PMC11046457 DOI: 10.1021/acscentsci.3c01517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 03/01/2024] [Accepted: 03/01/2024] [Indexed: 05/01/2024]
Abstract
Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Therefore, our method enables false positive and true positive detection without relying on prior screens or assay interference mechanisms, making it applicable to any high throughput screening campaign. We demonstrate that our approach consistently excludes assay interferents with different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including a retrospective case study simulating its use in a real drug discovery campaign. Finally, our tool is extremely computationally efficient, requiring less than 30 s per assay on low-resource hardware. As such, our findings show that our method is an ideal addition to existing false positive detection tools and can be used to guide further pharmacological optimization after high throughput screening campaigns.
Affiliation(s)
- Davide Boldini
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Lukas Friedrich
  - The Healthcare business of Merck KGaA, 64293 Darmstadt, Germany
- Daniel Kuhn
  - The Healthcare business of Merck KGaA, 64293 Darmstadt, Germany
- Stephan A. Sieber
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
6
Meewan I, Panmanee J, Petchyam N, Lertvilai P. HBCVTr: an end-to-end transformer with a deep neural network hybrid model for anti-HBV and HCV activity predictor from SMILES. Sci Rep 2024; 14:9262. [PMID: 38649402 PMCID: PMC11035669 DOI: 10.1038/s41598-024-59933-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 04/16/2024] [Indexed: 04/25/2024] Open
Abstract
Hepatitis B and C viruses (HBV and HCV) are significant causes of chronic liver diseases, with approximately 350 million infections globally. To accelerate the finding of effective treatment options, we introduce HBCVTr, a novel ligand-based drug design (LBDD) method for predicting the inhibitory activity of small molecules against HBV and HCV. HBCVTr employs a hybrid model consisting of double encoders of transformers and a deep neural network to learn the relationship between small molecules' simplified molecular-input line-entry system (SMILES) and their antiviral activity against HBV or HCV. The prediction accuracy of HBCVTr has surpassed baseline machine learning models and existing methods, with R-squared values of 0.641 and 0.721 for the HBV and HCV test sets, respectively. The trained models were successfully applied to virtual screening against 10 million compounds within 240 h, leading to the discovery of the top novel inhibitor candidates, including IJN04 for HBV and IJN12 and IJN19 for HCV. Molecular docking and dynamics simulations identified IJN04, IJN12, and IJN19 target proteins as the HBV core antigen, HCV NS5B RNA-dependent RNA polymerase, and HCV NS3/4A serine protease, respectively. Overall, HBCVTr offers a new and rapid drug discovery and development screening method targeting HBV and HCV.
Affiliation(s)
- Ittipat Meewan
  - Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Jiraporn Panmanee
  - Research Center for Neuroscience, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Nopphon Petchyam
  - Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Pichaya Lertvilai
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92037, USA
7
van Heerden A, Turon G, Duran-Frigola M, Pillay N, Birkholtz LM. Machine Learning Approaches Identify Chemical Features for Stage-Specific Antimalarial Compounds. ACS Omega 2023; 8:43813-43826. [PMID: 38027377 PMCID: PMC10666252 DOI: 10.1021/acsomega.3c05664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/18/2023] [Accepted: 10/20/2023] [Indexed: 12/01/2023]
Abstract
Efficacy data from diverse chemical libraries, screened against the various stages of the malaria parasite Plasmodium falciparum, including asexual blood stage (ABS) parasites and transmissible gametocytes, serve as a valuable reservoir of information on the chemical space of compounds that are active or inactive against the parasite. We postulated that these data can be mined to define chemical features associated with ABS activity alone and/or those that provide additional life cycle activity profiles such as gametocytocidal activity. Additionally, this information could provide chemical features associated with inactive compounds, which could eliminate future unnecessary screening of similar chemical analogs. Therefore, we aimed to use machine learning to identify the chemical space associated with stage-specific antimalarial activity. We collected data from various chemical libraries that were screened against the asexual (126 374 compounds) and sexual (gametocyte) stages of the parasite (93 941 compounds), calculated the compounds' molecular fingerprints, and trained machine learning models to recognize stage-specific active and inactive compounds. We were able to build several models that predict compound activity against ABS and dual activity against ABS and gametocytes, with Support Vector Machines (SVM) showing superior abilities with high recall (90 and 66%) and low false-positive predictions (15 and 1%). This allowed the identification of chemical features enriched in active and inactive populations, an important outcome that could be mined for essential chemical features to streamline hit-to-lead optimization strategies of antimalarial candidates. The predictive capabilities of the models held true in diverse chemical spaces, indicating that the ML models are robust and can serve as a prioritization tool to drive and guide phenotypic screening and medicinal chemistry programs.
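For readers less familiar with the two figures quoted for the SVM models, the short sketch below shows how recall and a false-positive rate are computed from binary predictions. It assumes the abstract's "false-positive predictions" means FP / (FP + TN), which the abstract does not state explicitly; the labels and predictions are invented toy values, not data from the study.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false-positive rate) for binary labels where 1 = active."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn)   # share of true actives that were found
    fpr = fp / (fp + tn)      # share of true inactives flagged as active (assumed definition)
    return recall, fpr

# Toy example: 4 actives and 6 inactives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

recall, fpr = recall_and_fpr(y_true, y_pred)
print(recall, fpr)
```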
Affiliation(s)
- Ashleigh van Heerden
  - Department of Biochemistry, Genetics and Microbiology, Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Gemma Turon
  - Ersilia Open Source Initiative, 28 Belgrave Road, Cambridge CB1 3DE, U.K.
- Nelishia Pillay
  - Department of Computer Science, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Lyn-Marié Birkholtz
  - Department of Biochemistry, Genetics and Microbiology, Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
8
Han M, Jin B, Liang J, Huang C, Arp HPH. Developing machine learning approaches to identify candidate persistent, mobile and toxic (PMT) and very persistent and very mobile (vPvM) substances based on molecular structure. Water Research 2023; 244:120470. [PMID: 37595327 DOI: 10.1016/j.watres.2023.120470] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/10/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/20/2023]
Abstract
Determining which substances on the global market could be classified as persistent, mobile and toxic (PMT) substances or very persistent, very mobile (vPvM) substances is essential to prevent or reduce drinking water contamination from them. This study developed machine learning models based on different molecular descriptors (MDs) and defined applicability domains for the screening of PMT/vPvM substances. The models were trained with 3111 substances with expert weight-of-evidence based PMT/vPvM hazard classifications that considered the highest quality data available. The model was based on the hypothesis that PMT/vPvM substances share similar MDs that are representative of chemical structures resistant to degradation, associated with low sorption (or high water solubility) and, in some cases, associated with known toxic mechanisms. All possible model combinations were tested by integrating different molecular description methods, data balancing strategies and machine learning algorithms. Our model allows one-step prediction of candidate PMT/vPvM substances, and our method was compared with the approach predicting P, M and T separately (i.e. three-step prediction). The results showed that the one-step model achieved a higher accuracy of 92% for PMT/vPvM identification (i.e. positive samples) for an internal test set, and also resulted in a higher accuracy of 90% for an external test set of chemical pollutants detected in Taihu Lake, China. Furthermore, the prediction mechanism of the model was interpreted by Shapley additive explanations (SHAP). This work presents an advance of big data in silico screening models for the identification of substances that potentially meet the PMT/vPvM criteria.
Affiliation(s)
- Min Han
  - State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 10069, China
- Biao Jin
  - State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 10069, China
- Jun Liang
  - School of Software, South China Normal University, Foshan, 528225, China
- Chen Huang
  - State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 10069, China
- Hans Peter H Arp
  - Norwegian Geotechnical Institute (NGI), P.O. Box 3930 Ullevaal Stadion, Oslo, N-0806, Norway; Norwegian University of Science and Technology (NTNU), Trondheim, NO-7491, Norway
9
Chen YH, Chao KH, Wong JY, Liu CF, Leu JY, Tsai HK. A feature extraction free approach for protein interactome inference from co-elution data. Brief Bioinform 2023; 24:bbad229. [PMID: 37328692 DOI: 10.1093/bib/bbad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 05/01/2023] [Accepted: 05/29/2023] [Indexed: 06/18/2023] Open
Abstract
Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the co-elution of non-interacting proteins by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods can be biased by handcrafted features that rely on domain knowledge, and they tend to overfit because PPI data are severely imbalanced. To address these issues, we present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), to integrate feature representation from raw CF-MS data and interactome prediction by convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under the conventional imbalanced training. When trained with balanced data, SPIFFED had greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS data. Using the clustering software (i.e. ClusterONE), SPIFFED allows users to infer high-confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.
Affiliation(s)
- Yu-Hsin Chen
  - Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 106, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 11529, Taiwan
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Kuan-Hao Chao
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Jin Yung Wong
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Chien-Fu Liu
  - Institute of Molecular Biology, Academia Sinica, Taipei, 11529, Taiwan
- Jun-Yi Leu
  - Institute of Molecular Biology, Academia Sinica, Taipei, 11529, Taiwan
- Huai-Kuang Tsai
  - Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 106, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 11529, Taiwan
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
10
Xu M, Lu Z, Wu Z, Gui M, Liu G, Tang Y, Li W. Development of In Silico Models for Predicting Potential Time-Dependent Inhibitors of Cytochrome P450 3A4. Mol Pharm 2023; 20:194-205. [PMID: 36458739 DOI: 10.1021/acs.molpharmaceut.2c00571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Cytochrome P450 3A4 (CYP3A4) is one of the major drug metabolizing enzymes in the human body and metabolizes ∼30-50% of clinically used drugs. Inhibition of CYP3A4 must always be considered in the development of new drugs. Time-dependent inhibition (TDI) is an important P450 inhibition type that could cause undesired drug-drug interactions. Therefore, identifying CYP3A4 TDI in a rapid and convenient way is of great importance to any new drug discovery effort. Here, we report the development of in silico classification models for prediction of potential CYP3A4 time-dependent inhibitors. On the basis of the CYP3A4 TDI data set that we manually collected from literature and databases, both conventional machine learning and deep learning models were constructed. The comparisons of different sampling strategies, molecular representations, and machine-learning algorithms showed the benefits of a balanced data set and of the deep-learning model using GraphConv featurization. The generalization ability of the best model was tested by screening an external data set, and the prediction results were validated by biological experiments. In addition, several structural alerts that are relevant to CYP3A4 time-dependent inhibitors were identified via information gain and frequency analysis. We anticipate that our effort will be useful for the identification of potential CYP3A4 time-dependent inhibitors in drug discovery and design.
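Information gain, one of the two criteria the abstract names for finding structural alerts, measures how much knowing whether a compound contains a substructure reduces the uncertainty about its class. The sketch below is a generic textbook computation on invented toy data; the variable names, the substructure flags, and the labels are hypothetical, and the authors' exact procedure may differ.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of binary labels (1 = inhibitor)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def information_gain(has_substructure, labels):
    """Reduction in label entropy after splitting on substructure presence."""
    with_s = [y for h, y in zip(has_substructure, labels) if h]
    without = [y for h, y in zip(has_substructure, labels) if not h]
    n = len(labels)
    conditional = (len(with_s) / n) * entropy(with_s) + (len(without) / n) * entropy(without)
    return entropy(labels) - conditional

# Toy data set: 4 inhibitors and 4 non-inhibitors.
labels = [1, 1, 0, 0, 1, 1, 0, 0]
alert_a = [1, 1, 0, 0, 1, 1, 0, 0]  # present in every inhibitor, absent elsewhere
alert_b = [1, 0, 1, 0, 1, 0, 1, 0]  # equally common in both classes

gain_a = information_gain(alert_a, labels)
gain_b = information_gain(alert_b, labels)
print(gain_a, gain_b)
```

A substructure with high information gain that also occurs frequently among the positives is a natural candidate alert; one that is evenly spread across both classes carries no signal.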
Affiliation(s)
- Minjie Xu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zhou Lu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zengrui Wu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Minyan Gui
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
11
Norinder U. Traditional Machine and Deep Learning for Predicting Toxicity Endpoints. Molecules 2022; 28:217. [PMID: 36615411 PMCID: PMC9822478 DOI: 10.3390/molecules28010217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 12/16/2022] [Accepted: 12/21/2022] [Indexed: 12/28/2022] Open
Abstract
Molecular structure-property modeling is an increasingly important tool for predicting compounds with desired properties, because drug discovery and development are expensive and resource-intensive and suffer from toxicity-related attrition in late phases. Lately, the interest in applying deep learning techniques has increased considerably. This investigation compares the traditional physico-chemical descriptor and machine learning-based approaches through autoencoder-generated descriptors to two different descriptor-free, Simplified Molecular Input Line Entry System (SMILES) based, deep learning architectures of Bidirectional Encoder Representations from Transformers (BERT) type using the Mondrian aggregated conformal prediction method as the overarching framework. For the binary CATMoS non-toxic and very-toxic datasets, the results show that all methods perform equally well on the former, almost equally balanced, dataset. On the latter dataset, with an 11-fold difference between the two classes, the MolBERT model based on a large pre-trained network performs somewhat better than the rest, with high efficiency for both classes (0.93-0.94) as well as high values for sensitivity, specificity and balanced accuracy (0.86-0.87). The descriptor-free, SMILES-based, deep learning BERT architectures seem capable of producing well-balanced predictive models with defined applicability domains. This work also demonstrates that the class imbalance problem is gracefully handled through the use of Mondrian conformal prediction without the use of over- and/or under-sampling, weighting of classes or cost-sensitive methods.
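The reason Mondrian conformal prediction copes with class imbalance is that each candidate label is calibrated only against calibration examples of that same class, so a small class is never swamped by a large one. The sketch below shows the class-conditional p-value for a single calibration set; the aggregated variant named in the abstract combines several such splits, which is omitted here. The nonconformity scores (taken as one minus a model's predicted probability for the label), the class sizes, and the significance level are invented for illustration.

```python
def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional conformal p-values.

    cal_scores[i]  : nonconformity of calibration example i for its true label
    test_scores[y] : nonconformity of the test object if it had label y
    """
    p = {}
    for y, alpha in test_scores.items():
        # Compare only against calibration examples of the same class.
        same_class = [s for s, lab in zip(cal_scores, cal_labels) if lab == y]
        p[y] = (sum(s >= alpha for s in same_class) + 1) / (len(same_class) + 1)
    return p

def prediction_set(p_values, significance):
    """Labels that cannot be rejected at the chosen significance level."""
    return {y for y, p in p_values.items() if p > significance}

# Toy imbalanced calibration set: 8 non-toxic (0) and 3 very-toxic (1) compounds.
cal_labels = [0] * 8 + [1] * 3
cal_scores = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.3, 0.4, 0.2, 0.5, 0.7]

# Hypothetical test compound with predicted probability 0.4 of being very toxic.
test_scores = {0: 0.4, 1: 0.6}

p_values = mondrian_p_values(cal_scores, cal_labels, test_scores)
labels_kept = prediction_set(p_values, significance=0.25)
print(p_values, labels_kept)
```

In this toy case the minority label survives while the majority label is rejected, even though the majority class has more calibration examples, because each p-value is computed within its own class.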
Affiliation(s)
- Ulf Norinder
- Department of Computer and Systems Sciences, Stockholm University, 164 07 Kista, Sweden
|
12
|
Sun J, Wen M, Wang H, Ruan Y, Yang Q, Kang X, Zhang H, Zhang Z, Lu H. Prediction of drug-likeness using graph convolutional attention network. Bioinformatics 2022; 38:5262-5269. [PMID: 36222555 DOI: 10.1093/bioinformatics/btac676] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2022] [Revised: 09/22/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Drug-likeness has been widely used as a criterion to distinguish drug-like molecules from non-drugs. Developing reliable computational methods to predict the drug-likeness of compounds is crucial to triage unpromising molecules and accelerate the drug discovery process. RESULTS: In this study, a deep learning method based on a graph convolutional attention network (D-GCAN) was developed to predict drug-likeness directly from molecular structures. Results showed that the D-GCAN model outperformed other state-of-the-art models for drug-likeness prediction. The combination of graph convolution and attention mechanism made an important contribution to the performance of the model. Specifically, the attention mechanism improved accuracy by 4.0%, and graph convolution improved accuracy by 6.1%. Results on the dataset beyond Lipinski's rule of five space and the non-US dataset showed that the model had good versatility. Then, the billion-scale GDB-13 database was used as a case study to screen SARS-CoV-2 3C-like protease inhibitors. Sixty-five drug candidates were screened out, most substructures of which are similar to those of existing oral drugs. Candidates screened from S-GDB13 have higher similarity to existing drugs and better molecular docking performance than those from the rest of GDB-13. Screening on S-GDB13 is significantly faster than screening directly on GDB-13. In general, D-GCAN is a promising tool to predict drug-likeness for selecting potential candidates and accelerating drug discovery by excluding unpromising candidates and avoiding unnecessary biological and clinical testing. AVAILABILITY AND IMPLEMENTATION: The source code, model and tutorials are available at https://github.com/JinYSun/D-GCAN. The S-GDB13 database is available at https://doi.org/10.5281/zenodo.7054367. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jinyu Sun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Huabei Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yuezhe Ruan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Xiao Kang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
|
13
|
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly reduces the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grow. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. The drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
- Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
- Enas Elgeldawi
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Heba Aboul Ella
- Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
- Mamdouh M. Gomaa
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Aboul Ella Hassanien
- Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
|
14
|
Boldini D, Friedrich L, Kuhn D, Sieber SA. Tuning gradient boosting for imbalanced bioassay modelling with custom loss functions. J Cheminform 2022; 14:80. [DOI: 10.1186/s13321-022-00657-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/30/2022] [Indexed: 11/12/2022] Open
Abstract
While in recent years there has been a dramatic increase in the number of available bioassay datasets, many of them suffer from an extremely imbalanced distribution between active and inactive compounds. Thus, there is an urgent need for novel approaches to tackle class imbalance in drug discovery. Inspired by recent advances in computer vision, we investigated a panel of alternative loss functions for imbalanced classification in the context of Gradient Boosting and benchmarked them on six datasets from public and proprietary sources, for a total of 42 tasks and 2 million compounds. Our findings show that with these modifications, we achieve statistically significant improvements over the conventional cross-entropy loss function on five out of six datasets. Furthermore, by employing these bespoke loss functions we are able to push Gradient Boosting to match or outperform a wide variety of previously reported classifiers and neural networks. We also investigate the impact of changing the loss function on training time and find that it increases convergence speed by up to 8 times. As such, these results show that tuning the loss function for Gradient Boosting is a straightforward and computationally efficient method to achieve state-of-the-art performance on imbalanced bioassay datasets without compromising on interpretability and scalability.
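Gradient boosting libraries typically take a custom objective as per-sample first and second derivatives of the loss with respect to the raw (pre-sigmoid) score. The sketch below shows this for a class-weighted cross-entropy. The abstract does not say which losses were benchmarked, so this loss is a stand-in chosen for illustration, and the function name and the `pos_weight` parameter are hypothetical.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_logloss_grad_hess(raw_scores, labels, pos_weight):
    """Gradient and Hessian of a class-weighted cross-entropy loss.

    Loss per sample: -[w * y * log(p) + (1 - y) * log(1 - p)], p = sigmoid(z),
    where w = pos_weight up-weights the (rare) active class.
    """
    grads, hessians = [], []
    for z, y in zip(raw_scores, labels):
        p = sigmoid(z)
        w = pos_weight if y == 1 else 1.0
        grads.append(w * (p - y))           # dL/dz
        hessians.append(w * p * (1.0 - p))  # d2L/dz2
    return grads, hessians


# One active and one inactive compound, both at raw score 0 (p = 0.5).
grads, hessians = weighted_logloss_grad_hess([0.0, 0.0], [1, 0], pos_weight=10.0)
```

With `pos_weight=10.0`, the misclassified active contributes a gradient ten times larger in magnitude than it would under the unweighted loss, which is how a weighted objective steers the boosting rounds toward the minority class.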
|
15
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 775] [Impact Index Per Article: 387.5] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023] Open
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Evan E Bolton
- To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;
|
16
|
Divyanth LG, Marzougui A, González-Bernal MJ, McGee RJ, Rubiales D, Sankaran S. Evaluation of Effective Class-Balancing Techniques for CNN-Based Assessment of Aphanomyces Root Rot Resistance in Pea (Pisum sativum L.). Sensors (Basel) 2022; 22:7237. [PMID: 36236336 PMCID: PMC9572822 DOI: 10.3390/s22197237] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/26/2022] [Revised: 09/15/2022] [Accepted: 09/16/2022] [Indexed: 06/16/2023]
Abstract
Aphanomyces root rot (ARR) is a devastating disease that affects the production of pea. The plants are prone to infection at any growth stage, and there are no chemical or cultural controls. Thus, the development of resistant pea cultivars is important. Phenomics technologies to support the selection of resistant cultivars through phenotyping can be valuable. One such approach is to couple imaging technologies with deep learning algorithms that are considered efficient for the assessment of disease resistance across a large number of plant genotypes. In this study, the resistance to ARR was evaluated through a CNN-based assessment of pea root images. The proposed model, DeepARRNet, was designed to classify the pea root images into three classes based on ARR severity scores, namely, resistant, intermediate, and susceptible classes. The dataset consisted of 1581 pea root images with a skewed distribution. Hence, three effective data-balancing techniques were identified to solve the prevalent problem of unbalanced datasets. Random oversampling with image transformations, generative adversarial network (GAN)-based image synthesis, and a loss function with class-weighted ratio were implemented during the training process. The results indicated that the classification F1-score was 0.92 ± 0.03 when GAN-synthesized images were added, 0.91 ± 0.04 for random resampling, and 0.88 ± 0.05 when the class-weighted loss function was implemented, each higher than when the unbalanced dataset was used without these techniques (0.83 ± 0.03). The systematic approaches evaluated in this study can be applied to other image-based phenotyping datasets, which can aid the development of deep-learning models with improved performance.
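Two of the balancing strategies named above, random oversampling and class-weighted loss, reduce to simple bookkeeping over the class counts. The sketch below is a generic illustration under stated assumptions: the function names are hypothetical, the inverse-frequency weighting formula is a common convention and not necessarily the ratio used in the study, and the oversampler duplicates samples as-is, whereas the study combined oversampling with image transformations.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weights proportional to 1 / class frequency (assumed convention).

    weight(c) = n_samples / (n_classes * count(c)), so rare classes weigh more.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Made-up skewed label distribution over the three severity classes.
labels = ["resistant"] * 2 + ["intermediate"] * 3 + ["susceptible"] * 5
samples = list(range(len(labels)))  # stand-ins for image identifiers

weights = inverse_frequency_weights(labels)
balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

After oversampling, every class has as many samples as the largest one. The weights give the same emphasis without touching the data, by scaling each sample's loss term.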
Affiliation(s)
- L. G. Divyanth
- Department of Biological Systems Engineering, Washington State University, Pullman, WA 99164, USA
- Department of Agricultural and Food Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Afef Marzougui
- Department of Biological Systems Engineering, Washington State University, Pullman, WA 99164, USA
- Rebecca J. McGee
- Grain Legume Genetics and Physiology Research Unit, US Department of Agriculture-Agricultural Research Service (USDA-ARS), Pullman, WA 99164, USA
- Diego Rubiales
- The Institute for Sustainable Agriculture, Spanish National Research Council, 14001 Cordova, Spain
- Sindhuja Sankaran
- Department of Biological Systems Engineering, Washington State University, Pullman, WA 99164, USA
|
17
|
Chen TL, Chen JC, Chang WH, Tsai W, Shih MC, Wildan Nabila A. Imbalanced prediction of emergency department admission using natural language processing and deep neural network. J Biomed Inform 2022; 133:104171. [PMID: 35995106 DOI: 10.1016/j.jbi.2022.104171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2021] [Revised: 07/14/2022] [Accepted: 08/13/2022] [Indexed: 11/26/2022]
Abstract
The emergency department (ED) plays a very significant role in the hospital. Owing to the rising number of ED visits, medical service points, and ED market, overcrowding of EDs has become serious worldwide. Overcrowding has long been recognized as a vital issue that increases the risk to patients and negative emotions of medical personnel and impacts hospital cost management. In recent years, many researchers have applied artificial intelligence to reduce crowding in the ED. Nevertheless, real-world datasets on ED hospital admission inherently exhibit high class imbalance. Previous studies have not considered, let alone addressed, the imbalance of the datasets. This study aims to develop a natural language processing model of a deep neural network with an attention mechanism to solve the imbalance problem in ED admission. The proposed framework is used for predicting hospital admission so that hospitals can arrange beds early and relieve congestion in the ED. Furthermore, the study compares a variety of methods and identifies the combination with the best performance for forecasting hospitalization in the ED. The study used data from a specific hospital in Taiwan as an empirical study. The experimental results demonstrate that almost all imbalance-handling methods can improve the model's performance. In addition, the natural language processing model of Bi-directional Long Short-Term Memory with attention mechanism has the best results among all natural language processing methods.
Affiliation(s)
- Tzu-Li Chen
- Department of Industrial Engineering and Management, National Taipei University of Technology, Taiwan.
- James C Chen
- Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan
- Wen-Han Chang
- Department of Emergency Medicine, Mackay Memorial Hospital, Taiwan
- Weide Tsai
- Department of Emergency Medicine, Mackay Memorial Hospital, Taiwan
- Mei-Chuan Shih
- Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan
- Achmad Wildan Nabila
- Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan
|
18
|
Alsaui AA, Alghofaili YA, Alghadeer M, Alharbi FH. Resampling Techniques for Materials Informatics: Limitations in Crystal Point Groups Classification. J Chem Inf Model 2022; 62:3514-3523. [DOI: 10.1021/acs.jcim.2c00666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Abdulmohsen A. Alsaui
- Electrical Engineering Department, Indian Institute of Technology Madras, Chennai 600036, India
- Yousef A. Alghofaili
- Research and Development Department, Xpedite Information Technology, Riyadh 13333, Saudi Arabia
- Mohammed Alghadeer
- Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Fahhad H. Alharbi
- Electrical Engineering Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- SDAIA-KFUPM Joint Research Center for Artificial Intelligence, Dhahran 31261, Saudi Arabia
|
19
|
Harigua-Souiai E, Oualha R, Souiai O, Abdeljaoued-Tej I, Guizani I. Applied Machine Learning Toward Drug Discovery Enhancement: Leishmaniases as a Case Study. Bioinform Biol Insights 2022; 16:11779322221090349. [PMID: 35478992 PMCID: PMC9036323 DOI: 10.1177/11779322221090349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Accepted: 03/04/2022] [Indexed: 11/25/2022] Open
Abstract
Drug discovery (DD) research is a complex field with a high attrition rate. Machine learning (ML) approaches combined with chemoinformatics provide valuable input to this field. We herein focused on implementing multiple ML algorithms that learn from different molecular fingerprints (FPs) of 65 057 molecules that have been identified as active or inactive against Leishmania major promastigotes. We sought to build a classifier able to predict whether a given molecule has the potential of being anti-leishmanial or not. Using the RDKit library, we calculated 5 molecular FPs of the molecules. Then, we implemented 4 ML algorithms that we trained and tested for their ability to classify the molecules into active/inactive classes based on their chemical structure, encoded by the molecular FPs. Best performers were random forest (RF) and support vector machine (SVM), while atom-pair and topological torsion FPs were the best embedding functions. Both models were further assessed on different stratification levels of the dataset and showed stable performances. Finally, we used them to predict the potential of molecules within the Food and Drug Administration (FDA)-approved drugs collection to present anti-Leishmania effects. We ranked these drugs according to their anti-Leishmanial probability and obtained in total seven anti-Leishmania agents, previously described in the literature, within the top 10 of each model. This validates the robustness of the approach, the algorithms, and FPs choices as well as the importance of the dataset size and content. We further engaged these molecules in reverse docking experiments on 3D crystal structures of seven well-studied Leishmania drug targets and could predict the molecular targets for 4 drugs. The results bring novel insights into anti-Leishmania compounds.
Affiliation(s)
- Emna Harigua-Souiai
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Rafeh Oualha
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Oussama Souiai
- Laboratory of Bioinformatics, BioMathematics and BioStatistics LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Ines Abdeljaoued-Tej
- Laboratory of Bioinformatics, BioMathematics and BioStatistics LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Engineering School of Statistics and Information Analysis, University of Carthage, Ariana, Tunisia
- Ikram Guizani
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
|
20
|
Drug repurposing in silico screening platforms. Biochem Soc Trans 2022; 50:747-758. [PMID: 35285479 DOI: 10.1042/bst20200967] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/09/2021] [Revised: 02/08/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]
Abstract
Over the last decade, for the first time, substantial efforts have been directed at the development of dedicated in silico platforms for drug repurposing, including initiatives targeting cancers and conditions as diverse as cryptosporidiosis, dengue, dental caries, diabetes, herpes, lupus, malaria, tuberculosis and Covid-19 related respiratory disease. This review outlines some of the exciting advances in the specific applications of in silico approaches to the challenge of drug repurposing and focuses particularly on where these efforts have resulted in the development of generic platform technologies of broad value to researchers involved in programmatic drug repurposing work. Recent advances in molecular docking methodologies and validation approaches, and their combination with machine learning or deep learning approaches are continually enhancing the precision of repurposing efforts. The meaningful integration of better understanding of molecular mechanisms with molecular pathway data and knowledge of disease networks is widening the scope for discovery of repurposing opportunities. The power of Artificial Intelligence is being gainfully exploited to advance progress in an integrated science that extends from the sub-atomic to the whole system level. There are many promising emerging developments but there are remaining challenges to be overcome in the successful integration of the new advances in useful platforms. In conclusion, the essential component requirements for development of powerful and well optimised drug repurposing screening platforms are discussed.
|
21
|
Accurate predictions of drugs aqueous solubility via deep learning tools. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2021.131562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
22
|
Li B, Rangarajan S. A conceptual study of transfer learning with linear models for data-driven property prediction. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2021.107599] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022]
|
23
|
Mayabadi S, Saadatfar H. Two density-based sampling approaches for imbalanced and overlapping data. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108217] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/21/2023]
|
24
|
Harigua-Souiai E, Heinhane MM, Abdelkrim YZ, Souiai O, Abdeljaoued-Tej I, Guizani I. Deep Learning Algorithms Achieved Satisfactory Predictions When Trained on a Novel Collection of Anticoronavirus Molecules. Front Genet 2021; 12:744170. [PMID: 34912370 PMCID: PMC8667578 DOI: 10.3389/fgene.2021.744170] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/19/2021] [Accepted: 09/30/2021] [Indexed: 12/26/2022] Open
Abstract
Drug discovery and repurposing against COVID-19 is a highly relevant topic with huge efforts dedicated to delivering novel therapeutics targeting SARS-CoV-2. In this context, computer-aided drug discovery is of interest in orienting the early high throughput screenings and in optimizing the hit identification rate. We herein propose a pipeline for Ligand-Based Drug Discovery (LBDD) against SARS-CoV-2. Through an extensive search of the literature and multiple steps of filtering, we integrated information on 2,610 molecules having a validated effect against SARS-CoV and/or SARS-CoV-2. The chemical structures of these molecules were encoded through multiple systems to be readily useful as input to conventional machine learning (ML) algorithms or deep learning (DL) architectures. We assessed the performances of seven ML algorithms and four DL algorithms in achieving molecule classification into two classes: active and inactive. The Random Forests (RF), Graph Convolutional Network (GCN), and Directed Acyclic Graph (DAG) models achieved the best performances. These models were further optimized through hyperparameter tuning and achieved ROC-AUC scores through cross-validation of 85, 83, and 79% for RF, GCN, and DAG models, respectively. An external validation step on the FDA-approved drugs collection revealed a superior potential of DL algorithms to achieve drug repurposing against SARS-CoV-2 based on the dataset herein presented. Namely, GCN and DAG achieved more than 50% of the true positive rate assessed on the confirmed hits of a PubChem bioassay.
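The ROC-AUC figures quoted above have a direct probabilistic reading: the score is the fraction of (active, inactive) pairs in which the model ranks the active molecule higher. The sketch below computes it that way. It is a generic illustration with made-up scores, and the function name is hypothetical; it is not taken from the paper's pipeline.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one sample of each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up predicted activity scores for three actives (1) and two inactives (0).
auc = roc_auc([0.9, 0.8, 0.3, 0.6, 0.2], [1, 1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration. For large screening sets, a rank-based computation gives the same value more efficiently.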
Affiliation(s)
- Emna Harigua-Souiai
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Mohamed Mahmoud Heinhane
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Yosser Zina Abdelkrim
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Oussama Souiai
- Laboratory of BioInformatics BioMathematics and BioStatistics (BIMS)-LR20IPT09, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Ines Abdeljaoued-Tej
- Laboratory of BioInformatics BioMathematics and BioStatistics (BIMS)-LR20IPT09, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Engineering School of Statistics and Information Analysis, University of Carthage, Ariana, Tunisia
- Ikram Guizani
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
|
25
|
Zhao Q, Ma J, Wang Y, Xie F, Lv Z, Xu Y, Shi H, Han K. Mul-SNO: A novel prediction tool for S-nitrosylation sites based on deep learning methods. IEEE J Biomed Health Inform 2021; 26:2379-2387. [PMID: 34762593 DOI: 10.1109/jbhi.2021.3123503] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/09/2022]
Abstract
Protein S-nitrosylation (SNO) is one of the most important post-translational modifications and is formed by the covalent modification of cysteine residues by nitric oxide. Extensive studies have shown that SNO plays a pivotal role in the plant immune response and in treating various major human diseases. In recent years, SNO sites have become a hot research topic. Traditional biochemical methods for SNO site identification are time-consuming and costly. In this study, we developed an economical and efficient SNO site prediction tool named Mul-SNO. Mul-SNO ensembles two popular and powerful deep learning models, bidirectional long short-term memory (BiLSTM) and bidirectional encoder representations from Transformers (BERT). Compared with existing state-of-the-art methods, Mul-SNO obtained better accuracy (ACC) of 0.911 and 0.796 based on 10-fold cross-validation and independent data sets, respectively. The prediction server can be accessed for free at http://lab.malab.cn/~mjq/Mul-SNO/.
|
26
|
Hu F, Wang L, Hu Y, Wang D, Wang W, Jiang J, Li N, Yin P. A novel framework integrating AI model and enzymological experiments promotes identification of SARS-CoV-2 3CL protease inhibitors and activity-based probe. Brief Bioinform 2021; 22:bbab301. [PMID: 34368837 PMCID: PMC8385923 DOI: 10.1093/bib/bbab301] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/31/2021] [Revised: 07/11/2021] [Accepted: 07/15/2021] [Indexed: 01/03/2023] Open
Abstract
The identification of protein-ligand interaction plays a key role in biochemical research and drug discovery. Although deep learning has recently shown great promise in discovering new drugs, there remains a gap between deep learning-based and experimental approaches. Here, we propose a novel framework, named AIMEE, integrating AI model and enzymological experiments, to identify inhibitors against 3CL protease of SARS-CoV-2 (Severe acute respiratory syndrome coronavirus 2), which has taken a significant toll on people across the globe. From a bioactive chemical library, we have conducted two rounds of experiments and identified six novel inhibitors with a hit rate of 29.41%, and four of them showed an IC50 value <3 μM. Moreover, we explored the interpretability of the central model in AIMEE, mapping the deep learning extracted features to the domain knowledge of chemical properties. Based on this knowledge, a commercially available compound was selected and was proven to be an activity-based probe of 3CLpro. This work highlights the great potential of combining deep learning models and biochemical experiments for intelligent iteration and for expanding the boundaries of drug discovery. The code and data are available at https://github.com/SIAT-code/AIMEE.
Affiliation(s)
- Fan Hu
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Lei Wang
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Yishen Hu
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Dongqi Wang
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Weijie Wang
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Jianbing Jiang
- School of Pharmaceutical Sciences, Shenzhen University Health Science Center, Shenzhen, 518055, China
| | - Nan Li
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Peng Yin
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| |
|
27
|
Deng J, Yang Z, Ojima I, Samaras D, Wang F. Artificial intelligence in drug discovery: applications and techniques. Brief Bioinform 2021; 23:6420092. [PMID: 34734228 DOI: 10.1093/bib/bbab430] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 08/02/2021] [Revised: 08/02/2021] [Accepted: 09/18/2021] [Indexed: 12/23/2022] Open
Abstract
Artificial intelligence (AI) has been transforming the practice of drug discovery in the past decade. Various AI techniques have been used in many drug discovery applications, such as virtual screening and drug design. In this survey, we first give an overview on drug discovery and discuss related applications, which can be reduced to two major tasks, i.e. molecular property prediction and molecule generation. We then present common data resources, molecule representations and benchmark platforms. As a major part of the survey, AI techniques are dissected into model architectures and learning paradigms. To reflect the technical development of AI in drug discovery over the years, the surveyed works are organized chronologically. We expect that this survey provides a comprehensive review on AI in drug discovery. We also provide a GitHub repository with a collection of papers (and codes, if applicable) as a learning resource, which is regularly updated.
Affiliation(s)
- Jianyuan Deng
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11790, USA
| | - Zhibo Yang
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
| | - Iwao Ojima
- Department of Chemistry, Stony Brook University, Stony Brook, NY 11790, USA
| | - Dimitris Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
| | - Fusheng Wang
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11790, USA.,Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
| |
|
28
|
Wei YP, Yao LY, Wu YY, Liu X, Peng LH, Tian YL, Ding JH, Li KH, He QG. Critical Review of Synthesis, Toxicology and Detection of Acyclovir. Molecules 2021; 26:molecules26216566. [PMID: 34770975 PMCID: PMC8587948 DOI: 10.3390/molecules26216566] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/26/2021] [Revised: 10/25/2021] [Accepted: 10/27/2021] [Indexed: 02/02/2023] Open
Abstract
Acyclovir (ACV) is an effective and selective antiviral drug, and the study of its toxicology and the use of appropriate detection techniques to control its toxicity at safe levels are extremely important for medicine efforts and human health. This review discusses the mechanism driving ACV’s ability to inhibit viral coding, starting from its development and pharmacology. A comprehensive summary of the existing preparation methods and synthetic materials, such as 5-aminoimidazole-4-carboxamide, guanine and its derivatives, and other purine derivatives, is presented to elucidate the preparation of ACV in detail. In addition, it presents valuable analytical procedures for the toxicological studies of ACV, which are essential for human use and dosing. Analytical methods, including spectrophotometry, high performance liquid chromatography (HPLC), liquid chromatography/tandem mass spectrometry (LC-MS/MS), electrochemical sensors, molecularly imprinted polymers (MIPs), and flow injection–chemiluminescence (FI-CL) are also highlighted. A brief description of the characteristics of each of these methods is also presented. Finally, insight is provided for the development of ACV to drive further innovation of ACV in pharmaceutical applications. This review provides a comprehensive summary of the past life and future challenges of ACV.
Affiliation(s)
- Yan-Ping Wei
- School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China; (Y.-P.W.); (Y.-Y.W.); (L.-H.P.); (Y.-L.T.)
- Zhuzhou People’s Hospital, Zhuzhou 412001, China; (X.L.); (J.-H.D.)
- Hunan Qianjin Xiangjiang Pharmaceutical Joint Stock Co., Ltd., Zhuzhou 412001, China;
| | - Liang-Yuan Yao
- Hunan Qianjin Xiangjiang Pharmaceutical Joint Stock Co., Ltd., Zhuzhou 412001, China;
| | - Yi-Yong Wu
- School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China; (Y.-P.W.); (Y.-Y.W.); (L.-H.P.); (Y.-L.T.)
| | - Xia Liu
- Zhuzhou People’s Hospital, Zhuzhou 412001, China; (X.L.); (J.-H.D.)
| | - Li-Hong Peng
- School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China; (Y.-P.W.); (Y.-Y.W.); (L.-H.P.); (Y.-L.T.)
| | - Ya-Ling Tian
- School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China; (Y.-P.W.); (Y.-Y.W.); (L.-H.P.); (Y.-L.T.)
| | - Jian-Hua Ding
- Zhuzhou People’s Hospital, Zhuzhou 412001, China; (X.L.); (J.-H.D.)
| | - Kang-Hua Li
- Zhuzhou People’s Hospital, Zhuzhou 412001, China; (X.L.); (J.-H.D.)
- Correspondence: (K.-H.L.); (Q.-G.H.); Tel./Fax: +86-731-2218-3426 (Q.-G.H.)
| | - Quan-Guo He
- School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China; (Y.-P.W.); (Y.-Y.W.); (L.-H.P.); (Y.-L.T.)
- Zhuzhou People’s Hospital, Zhuzhou 412001, China; (X.L.); (J.-H.D.)
- Hunan Qianjin Xiangjiang Pharmaceutical Joint Stock Co., Ltd., Zhuzhou 412001, China;
- Correspondence: (K.-H.L.); (Q.-G.H.); Tel./Fax: +86-731-2218-3426 (Q.-G.H.)
| |
|
29
|
Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021; 22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023] Open
Abstract
Drug discovery based on artificial intelligence has been in the spotlight recently as it significantly reduces the time and cost required for developing novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous deep-learning-based methodologies are emerging at all steps of drug development processes. In particular, pharmaceutical chemists have faced significant issues with regard to selecting and designing potential drugs for a target of interest to enter preclinical testing. The two major challenges are prediction of interactions between drugs and druggable targets and generation of novel molecular structures suitable for a target of interest. Therefore, we reviewed recent deep-learning applications in drug-target interaction (DTI) prediction and de novo drug design. In addition, we introduce a comprehensive summary of a variety of drug and protein representations, DL models, and commonly used benchmark datasets or tools for model training and testing. Finally, we present the remaining challenges for the promising future of DL-based DTI prediction and de novo drug design.
Affiliation(s)
- Jintae Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Sera Park
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Dongbo Min
- Computer Vision Lab, Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Korea
| | - Wankyu Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
- System Pharmacology Lab, Department of Life Sciences, Ewha Womans University, Seoul 03760, Korea
| |
|
30
|
Deep Learning Approach for Discovery of In Silico Drugs for Combating COVID-19. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:6668985. [PMID: 34326978 PMCID: PMC8302400 DOI: 10.1155/2021/6668985] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/30/2020] [Accepted: 07/08/2021] [Indexed: 12/26/2022]
Abstract
Early diagnosis of pandemic diseases such as COVID-19 can prove beneficial in dealing with difficult situations and helping radiologists and other experts manage staffing more effectively. The application of deep learning techniques for genetics, microscopy, and drug discovery has created a global impact. It can enhance and speed up the process of medical research and development of vaccines, which is required for pandemics such as COVID-19. However, current drugs such as remdesivir and clinical trials of other chemical compounds have not shown many impressive results. Therefore, it can take more time to provide effective treatment or drugs. In this paper, a deep learning approach based on logistic regression, SVM, Random Forest, and QSAR modeling is suggested. QSAR modeling is done to find the drug targets with protein interaction along with the calculation of binding affinities. Then deep learning models were used for training the molecular descriptor dataset for the robust discovery of drugs and feature extraction for combating COVID-19. Results have shown more significant binding affinities (greater than −18) for many molecules that can be used to block the multiplication of SARS-CoV-2, responsible for COVID-19.
|
31
|
Hua Y, Shi Y, Cui X, Li X. In silico prediction of chemical-induced hematotoxicity with machine learning and deep learning methods. Mol Divers 2021; 25:1585-1596. [PMID: 34196933 DOI: 10.1007/s11030-021-10255-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/13/2021] [Accepted: 06/14/2021] [Indexed: 12/15/2022]
Abstract
Chemical-induced hematotoxicity is an important concern in drug discovery, since it can often be fatal when it happens. It is therefore useful to give special attention to chemicals that can cause hematotoxicity. In the present study, we focused on in silico prediction of chemical-induced hematotoxicity with machine learning (ML) and deep learning (DL) methods. We collected a large data set containing 632 hematotoxic chemicals and 1525 approved drugs without hematotoxicity. Computational models were built using several different machine learning and deep learning algorithms integrated on the Online Chemical Modeling Environment (OCHEM). Based on the three best individual models, a consensus model was developed. It yielded a prediction accuracy of 0.83 and a balanced accuracy of 0.77 on external validation. The consensus model and the best individual model, developed with the random forest regression and classification algorithm (RFR) and QNPR descriptors, were made available at https://ochem.eu/article/135149. The relevance of 8 commonly used molecular properties to chemical-induced hematotoxicity was also investigated. Several molecular properties have an obvious differentiating effect on chemical-induced hematotoxicity. In addition, 12 structural alerts responsible for chemical hematotoxicity were identified using frequency analysis of substructures from the Klekota-Roth fingerprint. These results should provide meaningful knowledge and useful tools for hematotoxicity evaluation in drug discovery and environmental risk assessment.
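A gap between accuracy (0.83) and balanced accuracy (0.77) is typical of an imbalanced data set, where the majority class dominates plain accuracy. The following is a minimal sketch of how the two metrics differ. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are likewise illustrative.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity, so each class counts equally."""
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    return (sensitivity + specificity) / 2


# Hypothetical counts: 100 positives, 200 negatives.
counts = dict(tp=50, fn=50, tn=190, fp=10)
acc = accuracy(**counts)
bal_acc = balanced_accuracy(**counts)
print(f"accuracy={acc:.3f} balanced_accuracy={bal_acc:.3f}")
```

With these made-up counts the classifier misses half of the minority class, which lowers balanced accuracy well below plain accuracy.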
Affiliation(s)
- Yuqing Hua
- School of Pharmacy, Shandong First Medical University, Taian, 271000, China.,Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, 250014, China
| | - Yinping Shi
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, 250014, China
| | - Xueyan Cui
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, 250014, China
| | - Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, 250014, China.,Department of Clinical Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, 250014, China
| |
|
32
|
Zhao D, Wang X, Mu Y, Wang L. Experimental Study and Comparison of Imbalance Ensemble Classifiers with Dynamic Selection Strategy. ENTROPY (BASEL, SWITZERLAND) 2021; 23:822. [PMID: 34203274 PMCID: PMC8307085 DOI: 10.3390/e23070822] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/04/2021] [Revised: 06/18/2021] [Accepted: 06/24/2021] [Indexed: 12/12/2022]
Abstract
Imbalance ensemble classification is one of the most essential and practical strategies for improving decision performance in data analysis. There has been a growing body of literature on ensemble techniques for imbalance learning in recent years, and various extensions of imbalanced classification methods have been established from different points of view. The present study reviews the state-of-the-art ensemble classification algorithms for dealing with imbalanced datasets, offering a comprehensive analysis of incorporating the dynamic selection of base classifiers in classification. Experiments with 14 existing ensemble algorithms incorporating dynamic selection on 56 datasets reveal that the classical algorithm with a dynamic selection strategy delivers a practical way to improve the classification performance for both binary and multi-class imbalanced datasets. In addition, by combining patch learning with dynamic selection ensemble classification, a patch-ensemble classification method is designed, which utilizes the misclassified samples to train patch classifiers for increasing the diversity of base classifiers. The experimental results indicate that the designed method has a certain potential for improving the performance of multi-class imbalanced classification.
Affiliation(s)
- Dongxue Zhao
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Xin Wang
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Yashuang Mu
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China
| | - Lidong Wang
- School of Science, Dalian Maritime University, Dalian 116026, China
| |
|
33
|
Affiliation(s)
- W Patrick Walters
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| | - Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| |
|
34
|
Hou T, Bian Y, McGuire T, Xie XQ. Integrated Multi-Class Classification and Prediction of GPCR Allosteric Modulators by Machine Learning Intelligence. Biomolecules 2021; 11:biom11060870. [PMID: 34208096 PMCID: PMC8230833 DOI: 10.3390/biom11060870] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/23/2021] [Revised: 05/30/2021] [Accepted: 06/08/2021] [Indexed: 01/01/2023] Open
Abstract
G-protein-coupled receptors (GPCRs) are the largest and most diverse group of cell surface receptors that respond to various extracellular signals. The allosteric modulation of GPCRs has emerged in recent years as a promising approach for developing target-selective therapies. Moreover, the discovery of new GPCR allosteric modulators can greatly benefit the further understanding of GPCR cell signaling mechanisms. It is critical but also challenging to make an accurate distinction of modulators for different GPCR groups in an efficient and effective manner. In this study, we focus on an 11-class classification task with 10 GPCR subtype classes and a random compounds class. We used a dataset containing 34,434 compounds with allosteric modulators collected from classical GPCR families A, B, and C, as well as random drug-like compounds. Six types of machine learning models, including support vector machine, naïve Bayes, decision tree, random forest, logistic regression, and multilayer perceptron, were trained using different combinations of features including molecular descriptors, Atom-pair fingerprints, MACCS fingerprints, and ECFP6 fingerprints. The performances of trained machine learning models with different feature combinations were closely investigated and discussed. To the best of our knowledge, this is the first work on the multi-class classification of GPCR allosteric modulators. We believe that the classification models developed in this study can be used as simple and accurate tools for the discovery and development of GPCR allosteric modulators.
Affiliation(s)
- Tianling Hou
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; (T.H.); (Y.B.); (T.M.)
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Yuemin Bian
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; (T.H.); (Y.B.); (T.M.)
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Terence McGuire
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; (T.H.); (Y.B.); (T.M.)
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Xiang-Qun Xie
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; (T.H.); (Y.B.); (T.M.)
- Drug Discovery Institute, Departments of Computational Biology and of Structural Biology, University of Pittsburgh, Pittsburgh, PA 15261, USA
| |
|
35
|
Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J Chem Inf Model 2021; 61:2623-2640. [PMID: 34100609 DOI: 10.1021/acs.jcim.1c00160] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Indexed: 12/21/2022]
Abstract
Machine learning classifiers trained on class imbalanced data are prone to overpredict the majority class. This leads to a larger misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set by default to 0.5 which, however, is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy to deal with the class imbalance problem. In this work, we present two different automated procedures for the selection of the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they do not require retraining of the machine learning models or resampling of the training data. The first approach is specific for random forest (RF), while the second approach, named GHOST, can be potentially applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure-activity data for a variety of pharmaceutical targets. We show that both thresholding methods improve significantly the performance of RF. We tested the use of GHOST with four different classifiers in combination with two molecular descriptors, and we found that most classifiers benefit from threshold optimization. GHOST also outperformed other strategies, including random undersampling and conformal prediction. Finally, we show that our thresholding procedures can be effectively applied to real-world drug discovery projects, where the imbalance and characteristics of the data vary greatly between the training and test sets.
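The idea of moving the decision threshold without retraining can be sketched in a few lines. The example below is an illustration only and is not the authors' GHOST procedure: it assumes predicted probabilities for a labeled set are already available, scans a fixed grid of candidate thresholds, and uses Cohen's kappa as the selection metric (an assumption, since the abstract does not name the criterion). The function names and the toy data are hypothetical.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels (0/1)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    if expected == 1:  # degenerate case: only one class present and predicted
        return 0.0
    return (observed - expected) / (1 - expected)


def pick_threshold(y_true, probs, thresholds=None):
    """Return (threshold, kappa) maximizing kappa; 0.5 is the baseline."""
    if thresholds is None:
        thresholds = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best_t = 0.5
    best_k = cohen_kappa(y_true, [int(p >= 0.5) for p in probs])
    for t in thresholds:
        k = cohen_kappa(y_true, [int(p >= t) for p in probs])
        if k > best_k:  # strict improvement keeps the first best threshold
            best_t, best_k = t, k
    return best_t, best_k


# Toy imbalanced set: 8 inactives, 2 actives, all scored below 0.5.
y_true = [0] * 8 + [1] * 2
probs = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.45]
best_t, best_k = pick_threshold(y_true, probs)
print(best_t, best_k)
```

At the default threshold of 0.5 every compound in the toy set is predicted inactive, so the minority class is never recovered; lowering the threshold fixes this without touching the model or the training data.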
Affiliation(s)
- Carmen Esposito
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Gregory A Landrum
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland.,T5 Informatics GmbH, Spalenring 11, 4055 Basel, Switzerland
| | - Nadine Schneider
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
| | - Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
| | - Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| |
|
36
|
Qiang B, Lai J, Jin H, Zhang L, Liu Z. Target Prediction Model for Natural Products Using Transfer Learning. Int J Mol Sci 2021; 22:4632. [PMID: 33924898 PMCID: PMC8124298 DOI: 10.3390/ijms22094632] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/20/2021] [Revised: 04/23/2021] [Accepted: 04/26/2021] [Indexed: 11/16/2022] Open
Abstract
A large proportion of lead compounds are derived from natural products. However, most natural products have not been fully tested for their targets. To help resolve this problem, a model using transfer learning was built to predict targets for natural products. The model was pre-trained on a processed ChEMBL dataset and then fine-tuned on a natural product dataset. Benefitting from transfer learning and the data balancing technique, the model achieved a highly promising area under the receiver operating characteristic curve (AUROC) score of 0.910, with limited task-related training samples. Since the embedding distribution difference is reduced, embedding space analysis demonstrates that the model's outputs of natural products are reliable. Case studies have proved our model's performance in drug datasets. The fine-tuned model can successfully output all the targets of 62 drugs. Compared with a previous study, our model achieved better results in terms of both AUROC validation and its success rate for obtaining active targets among the top ones. The target prediction model using transfer learning can be applied in the field of natural product-based drug discovery and has the potential to find more lead compounds or to assist researchers in drug repurposing.
Affiliation(s)
| | | | | | - Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China; (B.Q.); (J.L.); (H.J.)
| | - Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China; (B.Q.); (J.L.); (H.J.)
| |
|
37
|
Lopez-Del Rio A, Picart-Armada S, Perera-Lluna A. Balancing Data on Deep Learning-Based Proteochemometric Activity Classification. J Chem Inf Model 2021; 61:1657-1669. [PMID: 33779173 PMCID: PMC8594867 DOI: 10.1021/acs.jcim.1c00086] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
In silico analysis of biological activity data has become an essential technique in pharmaceutical development. Specifically, the so-called proteochemometric models aim to share information between targets in machine learning ligand–target activity prediction models. However, bioactivity data sets used in proteochemometric modeling are usually imbalanced, which could potentially affect the performance of the models. In this work, we explored the effect of different balancing strategies in deep learning proteochemometric target–compound activity classification models while controlling for the compound series bias through clustering. These strategies were (1) no_resampling, (2) resampling_after_clustering, (3) resampling_before_clustering, and (4) semi_resampling. These schemas were evaluated in kinases, GPCRs, nuclear receptors, and proteases from BindingDB. We observed that the predicted proportion of positives was driven by the actual data balance in the test set. Additionally, it was confirmed that data balance had an impact on the performance estimates of the proteochemometric model. We recommend a combination of data augmentation and clustering in the training set (semi_resampling) to mitigate the data imbalance effect in a realistic scenario. The code of this analysis is publicly available at https://github.com/b2slab/imbalance_pcm_benchmark.
Affiliation(s)
- Angela Lopez-Del Rio
- B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain.,Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, 08950 Esplugues de Llobregat, Spain
| | - Sergio Picart-Armada
- B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain.,Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, 08950 Esplugues de Llobregat, Spain
| | - Alexandre Perera-Lluna
- B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain.,Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, 08950 Esplugues de Llobregat, Spain
| |
|
38
|
Shen C, Weng G, Zhang X, Leung ELH, Yao X, Pang J, Chai X, Li D, Wang E, Cao D, Hou T. Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening? Brief Bioinform 2021; 22:6070382. [PMID: 33418562 DOI: 10.1093/bib/bbaa410] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/17/2020] [Revised: 11/26/2020] [Accepted: 12/12/2020] [Indexed: 12/13/2022] Open
Abstract
Machine-learning (ML)-based scoring functions (MLSFs) have gradually emerged as a promising alternative for protein-ligand binding affinity prediction and structure-based virtual screening. However, clouds of doubts have still been raised against the benefits of this novel type of scoring functions (SFs). In this study, to benchmark the performance of target-specific MLSFs on a relatively unbiased dataset, the MLSFs trained from three representative protein-ligand interaction representations were assessed on the LIT-PCBA dataset, and the classical Glide SP SF and three types of ligand-based quantitative structure-activity relationship (QSAR) models were also utilized for comparison. Two major aspects in virtual screening campaigns, including prediction accuracy and hit novelty, were systematically explored. The calculation results illustrate that the tested target-specific MLSFs yielded generally superior performance over the classical Glide SP SF, but they could hardly outperform the 2D fingerprint-based QSAR models. Although substantial improvements could be achieved by integrating multiple types of protein-ligand interaction features, the MLSFs were still not sufficient to exceed MACCS-based QSAR models. In terms of the correlations between the hit ranks or the structures of the top-ranked hits, the MLSFs developed by different featurization strategies would have the ability to identify quite different hits. Nevertheless, it seems that target-specific MLSFs do not have the intrinsic attributes of a traditional SF and may not be a substitute for classical SFs. In contrast, MLSFs can be regarded as a new derivative of ligand-based QSAR models. It is expected that our study may provide valuable guidance for the assessment and further development of target-specific MLSFs.
Affiliation(s)
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Gaoqi Weng
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Xujun Zhang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Elaine Lai-Han Leung
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, SAR, China
- Xiaojun Yao
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, SAR, China
- Jinping Pang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Xin Chai
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dan Li
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Ercheng Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
|
39
|
Cáceres EL, Mew NC, Keiser MJ. Adding Stochastic Negative Examples into Machine Learning Improves Molecular Bioactivity Prediction. J Chem Inf Model 2020; 60:5957-5970. [PMID: 33245237 DOI: 10.1021/acs.jcim.0c00565] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/30/2023]
Abstract
Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (a 122% relative increase). This gain was accompanied by a modest decrease in temporal benchmark performance (13%). SNA gains in drug-screening performance were consistent across classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how leveraging uncertainty in training improves predictions of drug-target relationships.
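The idea behind SNA can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the ratio of negatives to measured examples, the label assigned to negatives, and the choice to redraw negatives once per epoch are all assumptions made for illustration. It only shows how untested molecule-target pairs might be sampled as temporary negatives alongside measured data.

```python
import random


def sample_stochastic_negatives(n_molecules, n_targets, tested_pairs, n_negatives, rng):
    """Draw distinct untested (molecule, target) index pairs by rejection sampling."""
    # Cap the request at the number of pairs that are actually untested,
    # so the loop below always terminates.
    max_untested = n_molecules * n_targets - len(tested_pairs)
    n_negatives = min(n_negatives, max_untested)
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.randrange(n_molecules), rng.randrange(n_targets))
        if pair not in tested_pairs:
            negatives.add(pair)
    return sorted(negatives)


def epoch_examples(measured, n_molecules, n_targets, negative_ratio, negative_label, rng):
    """Build one epoch's training list: measured data plus freshly drawn negatives.

    `negative_ratio` and `negative_label` are hypothetical knobs, not values
    taken from the paper.
    """
    tested = set(measured)
    n_neg = int(round(negative_ratio * len(measured)))
    negatives = sample_stochastic_negatives(n_molecules, n_targets, tested, n_neg, rng)
    examples = [(m, t, y) for (m, t), y in measured.items()]
    examples += [(m, t, negative_label) for m, t in negatives]
    rng.shuffle(examples)
    return examples


# Toy data: 4 molecules x 3 targets, with 3 measured affinities (made-up values).
measured = {(0, 0): 7.2, (1, 2): 5.1, (3, 1): 6.4}
rng = random.Random(0)
epochs = [
    epoch_examples(measured, n_molecules=4, n_targets=3,
                   negative_ratio=1.0, negative_label=0.0, rng=rng)
    for _ in range(2)
]
```

Because the negatives are redrawn on every call, no untested pair is permanently treated as inactive, which is one way to read "transient" in the abstract.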
Affiliation(s)
- Elena L Cáceres
  - Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
- Nicholas C Mew
  - Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
- Michael J Keiser
  - Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
|