1
|
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of drug discovery framework. The VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged into revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data has evolved as "big-data" in the public domain demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from an individual approach (LB and SB) to integrated LB and SB techniques to explore various ligand and target protein aspects for the enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in drug discovery domain for screening and optimizing hits or lead with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we presented this review. Here, the article categorized and emphasized individual VS techniques, detailed literature presented for machine learning implementation, modern machine intelligence approaches, and limitations and deliberated the future prospects.
Collapse
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| | - Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| |
Collapse
|
2
|
Priya S, Tripathi G, Singh DB, Jain P, Kumar A. Machine learning approaches and their applications in drug discovery and design. Chem Biol Drug Des 2022; 100:136-153. [PMID: 35426249 DOI: 10.1111/cbdd.14057] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2022] [Revised: 03/30/2022] [Accepted: 04/10/2022] [Indexed: 01/04/2023]
Abstract
This review is focused on several machine learning approaches used in chemoinformatics. Machine learning approaches provide tools and algorithms to improve drug discovery. Many physicochemical properties of drugs like toxicity, absorption, drug-drug interaction, carcinogenesis, and distribution have been effectively modeled by QSAR techniques. Machine learning is a subset of artificial intelligence, and this technique has shown tremendous potential in the field of drug discovery. Techniques discussed in this review are capable of modeling non-linear datasets, as well as big data of increasing depth and complexity. Various machine learning-based approaches are being used for drug target prediction, modeling the structure of drug target, binding site prediction, ligand-based similarity searching, de novo designing of ligands with desired properties, developing scoring functions for molecular docking, building QSAR model for biological activity prediction, and prediction of pharmacokinetic and pharmacodynamic properties of ligands. In recent years, these predictive tools and models have achieved good accuracy. By the use of more related input data, relevant parameters, and appropriate algorithms, the accuracy of these predictions can be further improved.
Collapse
Affiliation(s)
- Sonal Priya
- Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
| | - Garima Tripathi
- Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
| | - Dev Bukhsh Singh
- Department of Biotechnology, Siddharth University, Siddharth Nagar, India
| | - Priyanka Jain
- National Institute of Plant Genome Research, New Delhi, India
| | - Abhijeet Kumar
- Department of Chemistry, Mahatma Gandhi Central University, Motihari, India
| |
Collapse
|
3
|
Li J, Luo D, Wen T, Liu Q, Mo Z. Representative feature selection of molecular descriptors in QSAR modeling. J Mol Struct 2021. [DOI: 10.1016/j.molstruc.2021.131249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
4
|
Wen H, Su Y, Wang Z, Jin S, Ren J, Shen W, Eden M. A systematic modeling methodology of deep neural network‐based structure‐property relationship for rapid and reliable prediction on flashpoints. AIChE J 2021. [DOI: 10.1002/aic.17402] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Affiliation(s)
- Huaqiang Wen
- School of Chemistry and Chemical Engineering Chongqing University Chongqing China
| | - Yang Su
- School of Intelligent Technology and Engineering Chongqing University of Science and Technology Chongqing China
| | - Zihao Wang
- Process Systems Engineering Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg Germany
| | - Saimeng Jin
- School of Chemistry and Chemical Engineering Chongqing University Chongqing China
| | - Jingzheng Ren
- Department of Industrial and Systems Engineering The Hong Kong Polytechnic University Hong Kong
| | - Weifeng Shen
- School of Chemistry and Chemical Engineering Chongqing University Chongqing China
| | - Mario Eden
- Department of Chemical Engineering Auburn University Auburn AL USA
| |
Collapse
|
5
|
Chong LC, Gandhi G, Lee JM, Yeo WWY, Choi SB. Drug Discovery of Spinal Muscular Atrophy (SMA) from the Computational Perspective: A Comprehensive Review. Int J Mol Sci 2021; 22:8962. [PMID: 34445667 PMCID: PMC8396480 DOI: 10.3390/ijms22168962] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2021] [Accepted: 01/27/2021] [Indexed: 01/02/2023] Open
Abstract
Spinal muscular atrophy (SMA), one of the leading inherited causes of child mortality, is a rare neuromuscular disease arising from loss-of-function mutations of the survival motor neuron 1 (SMN1) gene, which encodes the SMN protein. When lacking the SMN protein in neurons, patients suffer from muscle weakness and atrophy, and in the severe cases, respiratory failure and death. Several therapeutic approaches show promise with human testing and three medications have been approved by the U.S. Food and Drug Administration (FDA) to date. Despite the shown promise of these approved therapies, there are some crucial limitations, one of the most important being the cost. The FDA-approved drugs are high-priced and are shortlisted among the most expensive treatments in the world. The price is still far beyond affordable and may serve as a burden for patients. The blooming of the biomedical data and advancement of computational approaches have opened new possibilities for SMA therapeutic development. This article highlights the present status of computationally aided approaches, including in silico drug repurposing, network driven drug discovery as well as artificial intelligence (AI)-assisted drug discovery, and discusses the future prospects.
Collapse
Affiliation(s)
- Li Chuin Chong
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (L.C.C.); (J.M.L.)
| | - Gayatri Gandhi
- Perdana University Graduate School of Medicine, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (G.G.); (W.W.Y.Y.)
| | - Jian Ming Lee
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (L.C.C.); (J.M.L.)
| | - Wendy Wai Yeng Yeo
- Perdana University Graduate School of Medicine, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (G.G.); (W.W.Y.Y.)
| | - Sy-Bing Choi
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (L.C.C.); (J.M.L.)
| |
Collapse
|
6
|
Elbadawi M, Gaisford S, Basit AW. Advanced machine-learning techniques in drug discovery. Drug Discov Today 2020; 26:769-777. [PMID: 33290820 DOI: 10.1016/j.drudis.2020.12.003] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2020] [Revised: 11/16/2020] [Accepted: 12/02/2020] [Indexed: 01/20/2023]
Abstract
The popularity of machine learning (ML) across drug discovery continues to grow, yielding impressive results. As their use increases, so do their limitations become apparent. Such limitations include their need for big data, sparsity in data, and their lack of interpretability. It has also become apparent that the techniques are not truly autonomous, requiring retraining even post deployment. In this review, we detail the use of advanced techniques to circumvent these challenges, with examples drawn from drug discovery and allied disciplines. In addition, we present emerging techniques and their potential role in drug discovery. The techniques presented herein are anticipated to expand the applicability of ML in drug discovery.
Collapse
Affiliation(s)
- Moe Elbadawi
- Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
| | - Simon Gaisford
- Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK; FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK
| | - Abdul W Basit
- Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK; FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK.
| |
Collapse
|
7
|
Wang Y, Chen X. QSPR model for Caco-2 cell permeability prediction using a combination of HQPSO and dual-RBF neural network. RSC Adv 2020; 10:42938-42952. [PMID: 35514900 PMCID: PMC9058322 DOI: 10.1039/d0ra08209k] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Accepted: 11/06/2020] [Indexed: 12/23/2022] Open
Abstract
The Caco-2 cell model is widely used to evaluate the in vitro human intestinal permeability of drugs due to its morphological and functional similarity to human enterocytes. Although it is safe and relatively economic, it is time-consuming. A rapid and accurate quantitative structure-property relationship (QSPR) model of Caco-2 permeability is helpful to improve the efficiency of oral drug development. The aim of our study is to explore the predictive ability of the QSPR model, to study its permeation mechanism, and to develop a potential permeability prediction model, for Caco-2 cells. In our study, a relatively large data set was collected and the abnormal data were eliminated using the Monte Carlo regression and hybrid quantum particle swarm optimization (HQPSO) algorithm. Then, the remaining 1827 compounds were used to establish QSPR models. To generate multiple chemically diverse training and test sets, we used a combination of principal component analysis (PCA) and self-organizing mapping (SOM) neural networks to split the modeling data set characterized by PaDEL-descriptors. After preliminary selection of descriptors by the mean decrease impurity (MDI) method, the HQPSO algorithm was used to select the key descriptors. Six different methods, namely, multivariate linear regression (MLR), support vector machine regression (SVR), xgboost, radial basis function (RBF) neural networks, dual-SVR and dual-RBF were employed to develop QSPR models. The best dual-RBF model was obtained finally with R 2 = 0.91, and R cv5 2 = 0.77, for the training set, and R T 2 = 0.77, for the test set. A series of validation methods were used to assess the robustness and predictive ability of the dual-RBF model under OECD principles. A new application domain (AD) definition method based on the descriptor importance-weighted and distance-based (IWD) method was proposed, and the outliers were analyzed carefully. Combined with the importance of the descriptors used in the dual-RBF model, we concluded that the "H E-state" and hydrogen bonds are important factors affecting the permeability of drugs passing through the Caco-2 cell. Compared with the reported studies, our method exhibits certain advantages in data size, transparency of modeling process and prediction accuracy to some extent, and is a promising tool for virtual screening in the early stage of drug development.
Collapse
Affiliation(s)
- Yukun Wang
- School of Chemical Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China
- School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
| | - Xuebo Chen
- School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
| |
Collapse
|
8
|
Keyvanpour MR, Shirzad MB. An Analysis of QSAR Research Based on Machine Learning Concepts. Curr Drug Discov Technol 2020; 18:17-30. [PMID: 32178612 DOI: 10.2174/1570163817666200316104404] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2019] [Revised: 08/22/2019] [Accepted: 10/28/2019] [Indexed: 11/22/2022]
Abstract
Quantitative Structure-Activity Relationship (QSAR) is a popular approach developed to correlate chemical molecules with their biological activities based on their chemical structures. Machine learning techniques have proved to be promising solutions to QSAR modeling. Due to the significant role of machine learning strategies in QSAR modeling, this area of research has attracted much attention from researchers. A considerable amount of literature has been published on machine learning based QSAR modeling methodologies whilst this domain still suffers from lack of a recent and comprehensive analysis of these algorithms. This study systematically reviews the application of machine learning algorithms in QSAR, aiming to provide an analytical framework. For this purpose, we present a framework called 'ML-QSAR'. This framework has been designed for future research to: a) facilitate the selection of proper strategies among existing algorithms according to the application area requirements, b) help to develop and ameliorate current methods and c) providing a platform to study existing methodologies comparatively. In ML-QSAR, first a structured categorization is depicted which studied the QSAR modeling research based on machine models. Then several criteria are introduced in order to assess the models. Finally, inspired by aforementioned criteria the qualitative analysis is carried out.
Collapse
Affiliation(s)
| | - Mehrnoush Barani Shirzad
- Data Mining Research Laboratory, Department of Computer Engineering, Alzahra University, Tehran, Iran
| |
Collapse
|
9
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 351] [Impact Index Per Article: 70.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Collapse
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Ryan Byrne
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Gisbert Schneider
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| |
Collapse
|
10
|
Srinivas R, Klimovich PV, Larson EC. Implicit-descriptor ligand-based virtual screening by means of collaborative filtering. J Cheminform 2018; 10:56. [PMID: 30467684 PMCID: PMC6755561 DOI: 10.1186/s13321-018-0310-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Accepted: 11/13/2018] [Indexed: 12/20/2022] Open
Abstract
Current ligand-based machine learning methods in virtual screening rely heavily on molecular fingerprinting for preprocessing, i.e., explicit description of ligands’ structural and physicochemical properties in a vectorized form. Of particular importance to current methods are the extent to which molecular fingerprints describe a particular ligand and what metric sufficiently captures similarity among ligands. In this work, we propose and evaluate methods that do not require explicit feature vectorization through fingerprinting, but, instead, provide implicit descriptors based only on other known assays. Our methods are based upon well known collaborative filtering algorithms used in recommendation systems. Our implicit descriptor method does not require any fingerprint similarity search, which makes the method free of the bias arising from the empirical nature of the fingerprint models. We show that implicit methods significantly outperform traditional machine learning methods, and the main strengths of implicit methods are their resilience to target-ligand sparsity and high potential for spotting promiscuous ligands.
Collapse
Affiliation(s)
- Raghuram Srinivas
- Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA. .,DataScience@SMU, Dallas, 75205, TX, USA.
| | - Pavel V Klimovich
- Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA.,The Dedman College Interdisciplinary Institute, 3225 Daniel Avenue, Dallas, TX, 75205, USA
| | - Eric C Larson
- Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA
| |
Collapse
|
11
|
Lo YC, Rensi SE, Torng W, Altman RB. Machine learning in chemoinformatics and drug discovery. Drug Discov Today 2018; 23:1538-1546. [PMID: 29750902 DOI: 10.1016/j.drudis.2018.05.010] [Citation(s) in RCA: 451] [Impact Index Per Article: 75.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Revised: 03/29/2018] [Accepted: 05/02/2018] [Indexed: 01/03/2023]
Abstract
Chemoinformatics is an established discipline focusing on extracting, processing and extrapolating meaningful data from chemical structures. With the rapid explosion of chemical 'big' data from HTS and combinatorial synthesis, machine learning has become an indispensable tool for drug designers to mine chemical information from large compound databases to design drugs with important biological properties. To process the chemical data, we first reviewed multiple processing layers in the chemoinformatics pipeline followed by the introduction of commonly used machine learning models in drug discovery and QSAR analysis. Here, we present basic principles and recent case studies to demonstrate the utility of machine learning techniques in chemoinformatics analyses; and we discuss limitations and future directions to guide further development in this evolving field.
Collapse
Affiliation(s)
- Yu-Chen Lo
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Stefano E Rensi
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Wen Torng
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Russ B Altman
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
| |
Collapse
|
12
|
Shiri F, Pirhadi S, Rahmani A. Identification of new potential HIV-1 reverse transcriptase inhibitors by QSAR modeling and structure-based virtual screening. J Recept Signal Transduct Res 2017; 38:37-47. [PMID: 29254400 DOI: 10.1080/10799893.2017.1414844] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Non-nucleoside reverse transcriptase inhibitors (NNRTIs) have gained a definitive place due to their unique antiviral potency, high specificity and low toxicity in antiretroviral combination therapies which are used to treat HIV. To design more specific HIV-1 inhibitors, 218 diverse non-nucleoside reverse transcriptase inhibitors with their EC50 values were collected. Then, different types of molecular descriptors were calculated. Also, genetic algorithm (GA) and enhanced replacement methods (ERM) were used as the variable selection approaches to choose more relevant features. Based on selected descriptors, a classification support vector machine (SVM) model was constructed to categorize compounds into two groups of active and inactive ones. The most active compound in the set was docked and was used as the input to the Pharmit server to screen the Molport and PubChem libraries by constructing a structure-based pharmacophore model. Shape filters for the protein and ligand as well as Lipinski's rule of five have been applied to filter out the output of virtual screening from pharmacophore search. Three hundred and thirty-four compounds were finally retrieved from the virtual screening and were fed to the previously constructed SVM model. Among them, the SVM model rendered seven active compounds and they were also analyzed by docking calculations and ADME/Tox parameters.
Collapse
Affiliation(s)
- Fereshteh Shiri
- a Department of Chemistry , University of Zabol , Zabol , Iran
| | - Somayeh Pirhadi
- b Medicinal and Natural Products Chemistry Research Center , Shiraz University of Medical Sciences , Shiraz , Iran
| | - Azita Rahmani
- a Department of Chemistry , University of Zabol , Zabol , Iran
| |
Collapse
|