1
Catacutan DB, Alexander J, Arnold A, Stokes JM. Machine learning in preclinical drug discovery. Nat Chem Biol 2024:10.1038/s41589-024-01679-1. [PMID: 39030362 DOI: 10.1038/s41589-024-01679-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 06/13/2024] [Indexed: 07/21/2024]
Abstract
Drug-discovery and drug-development endeavors are laborious, costly and time consuming. These programs can take upward of 12 years and cost US $2.5 billion, with a failure rate of more than 90%. Machine learning (ML) presents an opportunity to improve the drug-discovery process. Indeed, with the growing abundance of public and private large-scale biological and chemical datasets, ML techniques are becoming well positioned as useful tools that can augment the traditional drug-development process. In this Perspective, we discuss the integration of algorithmic methods throughout the preclinical phases of drug discovery. Specifically, we highlight an array of ML-based efforts, across diverse disease areas, to accelerate initial hit discovery, mechanism-of-action (MOA) elucidation and chemical property optimization. With advances in the application of ML across diverse therapeutic areas, we posit that fully ML-integrated drug-discovery pipelines will define the future of drug-development programs.
Affiliation(s)
- Denise B Catacutan
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jeremie Alexander
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Autumn Arnold
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M Stokes
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
2
Draper MR, Waterman A, Dannatt JE, Patel P. Integrating multiscale and machine learning approaches towards the SAMPL9 log P challenge. Phys Chem Chem Phys 2024; 26:7907-7919. [PMID: 38376855 PMCID: PMC10938873 DOI: 10.1039/d3cp04140a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
The partition coefficient (log P) is an important physicochemical property that provides information regarding a molecule's pharmacokinetics, toxicity, and bioavailability. Methods to accurately predict the partition coefficient have the potential to accelerate drug design. In an effort to test current methods and explore new computational techniques, the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) has established a blind prediction challenge. The ninth iteration challenge was to predict the toluene-water partition coefficient (log P_tol/w) of sixteen drug molecules. Herein, three approaches are reported broadly under the categories of quantum mechanics (QM), molecular mechanics (MM), and data-driven machine learning (ML). The three blind submissions yield mean unsigned errors (MUE) ranging from 1.53 to 2.93 log P_tol/w units. The MUEs were reduced to 1.00 log P_tol/w for the QM methods. While MM and ML methods outperformed DFT approaches for challenge molecules with fewer rotational degrees of freedom, they suffered for the larger molecules in this dataset. Overall, DFT functionals paired with a triple-ζ basis set were the simplest and most effective tool to obtain quantitatively accurate partition coefficients.
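The mean unsigned error (MUE) quoted in this abstract is the average absolute difference between predicted and experimental values. A minimal Python sketch of that metric follows; the function name and the numbers are hypothetical illustrations, not data from the SAMPL9 challenge.

```python
def mean_unsigned_error(predicted, experimental):
    """Average absolute deviation between paired predictions and measurements."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


# Hypothetical log P values for illustration only (not challenge data).
predicted = [1.0, 2.5, -0.5]
experimental = [2.0, 2.0, 0.5]
mue = mean_unsigned_error(predicted, experimental)
```

Because the errors are taken as absolute values, over- and under-predictions cannot cancel each other out, which is why the metric is commonly used to rank blind submissions.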
Affiliation(s)
- Michael R Draper
  - Chemistry Department, University of Dallas, Irving, Texas, 75062, USA
- Asa Waterman
  - Chemistry Department, University of Dallas, Irving, Texas, 75062, USA
- Prajay Patel
  - Chemistry Department, University of Dallas, Irving, Texas, 75062, USA
3
Tran TTV, Tayara H, Chong KT. Recent Studies of Artificial Intelligence on In Silico Drug Absorption. J Chem Inf Model 2023; 63:6198-6211. [PMID: 37819031 DOI: 10.1021/acs.jcim.3c00960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
Absorption is an important area of research in pharmacochemistry and drug development, because the drug has to be absorbed before any drug effects can occur. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors affecting absorption. Many drugs in development fail because of poor absorption. The research and continuous efforts of researchers in recent years have brought many successes and promises in drug absorption property prediction, especially in silico, which helps to reduce the time and cost significantly for screening undesirable drug candidates. In this report, we explicitly provide an overview of recent in silico studies on predicting absorption properties, especially from 2019 to the present, using artificial intelligence. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also proposed the challenges and development directions of absorption prediction in the future. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
  - Vietnam National University, Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
  - Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
4
Raza A, Chohan TA, Buabeid M, Arafa ESA, Chohan TA, Fatima B, Sultana K, Ullah MS, Murtaza G. Deep learning in drug discovery: a futuristic modality to materialize the large datasets for cheminformatics. J Biomol Struct Dyn 2023; 41:9177-9192. [PMID: 36305195 DOI: 10.1080/07391102.2022.2136244] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/24/2022] [Accepted: 10/08/2022] [Indexed: 10/31/2022]
Abstract
Artificial intelligence (AI) development imitates the workings of the human brain to comprehend modern problems. Traditional approaches such as high-throughput screening (HTS) and combinatorial chemistry are lengthy and expensive for the pharmaceutical industry, as they can only handle smaller datasets. Deep learning (DL) is a sophisticated AI method that uses a thorough comprehension of particular systems. The pharmaceutical industry is now adopting DL techniques to enhance the research and development process. Multi-oriented algorithms play a crucial role in QSAR analysis, de novo drug design, ADME evaluation, physicochemical analysis and preclinical development, followed by clinical trial data precision. In this study, we investigated the performance of several algorithms, including deep neural networks (DNN), convolutional neural networks (CNN) and multi-task learning (MTL), with the aim of generating high-quality, interpretable, big and diverse databases for drug design and development. Studies have demonstrated that CNNs, recurrent neural networks and deep belief networks are compatible, accurate and effective for the molecular description of pharmacodynamic properties. In Covid-19, existing pharmacological compounds have also been repurposed using DL models. In the absence of a Covid-19 vaccine, remdesivir and oseltamivir have been widely employed to treat severe SARS-CoV-2 infections. In conclusion, the results indicate the potential benefits of employing DL strategies in the drug discovery process. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ali Raza
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
  - Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
- Talha Ali Chohan
  - Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
  - Institute of Pharmaceutical Science, UVAS, Lahore, Pakistan
- Manal Buabeid
  - Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
- El-Shaima A Arafa
  - Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
  - Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
- Batool Fatima
  - Department of Biochemistry, Bahauddin Zakariya University, Multan, Pakistan
- Kishwar Sultana
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
- Malik Saad Ullah
  - Department of Pharmacy, Government College University, Faisalabad, Pakistan
- Ghulam Murtaza
  - Department of Pharmacy, COMSATS University Islamabad, Lahore Campus, Pakistan
5
Shirokii N, Din Y, Petrov I, Seregin Y, Sirotenko S, Razlivina J, Serov N, Vinogradov V. Quantitative Prediction of Inorganic Nanomaterial Cellular Toxicity via Machine Learning. Small (Weinheim an der Bergstrasse, Germany) 2023; 19:e2207106. [PMID: 36772908 DOI: 10.1002/smll.202207106] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/15/2022] [Revised: 01/09/2023] [Indexed: 05/11/2023]
Abstract
Organic chemistry has seen colossal progress due to machine learning (ML). However, the translation of artificial intelligence (AI) into materials science is challenging, where biological behavior prediction becomes even more complicated. Nanotoxicity is a critical parameter that describes the interaction of nanomaterials with living organisms and is screened in every bio-related study. To prevent excessive experiments, such properties have to be pre-evaluated. Several existing ML models partially fill the gap by predicting whether a nanomaterial is toxic or not. Yet, this binary categorization neglects the concentration dependencies crucial for experimental scientists. Here, an ML-based approach is proposed for the quantitative prediction of inorganic nanomaterial cytotoxicity, achieving a 10-fold cross-validation (CV) Q² = 0.86 with a root mean squared error (RMSE) of 12.2%, obtained by correlation-based feature selection and grid search-based optimization of model hyperparameters. To provide further model flexibility, quantitative atom property-based nanomaterial descriptors are introduced, allowing the model to extrapolate to unseen samples. Feature importance is calculated to find an interpretable model with optimal decision-making. These findings allow experimental scientists to perform primary in silico candidate screening and minimize the number of excessive, labor-intensive experiments, enabling the rapid development of nanomaterials for medicinal purposes.
Affiliation(s)
- Nikolai Shirokii
  - International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, 191002, Saint-Petersburg, Russian Federation
- Yevgeniya Din
  - International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, 191002, Saint-Petersburg, Russian Federation
- Ilya Petrov
  - International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, 191002, Saint-Petersburg, Russian Federation
- Yurii Seregin
  - International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, 191002, Saint-Petersburg, Russian Federation
- Sofia Sirotenko
  - International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, 191002, Saint-Petersburg, Russian Federation
- Julia Razlivina
  - International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, 191002, Saint-Petersburg, Russian Federation
- Nikita Serov
  - Advanced Engineering School, Almetyevsk State Oil Institute, Almetyevsk, Russia
- Vladimir Vinogradov
  - International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, 191002, Saint-Petersburg, Russian Federation
6
Morita R, Shigeta Y, Harada R. Efficient screening of protein-ligand complexes in lipid bilayers using LoCoMock score. J Comput Aided Mol Des 2023; 37:217-225. [PMID: 36943644 DOI: 10.1007/s10822-023-00502-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 03/05/2023] [Indexed: 03/23/2023]
Abstract
Membrane proteins are attractive targets for drug discovery due to their crucial roles in various biological processes. Studying the binding poses of amphipathic molecules to membrane proteins is essential for understanding the functions of membrane proteins and docking simulations can facilitate the screening of protein-ligand complexes at low computational costs. However, identifying docking poses for a ligand in non-aqueous environments such as lipid bilayers can be challenging. To address this issue, we propose a new docking score called logP-corrected membrane docking (LoCoMock) score. To screen putative protein-ligand complexes embedded in a membrane, the LoCoMock score considers the affinity between a target ligand and the membrane. It combines the docking score of the protein-ligand complex with the logP of the target ligand. In demonstrations using several model ligands, the LoCoMock score screened more putative complexes than the conventional docking score. As extended docking, the LoCoMock score makes it possible to screen membrane proteins more effectively as drug targets than the conventional docking.
Affiliation(s)
- Rikuri Morita
  - Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, 305-8577, Tsukuba, Ibaraki, Japan
- Yasuteru Shigeta
  - Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, 305-8577, Tsukuba, Ibaraki, Japan
- Ryuhei Harada
  - Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, 305-8577, Tsukuba, Ibaraki, Japan
7
Win ZM, Cheong AMY, Hopkins WS. Using Machine Learning To Predict Partition Coefficient (Log P) and Distribution Coefficient (Log D) with Molecular Descriptors and Liquid Chromatography Retention Time. J Chem Inf Model 2023; 63:1906-1913. [PMID: 36926888 DOI: 10.1021/acs.jcim.2c01373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
During preclinical evaluations of drug candidates, several physicochemical (p-chem) properties are measured and employed as metrics to estimate drug efficacy in vivo. Two such p-chem properties are the octanol-water partition coefficient, Log P, and distribution coefficient, Log D, which are useful in estimating the distribution of drugs within the body. Log P and Log D are traditionally measured using the shake-flask method and high-performance liquid chromatography. However, it is challenging to measure these properties for species that are very hydrophobic (or hydrophilic) owing to the very low equilibrium concentrations partitioned into octanol (or aqueous) phases. Moreover, the shake-flask method is relatively time-consuming and can require multistep dilutions as the range of analyte concentrations can differ by several orders of magnitude. Here, we circumvent these limitations by using machine learning (ML) to correlate Log P and Log D with liquid chromatography (LC) retention time (RT). Predictive models based on four ML algorithms, which used molecular descriptors and LC RTs as features, were extensively tested and compared. The inclusion of RT as an additional descriptor improves model performance (MAE = 0.366 and R² = 0.89), and Shapley additive explanations analysis indicates that RT has the highest impact on model accuracy.
Affiliation(s)
- Zaw-Myo Win
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong
  - Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
- Allen M Y Cheong
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong
- W Scott Hopkins
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
  - Waterloo Institute for Nanotechnology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
  - WaterMine Innovation, Inc., Waterloo, Ontario N0B 2T0, Canada
8
Reetz MT, König G. n‐Butanol: An Ecologically and Economically Viable Extraction Solvent for Isolating Polar Products from Aqueous Solutions. European J Org Chem 2021. [DOI: 10.1002/ejoc.202100829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Manfred T. Reetz
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
  - Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Gerhard König
  - Centre for Enzyme Innovation, University of Portsmouth, St Michael's Building, Portsmouth PO1 2DT, United Kingdom
9
Carracedo-Reboredo P, Liñares-Blanco J, Rodríguez-Fernández N, Cedrón F, Novoa FJ, Carballal A, Maojo V, Pazos A, Fernandez-Lozano C. A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J 2021; 19:4538-4558. [PMID: 34471498 PMCID: PMC8387781 DOI: 10.1016/j.csbj.2021.08.011] [Citation(s) in RCA: 125] [Impact Index Per Article: 41.7] [Received: 03/01/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022]
Abstract
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has taken on an important computer science component, as the democratization of machine learning techniques has made their use skyrocket. With the objectives set by the Precision Medicine initiative and the new challenges generated, it is necessary to establish robust, standard and reproducible computational methodologies to achieve them. Currently, predictive models based on machine learning have gained great importance in the step prior to preclinical studies. This stage manages to drastically reduce costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field gives an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. The review focuses mainly on the methods used to model the molecular data, as well as the biological problems addressed and the machine learning algorithms used for drug discovery in recent years.
Key Words
- ADMET, Absorption, distribution, metabolism, elimination and toxicity
- ADR, Adverse Drug Reaction
- AI, Artificial Intelligence
- ANN, Artificial Neural Networks
- APFP, Atom Pairs 2d FingerPrint
- AUC, Area under the Curve
- BBB, Blood–Brain barrier
- CDK, Chemical Development Kit
- CNN, Convolutional Neural Networks
- CNS, Central Nervous System
- CPI, Compound-protein interaction
- CV, Cross Validation
- Cheminformatics
- DL, Deep Learning
- DNA, Deoxyribonucleic acid
- Deep Learning
- Drug Discovery
- ECFP, Extended Connectivity Fingerprints
- FDA, Food and Drug Administration
- FNN, Fully Connected Neural Networks
- FP, Fingerprints
- FS, Feature Selection
- GCN, Graph Convolutional Networks
- GEO, Gene Expression Omnibus
- GNN, Graph Neural Networks
- GO, Gene Ontology
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- MACCS, Molecular ACCess System
- MCC, Matthews correlation coefficient
- MD, Molecular Descriptors
- MKL, Multiple Kernel Learning
- ML, Machine Learning
- Machine Learning
- Molecular Descriptors
- NB, Naive Bayes
- OOB, Out of Bag
- PCA, Principal Component Analysis
- QSAR
- QSAR, Quantitative structure–activity relationship
- RF, Random Forest
- RNA, Ribonucleic Acid
- SMILES, simplified molecular-input line-entry system
- SVM, Support Vector Machines
- TCGA, The Cancer Genome Atlas
- WHO, World Health Organization
- t-SNE, t-Distributed Stochastic Neighbor Embedding
Affiliation(s)
- Paula Carracedo-Reboredo
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Jose Liñares-Blanco
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
- Nereida Rodríguez-Fernández
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco Cedrón
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco J. Novoa
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Adrian Carballal
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Victor Maojo
  - Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, Madrid 28660, Spain
- Alejandro Pazos
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
10
Wang F, Diao X, Chang S, Xu L. Recent Progress of Deep Learning in Drug Discovery. Curr Pharm Des 2021; 27:2088-2096. [PMID: 33511933 DOI: 10.2174/1381612827666210129123231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/22/2020] [Accepted: 11/11/2020] [Indexed: 11/22/2022]
Abstract
Deep learning, an emerging field of artificial intelligence based on neural networks in machine learning, has been applied in various fields and is highly valued. Herein, we mainly review several mainstream architectures in deep learning, including deep neural networks, convolutional neural networks and recurrent neural networks in the field of drug discovery. The applications of these architectures in molecular de novo design, property prediction, biomedical imaging and synthetic planning have also been explored. Apart from that, we further discuss the future direction of the deep learning approaches and the main challenges we need to address.
Affiliation(s)
- Feng Wang
  - College of Information Science and Engineering, Huaide College of Changzhou University, Taizhou 214500, China
- XiaoMin Diao
  - College of Information Science and Engineering, Huaide College of Changzhou University, Taizhou 214500, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
11
Bergazin TD, Tielker N, Zhang Y, Mao J, Gunner MR, Francisco K, Ballatore C, Kast SM, Mobley DL. Evaluation of log P, pKa, and log D predictions from the SAMPL7 blind challenge. J Comput Aided Mol Des 2021; 35:771-802. [PMID: 34169394 PMCID: PMC8224998 DOI: 10.1007/s10822-021-00397-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 04/21/2021] [Accepted: 06/05/2021] [Indexed: 12/16/2022]
Abstract
The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focus the computational modeling community on areas in need of improvement for rational drug design. The SAMPL7 physical property challenge dealt with prediction of octanol-water partition coefficients and pKa for 22 compounds. The dataset was composed of a series of N-acylsulfonamides and related bioisosteres. 17 research groups participated in the log P challenge, submitting 33 blind submissions in total. For the pKa challenge, 7 different groups participated, submitting 9 blind submissions in total. Overall, the accuracy of octanol-water log P predictions in the SAMPL7 challenge was lower than that of octanol-water log P predictions in SAMPL6, likely due to a more diverse dataset. Compared to the SAMPL6 pKa challenge, accuracy remains unchanged in SAMPL7. Interestingly, here, though macroscopic pKa values were often predicted with reasonable accuracy, there was dramatically more disagreement among participants as to which microscopic transitions produced these values (with methods often disagreeing even as to the sign of the free energy change associated with certain transitions), indicating far more work needs to be done on pKa prediction methods.
Affiliation(s)
- Nicolas Tielker
  - Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a, 44227, Dortmund, Germany
- Yingying Zhang
  - Department of Physics, The Graduate Center, City University of New York, New York, 10016, USA
- Junjun Mao
  - Department of Physics, City College of New York, New York, 10031, USA
- M R Gunner
  - Department of Physics, The Graduate Center, City University of New York, New York, 10016, USA
  - Department of Physics, City College of New York, New York, 10031, USA
- Karol Francisco
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0756, USA
- Carlo Ballatore
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0756, USA
- Stefan M Kast
  - Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a, 44227, Dortmund, Germany
- David L Mobley
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, 92697, USA
  - Department of Chemistry, University of California, Irvine, Irvine, CA, 92697, USA
12
Exploring the octanol-water partition coefficient dataset using deep learning techniques and data augmentation. Commun Chem 2021; 4:90. [PMID: 36697535 PMCID: PMC9814212 DOI: 10.1038/s42004-021-00528-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 11/27/2020] [Accepted: 05/21/2021] [Indexed: 01/28/2023]
Abstract
Today more and more data are freely available. Based on these big datasets deep neural networks (DNNs) rapidly gain relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We have selected the octanol-water partition coefficient (log P) as an example, which plays an essential role in environmental chemistry and toxicology but also in chemical analysis. The predictive performance of the developed DNN is good with an rmse of 0.47 log units in the test dataset and an rmse of 0.33 for an external dataset from the SAMPL6 challenge. To this end, we trained the DNN using data augmentation considering all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and address limitations of the dataset itself.
13
Xie L, Xu L, Kong R, Chang S, Xu X. Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning. Front Pharmacol 2021; 11:606668. [PMID: 33488387 PMCID: PMC7819282 DOI: 10.3389/fphar.2020.606668] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/21/2020] [Accepted: 11/23/2020] [Indexed: 12/27/2022]
Abstract
The accurate prediction of physical properties and bioactivity of drug molecules in deep learning depends on how molecules are represented. Many types of molecular descriptors have been developed for quantitative structure-activity/property relationships (QSAR/QSPR). However, each molecular descriptor is optimized for a specific application with encoding preference. Considering that standalone featurization methods may only cover parts of the information of the chemical molecules, we proposed to build the conjoint fingerprint by combining two supplementary fingerprints. The impact of the conjoint fingerprint and each standalone fingerprint on predictive performance was systematically evaluated in predicting the logarithm of the partition coefficient (logP) and protein-ligand binding affinity by using machine learning/deep learning (ML/DL) methods, including random forest (RF), support vector regression (SVR), extreme gradient boosting (XGBoost), long short-term memory network (LSTM), and deep neural network (DNN). The results demonstrated that the conjoint fingerprint yielded improved predictive performance, even outperforming the consensus model using two standalone fingerprints for four out of five examined methods. Given that the conjoint fingerprint scheme shows easy extensibility and high applicability, we expect that the proposed conjoint scheme will create new opportunities for continuously improving the predictive performance of deep learning by harnessing the complementarity of various types of fingerprints.
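The abstract does not state how the two fingerprints are combined. One simple reading is end-to-end concatenation of the two bit vectors, sketched below in Python as an assumption rather than as the authors' method; the function name and the toy vectors are hypothetical.

```python
def conjoint_fingerprint(fp_a, fp_b):
    """Join two fingerprint bit vectors end to end (assumed combination rule)."""
    return list(fp_a) + list(fp_b)


# Toy bit vectors standing in for two different fingerprint types (hypothetical).
key_based_fp = [1, 0, 1, 1]
circular_fp = [0, 1, 0, 0, 1, 1]
combined = conjoint_fingerprint(key_based_fp, circular_fp)
```

Under this assumption a downstream model sees the features of both representations at once, instead of averaging the outputs of two separately trained models as a consensus approach would.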
Affiliation(s)
- Liangxu Xie
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
  - Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Ren Kong
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China