1
Meng L, Zhou B, Liu H, Chen Y, Yuan R, Chen Z, Luo S, Chen H. Advancing toxicity studies of per- and poly-fluoroalkyl substances (PFASs) through machine learning: Models, mechanisms, and future directions. The Science of the Total Environment 2024; 946:174201. [PMID: 38936709 DOI: 10.1016/j.scitotenv.2024.174201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 06/17/2024] [Accepted: 06/20/2024] [Indexed: 06/29/2024]
Abstract
Per- and polyfluoroalkyl substances (PFASs), encompassing a vast array of isomeric chemicals, are recognized as typical emerging contaminants with direct or potential impacts on human health and the ecological environment. Because the toxicological profiles of PFASs are complex and elusive, machine learning (ML) has been increasingly employed in their toxicity studies owing to its proficiency in prediction and data analytics. This integration is poised to become a predominant trend in environmental toxicology, propelled by swift advances in computational technology. This review examines the literature to summarize the varied objectives of employing ML in the toxicity studies of PFASs: (1) utilizing ML to establish Quantitative Structure-Activity Relationship (QSAR) models for PFASs with diverse toxicity endpoints, facilitating the targeted toxicity prediction of unidentified PFASs; (2) investigating and substantiating the Adverse Outcome Pathway (AOP) through the synergy of ML and traditional toxicological methods, thereby refining the toxicity assessment framework for PFASs; (3) dissecting and elucidating the features of established ML models to advance open research into the toxicity of PFASs, with a primary focus on determinants and mechanisms. The discussion extends to an in-depth examination of ML studies, grouping findings by their distinct application trajectories. Given that ML represents a nascent paradigm within PFAS research, this review delineates the collective challenges encountered in the ML-mediated study of PFAS toxicity and offers strategic guidance for ensuing investigations.
Affiliation(s)
- Lingxuan Meng
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beihai Zhou
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Haijun Liu
- School of Resources and Environment, Anqing Normal University, Anqing, China.
- Yuefang Chen
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Rongfang Yuan
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Zhongbing Chen
- Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Kamýcká 129, 16500 Praha-Suchdol, Czech Republic.
- Shuai Luo
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Huilun Chen
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China.
2
Huang ETC, Yang JS, Liao KYK, Tseng WCW, Lee CK, Gill M, Compas C, See S, Tsai FJ. Predicting blood-brain barrier permeability of molecules with a large language model and machine learning. Sci Rep 2024; 14:15844. [PMID: 38982309 PMCID: PMC11233737 DOI: 10.1038/s41598-024-66897-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 07/05/2024] [Indexed: 07/11/2024] Open
Abstract
Predicting the blood-brain barrier (BBB) permeability of small-molecule compounds using a novel artificial intelligence (AI) platform is necessary for drug discovery. Machine learning and large language models, used as AI tools, can improve accuracy and shorten the time needed for new drug development. The primary goal of this research is to develop AI computing models and novel deep learning architectures capable of predicting whether molecules can permeate the human BBB. The in silico (computational) and in vitro (experimental) results were validated by the Natural Products Research Laboratories (NPRL) at China Medical University Hospital (CMUH). The transformer-based MegaMolBART was used as the simplified molecular input line entry system (SMILES) encoder with an XGBoost classifier as an in silico method to check whether a molecule could cross the BBB. As a baseline for comparison, we used Morgan (circular) fingerprints, which apply the Morgan algorithm to a set of atomic invariants, as the encoder, again with an XGBoost classifier. BBB permeability was assessed in vitro using three-dimensional (3D) human BBB spheroids (human brain microvascular endothelial cells, brain vascular pericytes, and astrocytes). Using multiple BBB databases, the final in silico transformer and XGBoost model achieved an area under the receiver operating characteristic curve of 0.88 on the held-out test dataset. Temozolomide (TMZ) and 21 randomly selected BBB-permeable compounds (Pred scores = 1, indicating BBB-permeable) from the NPRL penetrated human BBB spheroid cells. No evidence suggests that ferulic acid or five BBB-impermeable compounds (Pred scores < 1.29423E-05, which designate compounds that do not pass through the human BBB) can pass through the spheroid cells of the BBB. Our in vitro validation experiments indicated that the in silico prediction of small-molecule permeation in the BBB model is accurate.
Transformer-based models like MegaMolBART, leveraging the SMILES representations of molecules, show great promise for applications in new drug discovery. These models have the potential to accelerate the development of novel targeted treatments for disorders of the central nervous system.
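The area under the ROC curve reported in this abstract can be read as the probability that a randomly chosen BBB-permeable compound receives a higher score than a randomly chosen impermeable one. A minimal Python sketch of that pairwise definition follows; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition.

    labels: 1 for permeable, 0 for impermeable (toy convention).
    scores: classifier outputs, higher meaning "more likely permeable".
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six compounds (three permeable, three not).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The quadratic loop is chosen for readability; for large test sets a rank-based implementation gives the same value faster.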
Affiliation(s)
- Eddie T C Huang
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Jai-Sing Yang
- Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
- Ken Y K Liao
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Warren C W Tseng
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- C K Lee
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Michelle Gill
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Colin Compas
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Simon See
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Fuu-Jen Tsai
- School of Chinese Medicine, College of Chinese Medicine, China Medical University, China Medical University Children's Hospital, No. 2, Yude Road, Taichung, 404332, Taiwan.
- China Medical University Children's Hospital, Taichung, Taiwan.
3
Liu J, Khan MKH, Guo W, Dong F, Ge W, Zhang C, Gong P, Patterson TA, Hong H. Machine learning and deep learning approaches for enhanced prediction of hERG blockade: a comprehensive QSAR modeling study. Expert Opin Drug Metab Toxicol 2024; 20:665-684. [PMID: 38968091 DOI: 10.1080/17425255.2024.2377593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 06/26/2024] [Indexed: 07/07/2024]
Abstract
BACKGROUND Cardiotoxicity is a major cause of drug withdrawal. The hERG channel, regulating ion flow, is pivotal for heart and nervous system function. Its blockade is a concern in drug development. Predicting hERG blockade is essential for identifying cardiac safety issues. Various QSAR models exist, but their performance varies. Ongoing improvements show promise, necessitating continued efforts to enhance accuracy using emerging deep learning algorithms in predicting potential hERG blockade. STUDY DESIGN AND METHOD Using a large training dataset, six individual QSAR models were developed. Additionally, three ensemble models were constructed. All models were evaluated using 10-fold cross-validation and two external datasets. RESULTS The 10-fold cross-validation resulted in Matthews correlation coefficient (MCC) values from 0.682 to 0.730, surpassing the best-reported model on the same dataset (0.689). External validation yielded MCC values from 0.520 to 0.715 for the first dataset, exceeding those of previously reported models (0-0.599). For the second dataset, MCC values fell between 0.025 and 0.215, aligning with those of reported models (0.112-0.220). CONCLUSIONS The developed models can assist the pharmaceutical industry and regulatory agencies in predicting hERG blockade activity, thereby enhancing safety assessments and reducing the risk of adverse cardiac events associated with new drug candidates.
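Since every result in this abstract is reported as a Matthews correlation coefficient, a short Python sketch of how MCC is obtained from a binary confusion matrix may help when comparing the quoted ranges. The function name `mcc` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when any marginal total is zero,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 50 blockers (40 found), 50 non-blockers (45 found).
toy_mcc = mcc(tp=40, tn=45, fp=5, fn=10)
print(f"toy MCC = {toy_mcc:.3f}")
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred for imbalanced blocker/non-blocker datasets.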
Affiliation(s)
- Jie Liu
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Md Kamrul Hasan Khan
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Wenjing Guo
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Fan Dong
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Weigong Ge
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
- Ping Gong
- Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA
- Tucker A Patterson
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Huixiao Hong
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
4
Cao W, Wu N, Zhang S, Qi Y, Guo R, Wang Z, Qu R. Photodegradation of polychlorinated biphenyls in water/nitrogen-doped silica and air/nitrogen-doped silica systems: Kinetics, mechanism and quantitative structure activity relationship (QSAR) analysis. The Science of the Total Environment 2024; 924:171586. [PMID: 38461975 DOI: 10.1016/j.scitotenv.2024.171586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 03/05/2024] [Accepted: 03/07/2024] [Indexed: 03/12/2024]
Abstract
Developing efficient and low-cost photocatalytic materials is essential for removing polychlorinated biphenyls (PCBs). In this work, the photodegradation of fourteen representative PCBs in both water/nitrogen-doped SiO2 (N-SiO2) and air/N-SiO2 systems was studied. The photodegradation kinetics of PCBs is consistent with the pseudo-first-order kinetic equation. The variation in the degradation of different PCBs in the two systems is primarily related to the position of the Cl substituent and the effective absorption wavelength range of the PCBs. A total of fourteen intermediates for 4,4'-Dichlorobiphenyl (PCB-15), 2,2',4,4',6,6'-Hexachlorobiphenyl (PCB-155), and 2,2',3,3',4,4',5,5',6,6'-Decachlorobiphenyl (PCB-209) generated from four reaction pathways were identified based on both mass spectrometry analysis and theoretical calculations. Using the values of ln k (k denotes the pseudo-first-order kinetic constant) for the 11 PCBs in the training set and the calculated molecular and structural parameters, quantitative structure-activity relationship (QSAR) models for the two systems were constructed by multiple linear regression (MLR) to better understand the factors affecting the photodegradation rate of PCBs. The QSAR equations were obtained with Cl atom substitution at position 3 (N3) as the main parameter: ln k = -1.98 - 0.19 N3 for the water/N-SiO2 system and ln k = -1.56 - 0.34 N3 for the air/N-SiO2 system, with coefficients of determination (R2) of 0.66 and 0.73, leave-one-out cross-validation coefficients (Q2LOO) of 0.51 and 0.59, respectively, and bootstrap validation coefficients (Q2BOOT) of 0.74 for both, confirming that the models were well fitted and showed high robustness and prediction ability. This study provides valuable insights into photocatalytic degradation studies of PCBs.
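The two regression equations quoted in this abstract can be evaluated directly. The Python sketch below plugs values of N3 into them and converts ln k back to k; it assumes N3 is a non-negative count of Cl substituents at position 3, and it leaves k unitless because the abstract does not state the units. The names `QSAR_COEFFS`, `predict_ln_k`, and `predict_k` are illustrative.

```python
import math

# (intercept, slope on N3) for ln k, copied from the abstract.
QSAR_COEFFS = {
    "water/N-SiO2": (-1.98, -0.19),
    "air/N-SiO2": (-1.56, -0.34),
}


def predict_ln_k(system: str, n3: int) -> float:
    """Evaluate ln k = intercept + slope * N3 for the given system."""
    if n3 < 0:
        raise ValueError("N3 is assumed to be a non-negative count")
    intercept, slope = QSAR_COEFFS[system]
    return intercept + slope * n3


def predict_k(system: str, n3: int) -> float:
    """Pseudo-first-order rate constant implied by the fitted ln k."""
    return math.exp(predict_ln_k(system, n3))


for system in QSAR_COEFFS:
    for n3 in (0, 1, 2):
        print(f"{system}: N3={n3}  ln k={predict_ln_k(system, n3):.2f}  "
              f"k={predict_k(system, n3):.3f}")
```

Both slopes are negative, so each additional Cl at position 3 lowers the predicted rate constant, and the effect is steeper in the air/N-SiO2 system.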
Affiliation(s)
- Wenqian Cao
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Nannan Wu
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Institute of Agro-product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, PR China
- Shengnan Zhang
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Yumeng Qi
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Ruixue Guo
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Zunyao Wang
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Ruijuan Qu
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China.
5
Schieferdecker S, Rottach F, Vock E. In Silico Prediction of Oral Acute Rodent Toxicity Using Consensus Machine Learning. J Chem Inf Model 2024; 64:3114-3122. [PMID: 38498695 DOI: 10.1021/acs.jcim.4c00056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Acute oral toxicity (AOT) data are required for the classification and labeling of chemicals according to the Globally Harmonized System (GHS). Acute oral toxicity studies are optimized to minimize the use of animals. However, with the advent of the three Rs principles and machine learning in toxicology, in silico methods have become a reasonable alternative for addressing the AOT of new chemical matter. Here, we describe the compilation of AOT data from a commercial database and the development of a consensus classification model after evaluating different combinations of molecular representations and machine learning algorithms. The model shows significantly better performance compared to publicly available AOT models. Its performance was evaluated on an external validation data set, which was compiled from the literature, and an applicability domain was deduced.
Affiliation(s)
- Florian Rottach
- Boehringer Ingelheim Pharma GmbH & Co. KG, 88397 Biberach, Germany
- Esther Vock
- Boehringer Ingelheim Pharma GmbH & Co. KG, 88397 Biberach, Germany
6
Biehn SE, Goncalves LM, Lehmann J, Marty JD, Mueller C, Ramirez SA, Tillier F, Sage CR. BioPrint meets the AI age: development of artificial intelligence-based ADMET models for the drug-discovery platform SAFIRE. Future Med Chem 2024; 16:587-599. [PMID: 38372202 DOI: 10.4155/fmc-2024-0007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 02/08/2024] [Indexed: 02/20/2024] Open
Abstract
Background: To prioritize compounds with a higher likelihood of success, artificial intelligence models can be used to predict absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of molecules quickly and efficiently. Methods: Models were trained with BioPrint database proprietary data along with public datasets to predict various ADMET end points for the SAFIRE platform. Results: SAFIRE models performed at or above 75% accuracy and a Matthews correlation coefficient of 0.4 on validation sets. Training with both proprietary and public data improved model performance and expanded the chemical space on which the models were trained. The platform features scoring functionality to guide user decision-making. Conclusion: High-quality datasets along with chemical space considerations yielded ADMET models that perform favorably and have utility in the drug discovery process.
Affiliation(s)
- Sarah E Biehn
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Juerg Lehmann
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Jessica D Marty
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Christoph Mueller
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Samuel A Ramirez
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Fabien Tillier
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Carleton R Sage
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
7
Lee J, Beers JL, Geffert RM, Jackson KD. A Review of CYP-Mediated Drug Interactions: Mechanisms and In Vitro Drug-Drug Interaction Assessment. Biomolecules 2024; 14:99. [PMID: 38254699 PMCID: PMC10813492 DOI: 10.3390/biom14010099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 01/02/2024] [Accepted: 01/08/2024] [Indexed: 01/24/2024] Open
Abstract
Drug metabolism is a major determinant of drug concentrations in the body. Drug-drug interactions (DDIs) caused by the co-administration of multiple drugs can lead to alteration in the exposure of the victim drug, raising safety or effectiveness concerns. Assessment of the DDI potential starts with in vitro experiments to determine kinetic parameters and identify risks associated with the use of comedication that can inform future clinical studies. The diverse range of experimental models and techniques has significantly contributed to the examination of potential DDIs. Cytochrome P450 (CYP) enzymes are responsible for the biotransformation of many drugs on the market, making them frequently implicated in drug metabolism and DDIs. Consequently, there has been a growing focus on the assessment of DDI risk for CYPs. This review article provides mechanistic insights underlying CYP inhibition/induction and an overview of the in vitro assessment of CYP-mediated DDIs.
Affiliation(s)
- Jonghwa Lee
- Division of Pharmacotherapy and Experimental Therapeutics, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Klarissa D. Jackson
- Division of Pharmacotherapy and Experimental Therapeutics, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
8
Lunghini F, Fava A, Pisapia V, Sacco F, Iaconis D, Beccari AR. ProfhEX: AI-based platform for small molecules liability profiling. J Cheminform 2023; 15:60. [PMID: 37296454 DOI: 10.1186/s13321-023-00728-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 05/28/2023] [Indexed: 06/12/2023] Open
Abstract
Off-target drug interactions are a major reason for candidate failure in the drug discovery process. Anticipating a drug's potential adverse effects in the early stages is necessary to minimize health risks to patients, animal testing, and economic costs. With the constantly increasing size of virtual screening libraries, AI-driven methods can be exploited as first-tier screening tools to provide liability estimation for drug candidates. In this work we present ProfhEX, an AI-driven suite of 46 OECD-compliant machine learning models that can profile small molecules on 7 relevant liability groups: cardiovascular, central nervous system, gastrointestinal, endocrine, renal, pulmonary and immune system toxicities. Experimental affinity data were collected from public and commercial data sources. The entire chemical space comprised 289,202 activity data points for a total of 210,116 unique compounds, spanning 46 targets with dataset sizes ranging from 819 to 18,896. Gradient boosting and random forest algorithms were initially employed and ensembled for the selection of a champion model. Models were validated according to the OECD principles, including robust internal (cross validation, bootstrap, y-scrambling) and external validation. Champion models achieved an average Pearson correlation coefficient of 0.84 (SD of 0.05), an R2 determination coefficient of 0.68 (SD of 0.1) and a root mean squared error of 0.69 (SD of 0.08). All liability groups showed good hit-detection power with an average enrichment factor at 5% of 13.1 (SD of 4.5) and AUC of 0.92 (SD of 0.05). Benchmarking against already existing tools demonstrated the predictive power of ProfhEX models for large-scale liability profiling. This platform will be further expanded with the inclusion of new targets and through complementary modelling approaches, such as structure and pharmacophore-based models. ProfhEX is freely accessible at the following address: https://profhex.exscalate.eu/ .
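The "enrichment factor at 5%" quoted in this abstract measures how many more actives appear in the top-ranked 5% of a screened library than random selection would give. The Python sketch below uses the common definition (hit rate in the top fraction divided by the overall hit rate); whether ProfhEX uses exactly this variant is not stated in the abstract, and the function name, rounding rule, and toy library are assumptions.

```python
def enrichment_factor(labels, scores, fraction=0.05):
    """Enrichment factor: hit rate in the top `fraction` / overall hit rate.

    labels: 1 for an active (hit), 0 for an inactive.
    scores: model outputs, higher meaning "more likely active".
    The size of the top set is round(fraction * N), with a minimum of 1.
    """
    n = len(labels)
    if n == 0 or n != len(scores):
        raise ValueError("labels and scores must be non-empty and equal length")
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("enrichment factor is undefined without any actives")
    n_top = max(1, int(round(fraction * n)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (total_hits / n)


# Toy library: 100 compounds, 10 actives, 4 of them ranked in the top 5.
toy_labels = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89
toy_scores = [100 - i for i in range(100)]  # already sorted, best first
toy_ef = enrichment_factor(toy_labels, toy_scores, fraction=0.05)
print(f"toy EF@5% = {toy_ef:.1f}")
```

With a 5% cut-off the maximum attainable value is 20 (every top-ranked compound is an active and actives are rare), which puts the reported average of 13.1 in context.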
Affiliation(s)
- Filippo Lunghini
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Naples, Italy
- Anna Fava
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Naples, Italy
- Vincenzo Pisapia
- Professional Service Department, SAS Institute, Via Darwin 20/22, 20143, Milan, Italy
- Francesco Sacco
- Professional Service Department, SAS Institute, Via Darwin 20/22, 20143, Milan, Italy
- Daniela Iaconis
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Naples, Italy
9
Ye J, Li A, Zheng H, Yang B, Lu Y. Machine Learning Advances in Predicting Peptide/Protein-Protein Interactions Based on Sequence Information for Lead Peptides Discovery. Adv Biol (Weinh) 2023; 7:e2200232. [PMID: 36775876 DOI: 10.1002/adbi.202200232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 12/30/2022] [Indexed: 02/14/2023]
Abstract
Peptides have shown increasing advantages and significant clinical value in drug discovery and development. With the development of high-throughput technologies and artificial intelligence (AI), machine learning (ML) methods for discovering new lead peptides have been expanded and incorporated into rational drug design. Predictions of peptide-protein interactions (PepPIs) and protein-protein interactions (PPIs) are both opportunities and challenges in computational biology, which will help to better understand the mechanisms of disease and provide the impetus for the discovery of lead peptides. This paper comprehensively reviews computational models for PepPI and PPI predictions. It begins with an introduction of various databases of peptide ligands and target proteins. Then it discusses data formats and feature representations for proteins and peptides. Furthermore, classical ML methods and emerging deep learning (DL) methods that can be used to train prediction models of PepPI and PPI are classified into four categories, and their advantages and disadvantages are analyzed. To assess the relative performance of different models, different validation protocols and evaluation indexes are discussed. The goal of this review is to help researchers quickly get started to develop computational frameworks using these integrated resources and eventually promote the discovery of lead peptides.
Affiliation(s)
- Jiahao Ye
- School of Medicine, Shanghai University, Shanghai, 200444, China
- An Li
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Hao Zheng
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Banghua Yang
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Yiming Lu
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
10
Stewart M, Martin ST. Machine Learning for Ionization Potentials and Photoionization Cross Sections of Volatile Organic Compounds. ACS Earth & Space Chemistry 2023; 7:863-875. [PMID: 37152449 PMCID: PMC10152554 DOI: 10.1021/acsearthspacechem.3c00009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 03/16/2023] [Accepted: 03/17/2023] [Indexed: 05/09/2023]
Abstract
Molecular ionization potentials (IP) and photoionization cross sections (σ) can affect the sensitivity of photoionization detectors (PIDs) and other sensors for gaseous species. This study employs several methods of machine learning (ML) to predict IP and σ values at 10.6 eV (117 nm) for a dataset of 1251 gaseous organic species. The explicitness of the treatment of the species electronic structure progressively increases among the methods. The study compares the ML predictions of the IP and σ values to those obtained by quantum chemical calculations. The ML predictions are comparable in performance to those of the quantum calculations when evaluated against measurements. Pretraining further reduces the mean absolute errors (ε) compared to the measurements. The graph-based attentive fingerprint model was most accurate, for which εIP = 0.23 ± 0.01 eV and εσ = 2.8 ± 0.2 Mb compared to measurements and computed cross sections, respectively. The ML predictions for IP correlate well with both the measured IPs (R2 = 0.88) and with IPs computed at the level of M06-2X/aug-cc-pVTZ (R2 = 0.82). The ML predictions for σ correlated reasonably well with computed cross sections (R2 = 0.66). The developed ML methods for IP and σ values, representing the properties of a generalizable set of volatile organic compounds (VOCs) relevant to industrial applications and atmospheric chemistry, can be used to quantitatively describe the species-dependent sensitivity of chemical sensors that use ionizing radiation as part of the sensing mechanism, such as photoionization detectors.
Affiliation(s)
- Matthew P. Stewart
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
- Scot T. Martin
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
11
McNair D. Artificial Intelligence and Machine Learning for Lead-to-Candidate Decision-Making and Beyond. Annu Rev Pharmacol Toxicol 2023; 63:77-97. [PMID: 35679624 DOI: 10.1146/annurev-pharmtox-051921-023255] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/25/2023]
Abstract
The use of artificial intelligence (AI) and machine learning (ML) in pharmaceutical research and development has to date focused on research: target identification; docking-, fragment-, and motif-based generation of compound libraries; modeling of synthesis feasibility; rank-ordering likely hits according to structural and chemometric similarity to compounds having known activity and affinity to the target(s); optimizing a smaller library for synthesis and high-throughput screening; and combining evidence from screening to support hit-to-lead decisions. Applying AI/ML methods to lead optimization and lead-to-candidate (L2C) decision-making has shown slower progress, especially regarding predicting absorption, distribution, metabolism, excretion, and toxicology properties. The present review surveys reasons why this is so, reports progress that has occurred in recent years, and summarizes some of the issues that remain. Effective AI/ML tools to derisk L2C and later phases of development are important to accelerate the pharmaceutical development process, ameliorate escalating development costs, and achieve greater success rates.
Affiliation(s)
- Douglas McNair
- Global Health, Integrated Development, Bill & Melinda Gates Foundation, Seattle, Washington, USA
12
Watanabe R, Kawata T, Ueda S, Shinbo T, Higashimori M, Natsume-Kitatani Y, Mizuguchi K. Prediction of the Contribution Ratio of a Target Metabolic Enzyme to Clearance from Chemical Structure Information. Mol Pharm 2023; 20:419-426. [PMID: 36538346 PMCID: PMC9812024 DOI: 10.1021/acs.molpharmaceut.2c00698] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/17/2022] [Revised: 11/22/2022] [Accepted: 11/22/2022] [Indexed: 12/24/2022]
Abstract
The contribution ratio of metabolic enzymes such as cytochrome P450 to in vivo clearance (fraction metabolized: fm) is a pharmacokinetic index that is particularly important for the quantitative evaluation of drug-drug interactions. Since obtaining experimental in vivo fm values is challenging, values derived from in vitro experiments have often been used instead. This study aimed to explore the possibility of constructing machine learning models for predicting in vivo fm using chemical structure information alone. We collected in vivo fm values and chemical structures of 319 compounds from a public database with careful manual curation and constructed predictive models using several machine learning methods. The results showed that in vivo fm values can be obtained from structural information alone with a performance comparable to that based on in vitro experimental values, and that the prediction accuracy for compounds involved in CYP induction or inhibition is significantly higher than that obtained using in vitro values. Our new approach to predicting in vivo fm values in the early stages of drug discovery should help improve the efficiency of the drug optimization process.
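Why fm matters for quantitative DDI evaluation can be illustrated with the widely used static model for reversible inhibition of a single pathway, AUC ratio = 1 / (fm / (1 + [I]/Ki) + (1 - fm)). This relation is standard pharmacokinetics background and is not stated in the abstract; the Python sketch below, with the hypothetical function name `auc_ratio` and made-up inputs, only shows how strongly the predicted exposure change depends on fm.

```python
def auc_ratio(fm: float, i_over_ki: float) -> float:
    """Predicted fold-change in victim-drug AUC under reversible inhibition.

    fm: fraction of clearance carried by the inhibited enzyme (0 to 1).
    i_over_ki: inhibitor concentration divided by its inhibition constant.
    Static single-pathway model; ignores gut metabolism and induction.
    """
    if not 0.0 <= fm <= 1.0:
        raise ValueError("fm must lie between 0 and 1")
    if i_over_ki < 0.0:
        raise ValueError("[I]/Ki must be non-negative")
    return 1.0 / (fm / (1.0 + i_over_ki) + (1.0 - fm))


# Same hypothetical inhibitor ([I]/Ki = 9), different fm values.
for fm in (0.5, 0.9, 0.99):
    print(f"fm={fm:.2f}  predicted AUC ratio={auc_ratio(fm, 9.0):.2f}")
```

Because the ratio rises steeply as fm approaches 1, a small error in fm for a highly enzyme-dependent drug translates into a large error in the predicted interaction, which is the motivation for estimating in vivo fm accurately.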
Affiliation(s)
- Reiko Watanabe
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka 567-0085, Japan
- Institute for Protein Research, Osaka University, Osaka 567-0085, Japan
- Toshio Kawata
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
- Shinya Ueda
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
- Takumi Shinbo
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
- Mitsuo Higashimori
- Science Enablement Department, Data Science & Innovation Division, Research & Development, AstraZeneca K.K., Osaka 530-0011, Japan
- Yayoi Natsume-Kitatani
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka 567-0085, Japan
- Institute of Advanced Medical Sciences, Tokushima University, Tokushima 567-0085, Japan
- Kenji Mizuguchi
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka 567-0085, Japan
- Institute for Protein Research, Osaka University, Osaka 567-0085, Japan
13
Physicochemical QSAR analysis of hERG inhibition revisited: towards a quantitative potency prediction. J Comput Aided Mol Des 2022; 36:837-849. [PMID: 36305984 DOI: 10.1007/s10822-022-00483-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 10/04/2022] [Indexed: 01/07/2023]
Abstract
In an earlier study (Didziapetris R & Lanevskij K (2016). J Comput Aided Mol Des. 30:1175-1188) we collected a database of publicly available hERG inhibition data for almost 6700 drug-like molecules and built a probabilistic Gradient Boosting classifier with a minimal set of physicochemical descriptors (log P, pKa, molecular size and topology parameters). This approach favored interpretability over statistical performance but still achieved an overall classification accuracy of 75%. In the current follow-up work we expanded the database (provided in Supplementary Information) to almost 9400 molecules and performed temporal validation of the model on a set of novel chemicals from recently published lead optimization projects. Validation results showed almost no performance degradation compared to the original study. Additionally, we rebuilt the model using AFT (Accelerated Failure Time) learning objective in XGBoost, which accepts both quantitative and censored data often reported in protein inhibition studies. The new model achieved a similar level of accuracy of discerning hERG blockers from non-blockers at 10 µM threshold, which can be conceived as close to the performance ceiling for methods aiming to describe only non-specific ligand interactions with hERG. Yet, this model outputs quantitative potency values (IC50) and is not tied to a particular classification cut-off. pIC50 from patch-clamp measurements can be predicted with R² ≈ 0.4 and MAE < 0.5, which enables ligand ranking according to their expected potency levels. The employed approach can be valuable for quantitative modeling of various ADME and drug safety endpoints with a high prevalence of censored data.
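The abstract above notes that an AFT-style objective can learn from both exact and censored measurements. As a rough illustration of that idea only, the sketch below scores one prediction against an exact or interval-style label under a Gaussian error model on the pIC50 scale. It is not the authors' model and not XGBoost's implementation; the function names, the Gaussian assumption, and the default `sigma` are all hypothetical choices made for this example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def censored_nll(pred: float, lower: float, upper: float, sigma: float = 1.0) -> float:
    """Negative log-likelihood of one label given a prediction (hypothetical helper).

    lower == upper          -> exact measurement
    lower == -inf           -> left-censored  (true value is below `upper`)
    upper == +inf           -> right-censored (true value is above `lower`)
    """
    if lower == upper:
        # Exact label: use the density at the observed value.
        z = (lower - pred) / sigma
        return -math.log(normal_pdf(z) / sigma)
    # Censored label: use the probability mass inside the interval.
    hi = 1.0 if upper == math.inf else normal_cdf((upper - pred) / sigma)
    lo = 0.0 if lower == -math.inf else normal_cdf((lower - pred) / sigma)
    return -math.log(max(hi - lo, 1e-12))  # guard against log(0)


# "IC50 > 10 uM" corresponds to pIC50 < 5, i.e. a left-censored pIC50 label.
loss_consistent = censored_nll(pred=4.0, lower=-math.inf, upper=5.0)
loss_inconsistent = censored_nll(pred=6.0, lower=-math.inf, upper=5.0)
```

A prediction that falls inside the censored range is penalized only lightly, while one that contradicts the bound is penalized more, so a reported "> 10 µM" result still carries training signal without being forced to a single number.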
14
Gorgulla C, Jayaraj A, Fackeldey K, Arthanari H. Emerging frontiers in virtual drug discovery: From quantum mechanical methods to deep learning approaches. Curr Opin Chem Biol 2022; 69:102156. [PMID: 35576813 PMCID: PMC9990419 DOI: 10.1016/j.cbpa.2022.102156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/05/2021] [Revised: 03/16/2022] [Accepted: 04/07/2022] [Indexed: 11/19/2022]
Abstract
Virtual screening-based approaches to discover initial hit and lead compounds have the potential to reduce both the cost and time of early drug discovery stages, as well as to find inhibitors for even challenging target sites such as protein-protein interfaces. Here in this review, we provide an overview of the progress that has been made in virtual screening methodology and technology on multiple fronts in recent years. The advent of ultra-large virtual screens, in which hundreds of millions to billions of compounds are screened, has proven to be a powerful approach to discover highly potent hit compounds. However, these developments are just the tip of the iceberg, with new technologies and methods emerging to propel the field forward. Examples include novel machine-learning approaches, which can reduce the computational costs of virtual screening dramatically, while progress in quantum-mechanical approaches can increase the accuracy of predictions of various small molecule properties.
Affiliation(s)
- Christoph Gorgulla
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School (HMS), Boston, MA, USA; Department of Physics, Faculty of Arts and Sciences, Harvard University, Cambridge, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Konstantin Fackeldey
- Institute of Mathematics, Technical University Berlin, Berlin, Germany; Zuse Institute Berlin, Berlin, Germany
- Haribabu Arthanari
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School (HMS), Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
15
Rácz A, Mihalovits LM, Bajusz D, Héberger K, Miranda-Quintana RA. Molecular Dynamics Simulations and Diversity Selection by Extended Continuous Similarity Indices. J Chem Inf Model 2022; 62:3415-3425. [PMID: 35834424 PMCID: PMC9326969 DOI: 10.1021/acs.jcim.2c00433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Molecular dynamics (MD) is a core methodology of molecular modeling and computational design for the study of the dynamics and temporal evolution of molecular systems. MD simulations have particularly benefited from the rapid increase of computational power that has characterized the past decades of computational chemical research, being the first method to be successfully migrated to the GPU infrastructure. While new-generation MD software is capable of delivering simulations on an ever-increasing scale, relatively less effort is invested in developing postprocessing methods that can keep up with the quickly expanding volumes of data that are being generated. Here, we introduce a new idea for sampling frames from large MD trajectories, based on the recently introduced framework of extended similarity indices. Our approach presents a new, linearly scaling alternative to the traditional approach of applying a clustering algorithm that usually scales as a quadratic function of the number of frames. When showcasing its usage on case studies with different system sizes and simulation lengths, we have registered speedups of up to 2 orders of magnitude, as compared to traditional clustering algorithms. The conformational diversity of the selected frames is also noticeably higher, which is a further advantage for certain applications, such as the selection of structural ensembles for ligand docking. The method is available open-source at https://github.com/ramirandaq/MultipleComparisons.
Affiliation(s)
- Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117 Budapest, Hungary
- Levente M Mihalovits
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117 Budapest, Hungary
- Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117 Budapest, Hungary
- Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117 Budapest, Hungary
- Ramón Alain Miranda-Quintana
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
16
Orosz Á, Héberger K, Rácz A. Comparison of Descriptor- and Fingerprint Sets in Machine Learning Models for ADME-Tox Targets. Front Chem 2022; 10:852893. [PMID: 35755260 PMCID: PMC9214226 DOI: 10.3389/fchem.2022.852893] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/11/2022] [Accepted: 04/14/2022] [Indexed: 01/12/2023] Open
Abstract
The screening of compounds for ADME-Tox targets plays an important role in drug design. QSPR models can increase the speed of these specific tasks, although the performance of the models highly depends on several factors, such as the applied molecular descriptors. In this study, a detailed comparison of the most popular descriptor groups has been carried out for six main ADME-Tox classification targets: Ames mutagenicity, P-glycoprotein inhibition, hERG inhibition, hepatotoxicity, blood–brain-barrier permeability, and cytochrome P450 2C9 inhibition. The literature-based, medium-sized binary classification datasets (all above 1,000 molecules) were used for the model building by two common algorithms, XGBoost and the RPropMLP neural network. Five molecular representation sets were compared along with their joint applications: Morgan, Atompairs, and MACCS fingerprints, and the traditional 1D and 2D molecular descriptors, as well as 3D molecular descriptors, separately. The statistical evaluation of the model performances was based on 18 different performance parameters. Although all the developed models were close to the usual performance of QSPR models for each specific ADME-Tox target, the results clearly showed the superiority of the traditional 1D, 2D, and 3D descriptors in the case of the XGBoost algorithm. It is worth trying the classical tools in single model building because the use of 2D descriptors can produce even better models for almost every dataset than the combination of all the examined descriptor sets.
Affiliation(s)
- Álmos Orosz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary
- Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary
- Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary
17
Bajusz D, Keserű GM. Maximizing the integration of virtual and experimental screening in hit discovery. Expert Opin Drug Discov 2022; 17:629-640. [PMID: 35671403 DOI: 10.1080/17460441.2022.2085685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
INTRODUCTION Experimental and virtual screening contributes to the discovery of more than 50% of clinical candidates. Considering the similar concept and goals, early-phase drug discovery would benefit from the effective integration of these approaches. AREAS COVERED After reviewing the recent trends in both experimental and virtual screening, the authors discuss different integration strategies from parallel, focused, sequential, and iterative screening. Strategic considerations are demonstrated in a number of real-life case studies. EXPERT OPINION Experimental and virtual screening are complementary approaches that should be integrated in lead discovery settings. Virtual screening can access extremely large synthetically feasible chemical space that can be effectively searched on GPU clusters or cloud architectures. Experimental screening provides reliable datasets by quantitative HTS applications, and DNA-encoded libraries (DEL) have enlarged the chemical space covered by these technologies. These developments, together with the use of artificial intelligence methods, represent new options for their efficient integration. The case studies discussed here demonstrate the benefits of complementary strategies, such as focused and iterative screening.
Affiliation(s)
- Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary
- György M Keserű
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary
18
Janicka M, Śliwińska A. Quantitative Retention (Structure)–Activity Relationships in Predicting the Pharmaceutical and Toxic Properties of Potential Pesticides. Molecules 2022; 27:3599. [PMID: 35684533 PMCID: PMC9182382 DOI: 10.3390/molecules27113599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 11/16/2022] Open
Abstract
The micellar liquid chromatography technique and quantitative retention (structure)–activity relationships method were used to predict properties of carbamic and phenoxyacetic acids derivatives, newly synthesized in our laboratory and considered as potential pesticides. Important properties of the test substances characterizing their potential significance as pesticides as well as threats to humans were considered: the volume of distribution, the unbonded fractions, the blood–brain distribution, the rate of skin and cell permeation, the dermal absorption, the binding to human serum albumin, partitioning between water and plants’ cuticles, and the lethal dose. Pharmacokinetic and toxicity parameters were predicted as functions of the solutes’ lipophilicities and the number of hydrogen bond donors, the number of hydrogen bond acceptors, and the number of rotatable bonds. The equations that were derived were evaluated statistically and cross-validated. Important features of the molecular structure influencing the properties of the tested substances were indicated. The QSAR models that were developed had high predictive ability and high reliability in modeling the properties of the molecules that were tested. The investigations highlighted the applicability of combined chromatographic technique and QS(R)ARs in modeling the important properties of potential pesticides and reducing unethical animal testing.
Affiliation(s)
- Małgorzata Janicka
- Department of Physical Chemistry, Faculty of Chemistry, Institute of Chemical Science, Maria Curie-Skłodowska University, 20-031 Lublin, Poland
- Anna Śliwińska
- Doctoral School of Quantitative and Natural Sciences, Maria Curie-Skłodowska University, 20-031 Lublin, Poland
19
An Explainable Supervised Machine Learning Model for Predicting Respiratory Toxicity of Chemicals Using Optimal Molecular Descriptors. Pharmaceutics 2022; 14:832. [PMID: 35456666 PMCID: PMC9028223 DOI: 10.3390/pharmaceutics14040832] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/25/2022] [Revised: 03/30/2022] [Accepted: 04/03/2022] [Indexed: 01/27/2023] Open
Abstract
Respiratory toxicity is a serious public health concern caused by the adverse effects of drugs or chemicals, so the pharmaceutical and chemical industries demand reliable and precise computational tools to assess the respiratory toxicity of compounds. The purpose of this study is to develop quantitative structure-activity relationship models for a large dataset of chemical compounds associated with respiratory system toxicity. First, several feature selection techniques are explored to find the optimal subset of molecular descriptors for efficient modeling. Then, eight different machine learning algorithms are utilized to construct respiratory toxicity prediction models. The support vector machine classifier outperforms all other optimized models in 10-fold cross-validation. Additionally, it outperforms the prior study by 2% in prediction accuracy and 4% in MCC. The best SVM model achieves a prediction accuracy of 86.2% and a MCC of 0.722 on the test set. The proposed SVM model predictions are explained using the SHapley Additive exPlanations approach, which prioritizes the relevance of key modeling descriptors influencing the prediction of respiratory toxicity. Thus, our proposed model would be incredibly beneficial in the early stages of drug development for predicting and understanding potential respiratory toxic compounds.
20
Extended continuous similarity indices: theory and application for QSAR descriptor selection. J Comput Aided Mol Des 2022; 36:157-173. [PMID: 35288838 DOI: 10.1007/s10822-022-00444-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/11/2021] [Accepted: 02/23/2022] [Indexed: 01/10/2023]
Abstract
Extended (or n-ary) similarity indices have been recently proposed to extend the comparative analysis of binary strings. Going beyond the traditional notion of pairwise comparisons, these novel indices allow comparing any number of objects at the same time. This results in a remarkable efficiency gain with respect to other approaches, since now we can compare N molecules in O(N) instead of the common quadratic O(N²) timescale. This favorable scaling has motivated the application of these indices to diversity selection, clustering, phylogenetic analysis, chemical space visualization, and post-processing of molecular dynamics simulations. However, the current formulation of the n-ary indices is limited to vectors with binary or categorical inputs. Here, we present the further generalization of this formalism so it can be applied to numerical data, i.e. to vectors with continuous components. We discuss several ways to achieve this extension and present their analytical properties. As a practical example, we apply this formalism to the problem of feature selection in QSAR and prove that the extended continuous similarity indices provide a convenient way to discern between several sets of descriptors.
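The linear-versus-quadratic scaling claimed in this abstract can be made concrete with a small sketch. For binary fingerprints, the total number of shared on-bits over all pairs can be recovered from per-column counts alone, so an average overlap over the whole set needs one pass over the N fingerprints rather than a loop over all pairs. This is not the extended similarity index defined in the paper; it is a simplified Russell-Rao-style average overlap, and the function names and toy fingerprints are hypothetical, chosen only to show why column sums suffice.

```python
from itertools import combinations
from math import comb


def mean_overlap_pairwise(fps):
    """Average fraction of shared on-bits over all pairs: O(N^2 * M)."""
    n, m = len(fps), len(fps[0])
    total = sum(
        sum(a & b for a, b in zip(x, y))  # shared on-bits for one pair
        for x, y in combinations(fps, 2)
    )
    return total / (comb(n, 2) * m)


def mean_overlap_column_sums(fps):
    """Same quantity from per-column counts: O(N * M)."""
    n, m = len(fps), len(fps[0])
    col_sums = [sum(col) for col in zip(*fps)]  # on-bit count per position
    # A column with k on-bits contributes C(k, 2) shared bits across all pairs.
    total = sum(comb(k, 2) for k in col_sums)
    return total / (comb(n, 2) * m)


# Toy binary fingerprints (hypothetical data).
fingerprints = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
slow = mean_overlap_pairwise(fingerprints)
fast = mean_overlap_column_sums(fingerprints)
```

Both functions return the same value; only the second avoids enumerating pairs, which is the property that makes set-level comparison practical for very large libraries.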
21
Dunn TB, Seabra GM, Kim TD, Juárez-Mercado KE, Li C, Medina-Franco JL, Miranda-Quintana RA. Diversity and Chemical Library Networks of Large Data Sets. J Chem Inf Model 2021; 62:2186-2201. [PMID: 34723537 DOI: 10.1021/acs.jcim.1c01013] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 01/05/2023]
Abstract
The quantification of chemical diversity has many applications in drug discovery, organic chemistry, food, and natural product chemistry, to name a few. As the size of the chemical space is expanding rapidly, it is imperative to develop efficient methods to quantify the diversity of large and ultralarge chemical libraries and visualize their mutual relationships in chemical space. Herein, we show an application of our recently introduced extended similarity indices to measure the fingerprint-based diversity of 19 chemical libraries typically used in drug discovery and natural products research with over 18 million compounds. Based on this concept, we introduce the Chemical Library Networks (CLNs) as a general and efficient framework to represent visually the chemical space of large chemical libraries providing a global perspective of the relation between the libraries. For the 19 compound libraries explored in this work, it was found that the (extended) Tanimoto index offers the best description of extended similarity in combination with RDKit fingerprints. CLNs are general and can be explored with any structure representation and similarity coefficient for large chemical libraries.
Affiliation(s)
- Timothy B Dunn
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Gustavo M Seabra
- Department of Medicinal Chemistry, University of Florida, Gainesville, Florida 32610, United States; Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida, Gainesville, Florida 32610, United States
- Taewon David Kim
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- K Eurídice Juárez-Mercado
- DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Chenglong Li
- Department of Medicinal Chemistry, University of Florida, Gainesville, Florida 32610, United States; Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida, Gainesville, Florida 32610, United States
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Ramón Alain Miranda-Quintana
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States; Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
22
Mak KK, Balijepalli MK, Pichika MR. Success stories of AI in drug discovery - where do things stand? Expert Opin Drug Discov 2021; 17:79-92. [PMID: 34553659 DOI: 10.1080/17460441.2022.1985108] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/20/2023]
Abstract
INTRODUCTION Artificial intelligence (AI) in drug discovery and development (DDD) has gained more traction in the past few years. Many scientific reviews have already been made available in this area. Thus, in this review, the authors have focused on the success stories of AI-driven drug candidates and the scientometric analysis of the literature in this field. AREA COVERED The authors explore the literature to compile the success stories of AI-driven drug candidates that are currently being assessed in clinical trials or have investigational new drug (IND) status. The authors also provide the reader with their expert perspectives for future developments and their opinions on the field. EXPERT OPINION Partnerships between AI companies and the pharma industry are booming. The early signs of the impact of AI on DDD are encouraging, and the pharma industry is hoping for breakthroughs. AI can be a promising technology to unveil the greatest successes, but it has yet to be proven as AI is still at the embryonic stage.
Affiliation(s)
- Kit-Kay Mak
- School of Postgraduate Studies and Research, International Medical University, Bukit Jalil, Malaysia; Department of Pharmaceutical Chemistry, School of Pharmacy, International Medical University, Bukit Jalil, Malaysia; Centre for Bioactive Molecules and Drug Delivery, Institute for Research, Development, and Innovation (Irdi), International Medical University, Bukit Jalil, Malaysia
- Mallikarjuna Rao Pichika
- Department of Pharmaceutical Chemistry, School of Pharmacy, International Medical University, Bukit Jalil, Malaysia; Centre for Bioactive Molecules and Drug Delivery, Institute for Research, Development, and Innovation (Irdi), International Medical University, Bukit Jalil, Malaysia