1
Abbas MKG, Rassam A, Karamshahi F, Abunora R, Abouseada M. The Role of AI in Drug Discovery. Chembiochem 2024; 25:e202300816. [PMID: 38735845 DOI: 10.1002/cbic.202300816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/14/2024]
Abstract
The emergence of Artificial Intelligence (AI) in drug discovery marks a pivotal shift in pharmaceutical research, blending sophisticated computational techniques with conventional scientific exploration to break through enduring obstacles. This review paper elucidates the multifaceted applications of AI across various stages of drug development, highlighting significant advancements and methodologies. It delves into AI's instrumental role in drug design, polypharmacology, chemical synthesis, drug repurposing, and the prediction of drug properties such as toxicity, bioactivity, and physicochemical characteristics. Despite AI's promising advancements, the paper also addresses the challenges and limitations encountered in the field, including data quality, generalizability, computational demands, and ethical considerations. By offering a comprehensive overview of AI's role in drug discovery, this paper underscores the technology's potential to significantly enhance drug development, while also acknowledging the hurdles that must be overcome to fully realize its benefits.
Affiliation(s)
- M K G Abbas
  - Center for Advanced Materials, Qatar University, P.O. Box 2713, Doha, Qatar
- Abrar Rassam
  - Secondary Education, Educational Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
- Fatima Karamshahi
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
- Rehab Abunora
  - Faculty of Medicine, General Medicine and Surgery, Helwan University, Cairo, Egypt
- Maha Abouseada
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
2
Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 2024; 16:77. [PMID: 38965600 PMCID: PMC11225391 DOI: 10.1186/s13321-024-00866-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 06/04/2024] [Indexed: 07/06/2024]
Abstract
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation; however, scaffold decoration and fragment linking applications are sometimes desirable, which requires a different grammar, architecture, and training dataset, and therefore re-training of a new model. In this work, we describe a simple procedure to conduct constrained molecule generation with a SMILES-based generative model to extend applicability to scaffold decoration and fragment linking by providing SMILES prompts, without the need for re-training. In combination with reinforcement learning, we show that pre-trained, decoder-only models adapt to these applications quickly and can further optimize molecule generation towards a specified objective. We compare the performance of this approach to a variety of orthogonal approaches and show that performance is comparable or better. For convenience, we provide an easy-to-use Python package to facilitate model sampling, which can be found on GitHub and the Python Package Index.
Scientific contribution
This novel method extends an autoregressive chemical language model to scaffold decoration and fragment linking scenarios. It does not require re-training, the use of a bespoke grammar, or curation of a custom dataset, as commonly required by other approaches.
Affiliation(s)
- Morgan Thomas
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
- Mazen Ahmad
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gianni de Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
  - Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain.
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain.
3
Yang L, Guo Q, Zhang L. AI-assisted chemistry research: a comprehensive analysis of evolutionary paths and hotspots through knowledge graphs. Chem Commun (Camb) 2024; 60:6977-6987. [PMID: 38910536 DOI: 10.1039/d4cc01892c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Artificial intelligence (AI) offers transformative potential for chemical research through its ability to optimize reactions and processes, enhance energy efficiency, and reduce waste. AI-assisted chemical research (AI + chem) has become a global hotspot. To better understand the current research status of "AI + chem", this study conducted a scientific bibliometric investigation using CiteSpace. The Web of Science Core Collection was used to retrieve original articles related to "AI + chem" published from 2000 to 2024. The obtained data allowed for the visualization of the knowledge background, current research status, and latest knowledge structure of "AI + chem". "AI + chem" has entered a stage of explosive growth, and the number of papers will maintain long-term high-speed growth. This article systematically analyzes the latest progress in "AI + chem" and objectively predicts future trends, including molecular design, reaction prediction, materials design, drug design, and quantum chemistry. The outcomes of this study will provide readers with a comprehensive understanding of the overall landscape of "AI + chem".
Affiliation(s)
- Lin Yang
  - School of Intellectual Property, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China
- Qingle Guo
  - School of Intellectual Property, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China
- Lijing Zhang
  - School of Chemistry, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China.
4
Smaldone AM, Batista VS. Quantum-to-Classical Neural Network Transfer Learning Applied to Drug Toxicity Prediction. J Chem Theory Comput 2024; 20:4901-4908. [PMID: 38795030 DOI: 10.1021/acs.jctc.4c00432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2024]
Abstract
Toxicity is a roadblock that prevents an inordinate number of drugs from being used in potentially life-saving applications. Deep learning provides a promising solution to finding ideal drug candidates; however, the vastness of chemical space coupled with the underlying $O(n^3)$ matrix multiplication means these efforts quickly become computationally demanding. To remedy this, we present a hybrid quantum-classical neural network for predicting drug toxicity utilizing a quantum circuit design that mimics classical neural behavior by explicitly calculating matrix products with complexity $O(n^2)$. Leveraging the Hadamard test for efficient inner product estimation rather than the conventionally used swap test, we reduce the number of qubits by half and remove the need for quantum phase estimation. Directly computing matrix products quantum mechanically allows for learnable weights to be transferred from a quantum to a classical device for further training. We apply our framework to the Tox21 data set and show that it achieves commensurate predictive accuracy to the model's fully classical $O(n^3)$ analogue. Additionally, we demonstrate that the model continues to learn, without disruption, once transferred to a fully classical architecture. We believe that combining the quantum advantage of reduced complexity and the classical advantage of noise-free calculation will pave the way for more scalable machine learning models.
Affiliation(s)
- Anthony M Smaldone
  - Department of Chemistry, Yale University, New Haven 06511, Connecticut, United States
- Victor S Batista
  - Department of Chemistry, Yale University, New Haven 06511, Connecticut, United States
5
Nayarisseri A, Abdalla M, Joshi I, Yadav M, Bhrdwaj A, Chopra I, Khan A, Saxena A, Sharma K, Panicker A, Panwar U, Mendonça Junior FJB, Singh SK. Potential inhibitors of VEGFR1, VEGFR2, and VEGFR3 developed through Deep Learning for the treatment of Cervical Cancer. Sci Rep 2024; 14:13251. [PMID: 38858458 PMCID: PMC11164920 DOI: 10.1038/s41598-024-63762-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 05/31/2024] [Indexed: 06/12/2024]
Abstract
Cervical cancer stands as a prevalent gynaecologic malignancy affecting women globally, often linked to persistent human papillomavirus infection. Biomarkers associated with cervical cancer, including VEGF-A, VEGF-B, VEGF-C, VEGF-D, and VEGF-E, show upregulation and are linked to angiogenesis and lymphangiogenesis. This research aims to employ in-silico methods to target tyrosine kinase receptor proteins-VEGFR-1, VEGFR-2, and VEGFR-3, and identify novel inhibitors for Vascular Endothelial Growth Factors receptors (VEGFRs). A comprehensive literary study was conducted which identified 26 established inhibitors for VEGFR-1, VEGFR-2, and VEGFR-3 receptor proteins. Compounds with high-affinity scores, including PubChem ID-25102847, 369976, and 208908 were chosen from pre-existing compounds for creating Deep Learning-based models. RD-Kit, a Deep learning algorithm, was used to generate 43 million compounds for VEGFR-1, VEGFR-2, and VEGFR-3 targets. Molecular docking studies were conducted on the top 10 molecules for each target to validate the receptor-ligand binding affinity. The results of Molecular Docking indicated that PubChem IDs-71465,645 and 11152946 exhibited strong affinity, designating them as the most efficient molecules. To further investigate their potential, a Molecular Dynamics Simulation was performed to assess conformational stability, and a pharmacophore analysis was also conducted for indoctrinating interactions.
Affiliation(s)
- Anuraj Nayarisseri
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India.
  - Bioinformatics Research Laboratory, LeGene Biosciences Pvt Ltd, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India.
- Mohnad Abdalla
  - Key Laboratory of Chemical Biology (Ministry of Education), Department of Pharmaceutics, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, 44 Cultural West Road, Jinan, 250012, Shandong Province, People's Republic of China
- Isha Joshi
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
- Manasi Yadav
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
- Anushka Bhrdwaj
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, 630003, India
- Ishita Chopra
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
  - School of Medicine and Health Sciences, The George Washington University, Ross Hall, 2300 Eye Street, Washington, D.C., NW, 20037, USA
- Arshiya Khan
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, 630003, India
- Arshiya Saxena
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
- Khushboo Sharma
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, 630003, India
- Aravind Panicker
  - In silico Research Laboratory, Eminent Biosciences, 91, Sector-A, Mahalakshmi Nagar, Indore, Madhya Pradesh, 452010, India
- Umesh Panwar
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, 630003, India
- Sanjeev Kumar Singh
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, 630003, India.
6
Alberga D, Lamanna G, Graziano G, Delre P, Lomuscio MC, Corriero N, Ligresti A, Siliqi D, Saviano M, Contino M, Stefanachi A, Mangiatordi GF. DeLA-DrugSelf: Empowering multi-objective de novo design through SELFIES molecular representation. Comput Biol Med 2024; 175:108486. [PMID: 38653065 DOI: 10.1016/j.compbiomed.2024.108486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/25/2024]
Abstract
In this paper, we introduce DeLA-DrugSelf, an upgraded version of DeLA-Drug [J. Chem. Inf. Model. 62 (2022) 1411-1424], which incorporates essential advancements for automated multi-objective de novo design. Unlike its predecessor, which relies on SMILES notation for molecular representation, DeLA-DrugSelf employs a novel and robust molecular representation string named SELFIES (SELF-referencing Embedded String). The generation process in DeLA-DrugSelf not only involves substitutions to the initial string representing the starting query molecule but also incorporates insertions and deletions. This enhancement makes DeLA-DrugSelf significantly more adept at executing data-driven scaffold decoration and lead optimization strategies. Remarkably, DeLA-DrugSelf explicitly addresses the SELFIES-related collapse issue, considering only collapse-free compounds during generation. These compounds undergo a rigorous quality metrics evaluation, highlighting substantial advancements in terms of drug-likeness, uniqueness, and novelty compared to the molecules generated by the previous version of the algorithm. To evaluate the potential of DeLA-DrugSelf as a mutational operator within a genetic algorithm framework for multi-objective optimization, we employed a fitness function based on Pareto dominance. Our objectives focused on target-oriented properties aimed at optimizing known cannabinoid receptor 2 (CB2R) ligands. The results obtained indicate that DeLA-DrugSelf, available as a user-friendly web platform (https://www.ba.ic.cnr.it/softwareic/delaself/), can effectively contribute to the data-driven optimization of starting bioactive molecules based on user-defined parameters.
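The abstract above mentions a fitness function based on Pareto dominance. As a rough illustration of that general idea only (not the authors' implementation), the sketch below assumes every objective is to be maximized and uses made-up two-objective scores; the names `dominates`, `pareto_front`, and `candidates` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b.

    Assumption: all objectives are maximized. a dominates b when it is
    at least as good on every objective and strictly better on one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for four candidate molecules.
candidates = [(0.9, 0.2), (0.7, 0.7), (0.6, 0.6), (0.2, 0.9)]
front = pareto_front(candidates)
```

Here `(0.6, 0.6)` is dropped because `(0.7, 0.7)` is better on both objectives, while the remaining three candidates represent different trade-offs and none dominates another.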
Affiliation(s)
- Domenico Alberga
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Giuseppe Lamanna
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Giovanni Graziano
  - Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
- Pietro Delre
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Nicola Corriero
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Alessia Ligresti
  - CNR - Institute of Biomolecular Chemistry, Via Campi Flegrei 34, 80078, Pozzuoli, Italy
- Dritan Siliqi
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Michele Saviano
  - CNR - Institute of Crystallography, Via Vivaldi 43, 81100, Caserta, Italy
- Marialessandra Contino
  - Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
- Angela Stefanachi
  - Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
7
Ventura CAI, Denton EE, David JA. Artificial Intelligence in Emergency Trauma Care: A Preliminary Scoping Review. Medical Devices: Evidence and Research 2024; 17:191-211. [PMID: 38803707 PMCID: PMC11129754 DOI: 10.2147/mder.s467146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 05/17/2024] [Indexed: 05/29/2024]
Abstract
This study aimed to analyze the use of generative artificial intelligence in the emergency trauma care setting through a brief scoping review of literature published between 2014 and 2024. An exploration of the NCBI repository was performed using a search string of selected keywords that returned N=87 results; articles that met the inclusion criteria (n=28) were reviewed and analyzed. Heterogeneity sources were explored and identified by a significance threshold of P < 0.10 or an $I^2$ value exceeding 50%. If applicable, articles were categorized within three primary domains: triage, diagnostics, or treatment. Findings suggest that CNNs demonstrate strong diagnostic performance for diverse traumatic injuries, but generalized integration requires expanded prospective multi-center validation. Injury scoring models currently experience calibration gaps in mortality quantification and lesion localization that can undermine clinical utility by permitting false negatives. Triage predictive models now confront transparency, explainability, and healthcare ecosystem integration barriers limiting real-world translation. The most significant literature gap centers on treatment-oriented generative AI applications that provide real-time guidance for urgent trauma interventions rather than just analytical support.
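The heterogeneity threshold quoted above refers to the I² statistic. The abstract does not define it, so the sketch below assumes the standard Higgins and Thompson definition, I² = max(0, (Q − df)/Q) × 100%, where Q is Cochran's Q and df is the number of studies minus one. The function name and the input values are hypothetical and are not taken from the review.

```python
def i_squared(q: float, df: int) -> float:
    """Percentage of variability attributed to heterogeneity rather than chance.

    Assumes the standard definition I^2 = max(0, (Q - df) / Q) * 100,
    with Q being Cochran's Q and df the degrees of freedom.
    """
    if q <= 0:
        return 0.0  # no observed dispersion, so no measurable heterogeneity
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical inputs: Q = 20 across 11 studies (df = 10).
example = i_squared(20.0, 10)
exceeds_threshold = example > 50.0  # the abstract's cut-off is "exceeding 50%"
```

With these made-up inputs the statistic sits exactly at 50%, so it would not count as exceeding the cut-off.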
Affiliation(s)
- Christian Angelo I Ventura
  - Department of Health, Behavior and Society, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; Department of Allied Health, Baltimore City Community College, Baltimore, MD, USA
- Edward E Denton
  - Department of Emergency Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA; Fay W. Boozman College of Public Health, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Jessica A David
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
8
Mao J, Wang J, Zeb A, Cho KH, Jin H, Kim J, Lee O, Wang Y, No KT. Transformer-Based Molecular Generative Model for Antiviral Drug Design. J Chem Inf Model 2024; 64:2733-2745. [PMID: 37366644 PMCID: PMC11005037 DOI: 10.1021/acs.jcim.3c00536] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/07/2023] [Indexed: 06/28/2023]
Abstract
The Simplified Molecular Input Line Entry System (SMILES) is oriented to the atomic-level representation of molecules and is neither easy for humans to read nor easy to edit. IUPAC nomenclature, by contrast, is the closest to natural language and is well suited to human reading and molecular editing, so we can manipulate IUPAC names to generate new molecules and then produce programming-friendly SMILES forms. In addition, antiviral drug design, especially analogue-based drug design, is better served by editing directly at the functional-group level of IUPAC than at the atomic level of SMILES, since designing analogues involves altering the R group only, which is closer to the knowledge-based molecular design of a chemist. Herein, we present a novel data-driven self-supervised pretraining generative model called "TransAntivirus" to make select-and-replace edits and convert organic molecules toward the desired properties for the design of antiviral candidate analogues. The results indicated that TransAntivirus is significantly superior to the control models in terms of novelty, validity, uniqueness, and diversity. TransAntivirus showed excellent performance in the design and optimization of nucleoside and non-nucleoside analogues by chemical space analysis and property prediction analysis. Furthermore, to validate the applicability of TransAntivirus in the design of antiviral drugs, we conducted two case studies on the design of nucleoside analogues and non-nucleoside analogues and screened four candidate lead compounds against coronavirus disease (COVID-19). Finally, we recommend this framework for accelerating antiviral drug discovery.
Affiliation(s)
- Jiashun Mao
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Jianmin Wang
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Amir Zeb
  - Faculty of Natural and Basic Sciences, University of Turbat, Balochistan 92600, Pakistan
- Kwang-Hwi Cho
  - School of Systems Biomedical Science, Soongsil University, Seoul 06978, Republic of Korea
- Haiyan Jin
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Jongwan Kim
  - Department of Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
  - Bioinformatics and Molecular Design Research Center (BMDRC), Incheon 21983, Republic of Korea
- Onju Lee
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Yunyun Wang
  - School of Pharmacy and Jiangsu Province Key Laboratory for Inflammation and Molecular Drug Target, Nantong University, Nantong 226001, Jiangsu, P. R. China
- Kyoung Tai No
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
9
Chen Y, Esmaeilzadeh P. Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges. J Med Internet Res 2024; 26:e53008. [PMID: 38457208 PMCID: PMC10960211 DOI: 10.2196/53008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 12/12/2023] [Accepted: 01/31/2024] [Indexed: 03/09/2024]
Abstract
As advances in artificial intelligence (AI) continue to transform and revolutionize the field of medicine, understanding the potential uses of generative AI in health care becomes increasingly important. Generative AI, including models such as generative adversarial networks and large language models, shows promise in transforming medical diagnostics, research, treatment planning, and patient care. However, these data-intensive systems pose new threats to protected health information. This Viewpoint paper aims to explore various categories of generative AI in health care, including medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support, while identifying security and privacy threats within each phase of the life cycle of such systems (ie, data collection, model development, and implementation phases). The objectives of this study were to analyze the current state of generative AI in health care, identify opportunities and privacy and security challenges posed by integrating these technologies into existing health care infrastructure, and propose strategies for mitigating security and privacy risks. This study highlights the importance of addressing the security and privacy threats associated with generative AI in health care to ensure the safe and effective use of these systems. The findings of this study can inform the development of future generative AI systems in health care and help health care organizations better understand the potential benefits and risks associated with these systems. By examining the use cases and benefits of generative AI across diverse domains within health care, this paper contributes to theoretical discussions surrounding AI ethics, security vulnerabilities, and data privacy regulations. In addition, this study provides practical insights for stakeholders looking to adopt generative AI solutions within their organizations.
Affiliation(s)
- Yan Chen
  - Department of Information Systems and Business Analytics, College of Business, Florida International University, Miami, FL, United States
- Pouyan Esmaeilzadeh
  - Department of Information Systems and Business Analytics, College of Business, Florida International University, Miami, FL, United States
10
Jahan I, Laskar MTR, Peng C, Huang JX. A comprehensive evaluation of large Language models on benchmark biomedical text processing tasks. Comput Biol Med 2024; 171:108189. [PMID: 38447502 DOI: 10.1016/j.compbiomed.2024.108189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/14/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Recently, Large Language Models (LLMs) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets has been conducted. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot LLMs even outperform the current state-of-the-art models when those models were fine-tuned only on the training set of these datasets. This suggests that pre-training on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that no single LLM outperforms the others in all tasks, and that the performance of different LLMs may vary depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.
Affiliation(s)
- Israt Jahan
  - Department of Biology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada.
- Md Tahmid Rahman Laskar
  - School of Information Technology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada; Dialpad Inc., Canada.
- Chun Peng
  - Department of Biology, York University, Canada.
- Jimmy Xiangji Huang
  - School of Information Technology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada.
11
Wen N, Liu Z, Wang W, Wang S. Feedback linearization control for uncertain nonlinear systems via generative adversarial networks. ISA Transactions 2024; 146:555-566. [PMID: 38172034 DOI: 10.1016/j.isatra.2023.12.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 09/30/2023] [Accepted: 12/22/2023] [Indexed: 01/05/2024]
Abstract
This article presents a novel approach that leverages generative adversarial network (GAN) techniques to learn a feedback linearization controller (FLC) for a class of uncertain nonlinear systems. By estimating uncertainty through the adversarial process, where ground truth samples are exclusively obtained from a predefined integral model, the feedback linearization controller, learned through a minimax two-player optimization framework, enhances the reference tracking performance of the input-output uncertain nonlinear system. Furthermore, we provide a theoretical guarantee of convergence and stability, demonstrating the safe recovery of a robust FLC. We also address the common challenge of mode collapse in GAN training through the strict convexity of our synthesized generator structure and an enhanced adversarial loss. Comprehensive simulations and practical experiments are conducted to underscore the superiority and efficacy of our proposed approach.
Affiliation(s)
- Nuan Wen
  - School of Automation Science and Electrical Engineering, Beihang University, 37 XueYuan Road, Haidian District, Beijing 100191, China
- Zhenghua Liu
  - School of Automation Science and Electrical Engineering, Beihang University, 37 XueYuan Road, Haidian District, Beijing 100191, China.
- Weihong Wang
  - School of Automation Science and Electrical Engineering, Beihang University, 37 XueYuan Road, Haidian District, Beijing 100191, China
- Shaoping Wang
  - School of Automation Science and Electrical Engineering, Beihang University, 37 XueYuan Road, Haidian District, Beijing 100191, China
12
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024]
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to get a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost being estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audios, videos, novel chemical molecules, etc. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via de novo drug design approach. Various basic and advanced models have been discussed, along with their recent applications. 
The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated by generative artificial intelligence have also been discussed in this review to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Azim Ansari
- Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
| | - Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
| | - Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
| | - Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
| | - Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
| |
|
13
|
Ang D, Rakovski C, Atamian HS. De Novo Drug Design Using Transformer-Based Machine Translation and Reinforcement Learning of an Adaptive Monte Carlo Tree Search. Pharmaceuticals (Basel) 2024; 17:161. [PMID: 38399376 PMCID: PMC10892138 DOI: 10.3390/ph17020161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/24/2024] [Accepted: 01/25/2024] [Indexed: 02/25/2024] Open
Abstract
The discovery of novel therapeutic compounds through de novo drug design represents a critical challenge in the field of pharmaceutical research. Traditional drug discovery approaches are often resource intensive and time consuming, leading researchers to explore innovative methods that harness the power of deep learning and reinforcement learning techniques. Here, we introduce a novel drug design approach called drugAI that leverages the Encoder-Decoder Transformer architecture in tandem with Reinforcement Learning via a Monte Carlo Tree Search (RL-MCTS) to expedite the process of drug discovery while ensuring the production of valid small molecules with drug-like characteristics and strong binding affinities towards their targets. We successfully integrated the Encoder-Decoder Transformer architecture, which generates molecular structures (drugs) from scratch with the RL-MCTS, serving as a reinforcement learning framework. The RL-MCTS combines the exploitation and exploration capabilities of a Monte Carlo Tree Search with the machine translation of a transformer-based Encoder-Decoder model. This dynamic approach allows the model to iteratively refine its drug candidate generation process, ensuring that the generated molecules adhere to essential physicochemical and biological constraints and effectively bind to their targets. The results from drugAI showcase the effectiveness of the proposed approach across various benchmark datasets, demonstrating a significant improvement in both the validity and drug-likeness of the generated compounds, compared to two existing benchmark methods. Moreover, drugAI ensures that the generated molecules exhibit strong binding affinities to their respective targets. In summary, this research highlights the real-world applications of drugAI in drug discovery pipelines, potentially accelerating the identification of promising drug candidates for a wide range of diseases.
Affiliation(s)
- Dony Ang
- Computational and Data Sciences Program, Chapman University, Orange, CA 92866, USA; (D.A.); (C.R.)
- Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
| | - Cyril Rakovski
- Computational and Data Sciences Program, Chapman University, Orange, CA 92866, USA; (D.A.); (C.R.)
- Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
| | - Hagop S. Atamian
- Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Biological Sciences Program, Chapman University, Orange, CA 92866, USA
| |
|
14
|
Mullin M, McClory J, Haynes W, Grace J, Robertson N, van Heeke G. Applications and challenges in designing VHH-based bispecific antibodies: leveraging machine learning solutions. MAbs 2024; 16:2341443. [PMID: 38666503 PMCID: PMC11057648 DOI: 10.1080/19420862.2024.2341443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 04/05/2024] [Indexed: 05/01/2024] Open
Abstract
The development of bispecific antibodies that bind at least two different targets relies on bringing together multiple binding domains with different binding properties and biophysical characteristics to produce a drug-like therapeutic. These building blocks play an important role in the overall quality of the molecule and can influence many important aspects from potency and specificity to stability and half-life. Single-domain antibodies, particularly camelid-derived variable heavy domain of heavy chain (VHH) antibodies, are becoming an increasingly popular choice for bispecific construction due to their single-domain modularity, favorable biophysical properties, and potential to work in multiple antibody formats. Here, we review the use of VHH domains as building blocks in the construction of multispecific antibodies and the challenges in creating optimized molecules. In addition to exploring traditional approaches to VHH development, we review the integration of machine learning techniques at various stages of the process, specifically for structural prediction, lead identification, lead optimization, and humanization of VHH antibodies.
|
15
|
Xiong Y, Wang Y, Wang Y, Li C, Yusong P, Wu J, Wang Y, Gu L, Butch CJ. Improving drug discovery with a hybrid deep generative model using reinforcement learning trained on a Bayesian docking approximation. J Comput Aided Mol Des 2023; 37:507-517. [PMID: 37550462 DOI: 10.1007/s10822-023-00523-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 07/17/2023] [Indexed: 08/09/2023]
Abstract
Generative approaches to molecular design have been an area of intense study in recent years as a method to generate new pharmaceuticals with desired properties. Often though, these types of efforts are constrained by limited experimental activity data, resulting in either models that generate molecules with poor performance or models that are overfit and produce close analogs of known molecules. In this paper, we reduce this data dependency for the generation of new chemotypes by incorporating docking scores of known and de novo molecules to expand the applicability domain of the reward function and diversify the compounds generated during reinforcement learning. Our approach employs a deep generative model initially trained using a combination of limited known drug activity and an approximate docking score provided by a second machine-learned Bayes regression model, with final evaluation of high-scoring compounds by a full docking simulation. This strategy results in molecules with docking scores improved by 10-20% compared to molecules of similar size, while being 130× faster than a docking-only approach on a typical GPU workstation. We also show that the increased docking scores correlate with (1) docking poses with interactions similar to known inhibitors and (2) higher MM-GBSA binding energies comparable to the energies of known DDR1 inhibitors, demonstrating that the Bayesian model contains sufficient information for the network to learn to efficiently interact with the binding pocket during reinforcement learning. This outcome shows that the combination of the learned latent molecular representation along with the feature-based docking regression is sufficient for reinforcement learning to infer the relationship between the molecules and the receptor binding site, which suggests that our method can be a powerful tool for the discovery of new chemotypes with potential therapeutic applications.
Affiliation(s)
- Youjin Xiong
- Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China
| | - Yiqing Wang
- Icekredit Incorporated, Shanghai, 200120, China
| | - Yisheng Wang
- Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China
| | - Chenmei Li
- Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China
| | - Peng Yusong
- Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China
| | - Junyu Wu
- Icekredit Incorporated, Shanghai, 200120, China
| | - Yiqing Wang
- Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China
| | - Lingyun Gu
- Department of Information Systems Technology and Design, Singapore University of Technology and Design, Singapore, Singapore.
| | - Christopher J Butch
- Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China.
| |
|
16
|
Liu N, Jin H, Zhang L, Liu Z. Plug-in Models: A Promising Direction for Molecular Generation. HEALTH DATA SCIENCE 2023; 3:0092. [PMID: 38487202 PMCID: PMC10880158 DOI: 10.34133/hds.0092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 09/26/2023] [Indexed: 03/17/2024]
Affiliation(s)
- Ningfeng Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
| | - Hongwei Jin
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
| | - Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
| | - Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
| |
|
17
|
Habiballah S, Reisfeld B. Adapting physiologically-based pharmacokinetic models for machine learning applications. Sci Rep 2023; 13:14934. [PMID: 37696914 PMCID: PMC10495394 DOI: 10.1038/s41598-023-42165-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 09/06/2023] [Indexed: 09/13/2023] Open
Abstract
Both machine learning and physiologically-based pharmacokinetic models are becoming essential components of the drug development process. Integrating the predictive capabilities of physiologically-based pharmacokinetic (PBPK) models within machine learning (ML) pipelines could offer significant benefits in improving the accuracy and scope of drug screening and evaluation procedures. Here, we describe the development and testing of a self-contained machine learning module capable of faithfully recapitulating summary pharmacokinetic (PK) parameters produced by a full PBPK model, given a set of input drug-specific and regimen-specific information. Because of its widespread use in characterizing the disposition of orally administered drugs, the PBPK model chosen to demonstrate the methodology was an open-source implementation of a state-of-the-art compartmental and transit model called OpenCAT. The model was tested for drug formulations spanning a large range of solubility and absorption characteristics, and was evaluated for concordance against predictions of OpenCAT and relevant experimental data. In general, the values predicted by the ML models were within 20% of those of the PBPK model across the range of drug and formulation properties. However, summary PK parameter predictions from both the ML model and full PBPK model were occasionally poor with respect to those derived from experiments, suggesting deficiencies in the underlying PBPK model.
Affiliation(s)
- Sohaib Habiballah
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, 80523-1301, USA
| | - Brad Reisfeld
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, 80523-1301, USA.
- School of Public Health, Colorado State University, Fort Collins, CO, 80523-1612, USA.
| |
|
18
|
Tay DWP, Yeo NZX, Adaikkappan K, Lim YH, Ang SJ. 67 million natural product-like compound database generated via molecular language processing. Sci Data 2023; 10:296. [PMID: 37208372 DOI: 10.1038/s41597-023-02207-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 04/21/2023] [Indexed: 05/21/2023] Open
Abstract
Natural products are a rich resource of bioactive compounds for valuable applications across multiple fields such as food, agriculture, and medicine. For natural product discovery, high throughput in silico screening offers a cost-effective alternative to traditional resource-heavy assay-guided exploration of structurally novel chemical space. In this data descriptor, we report a characterized database of 67,064,204 natural product-like molecules generated using a recurrent neural network trained on known natural products, demonstrating a significant 165-fold expansion in library size over the approximately 400,000 known natural products. This study highlights the potential of using deep generative models to explore novel natural product chemical space for high throughput in silico discovery.
Affiliation(s)
- Dillon W P Tay
- Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.
| | - Naythan Z X Yeo
- Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
- Hwa Chong Institution, 661 Bukit Timah Road, Singapore, 269734, Republic of Singapore
| | - Krishnan Adaikkappan
- Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
- National Junior College, 37 Hillcrest Road, Singapore, 288913, Republic of Singapore
| | - Yee Hwee Lim
- Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
- Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore, 117597, Republic of Singapore
| | - Shi Jun Ang
- Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore, 138632, Republic of Singapore.
| |
|
19
|
Kao PY, Yang YC, Chiang WY, Hsiao JY, Cao Y, Aliper A, Ren F, Aspuru-Guzik A, Zhavoronkov A, Hsieh MH, Lin YC. Exploring the Advantages of Quantum Generative Adversarial Networks in Generative Chemistry. J Chem Inf Model 2023. [PMID: 37171372 DOI: 10.1021/acs.jcim.3c00562] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/13/2023]
Abstract
De novo drug design with desired biological activities is crucial for developing novel therapeutics for patients. The drug development process is time- and resource-consuming, and it has a low probability of success. Recent advances in machine learning and deep learning technology have reduced the time and cost of the discovery process and therefore, improved pharmaceutical research and development. In this paper, we explore the combination of two rapidly developing fields with lead candidate discovery in the drug development process. First, artificial intelligence has already been demonstrated to successfully accelerate conventional drug design approaches. Second, quantum computing has demonstrated promising potential in different applications, such as quantum chemistry, combinatorial optimizations, and machine learning. This article explores hybrid quantum-classical generative adversarial networks (GAN) for small molecule discovery. We substituted each element of GAN with a variational quantum circuit (VQC) and demonstrated the quantum advantages in the small drug discovery. Utilizing a VQC in the noise generator of a GAN to generate small molecules achieves better physicochemical properties and performance in the goal-directed benchmark than the classical counterpart. Moreover, we demonstrate the potential of a VQC with only tens of learnable parameters in the generator of GAN to generate small molecules. We also demonstrate the quantum advantage of a VQC in the discriminator of GAN. In this hybrid model, the number of learnable parameters is significantly less than the classical ones, and it can still generate valid molecules. The hybrid model with only tens of training parameters in the quantum discriminator outperforms the MLP-based one in terms of both generated molecule properties and the achieved KL divergence. However, the hybrid quantum-classical GANs still face challenges in generating unique and valid molecules compared to their classical counterparts.
Affiliation(s)
- Po-Yu Kao
- Insilico Medicine Taiwan Ltd., Taipei 110208, Taiwan
| | - Ya-Chu Yang
- Insilico Medicine Taiwan Ltd., Taipei 110208, Taiwan
| | - Wei-Yin Chiang
- Hon Hai (Foxconn) Research Institute, Taipei 114699, Taiwan
| | - Jen-Yueh Hsiao
- Hon Hai (Foxconn) Research Institute, Taipei 114699, Taiwan
| | - Yudong Cao
- Zapata Computing, Inc., Boston, Massachusetts 02110, United States
| | - Alex Aliper
- Insilico Medicine AI Limited, Masdar City, Abu Dhabi 145748, UAE
| | - Feng Ren
- Insilico Medicine Shanghai Ltd., Shanghai 201203, China
| | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, ON M5S 1M1, Canada
| | - Min-Hsiu Hsieh
- Hon Hai (Foxconn) Research Institute, Taipei 114699, Taiwan
| | - Yen-Chu Lin
- Insilico Medicine Taiwan Ltd., Taipei 110208, Taiwan
- Department of Pharmacy, National Yang Ming Chiao Tung University, Taipei 112304, Taiwan
| |
|
20
|
Nemoto S, Mizuno T, Kusuhara H. Investigation of chemical structure recognition by encoder-decoder models in learning progress. J Cheminform 2023; 15:45. [PMID: 37046349 PMCID: PMC10100163 DOI: 10.1186/s13321-023-00713-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
Descriptor generation methods using latent representations of encoder-decoder (ED) models with SMILES as input are useful because of the continuity of descriptor and restorability to the structure. However, it is not clear how the structure is recognized in the learning progress of ED models. In this work, we created ED models of various learning progress and investigated the relationship between structural information and learning progress. We showed that compound substructures were learned early in ED models by monitoring the accuracy of downstream tasks and input-output substructure similarity using substructure-based descriptors, which suggests that existing evaluation methods based on the accuracy of downstream tasks may not be sensitive enough to evaluate the performance of ED models with SMILES as descriptor generation methods. On the other hand, we showed that structure restoration was time-consuming, and in particular, insufficient learning led to the estimation of a larger structure than the actual one. It can be inferred that determining the endpoint of the structure is a difficult task for the model. To our knowledge, this is the first study to link the learning progress of SMILES by ED model to chemical structures for a wide range of chemicals.
Affiliation(s)
- Shumpei Nemoto
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
| | - Tadahaya Mizuno
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan.
| | - Hiroyuki Kusuhara
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
| |
|
21
|
Wang N, Zhang Y, Wang W, Ye Z, Chen H, Hu G, Ouyang D. How can machine learning and multiscale modeling benefit ocular drug development? Adv Drug Deliv Rev 2023; 196:114772. [PMID: 36906232 DOI: 10.1016/j.addr.2023.114772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 02/06/2023] [Accepted: 03/05/2023] [Indexed: 03/12/2023]
Abstract
The eyes possess sophisticated physiological structures, diverse disease targets, limited drug delivery space, distinctive barriers, and complicated biomechanical processes, requiring a more in-depth understanding of the interactions between drug delivery systems and biological systems for ocular formulation development. However, the tiny size of the eyes makes sampling difficult and invasive studies costly and ethically constrained. Developing ocular formulations following conventional trial-and-error formulation and manufacturing process screening procedures is inefficient. Along with the popularity of computational pharmaceutics, non-invasive in silico modeling & simulation offer new opportunities for the paradigm shift of ocular formulation development. The current work first systematically reviews the theoretical underpinnings, advanced applications, and unique advantages of data-driven machine learning and multiscale simulation approaches represented by molecular simulation, mathematical modeling, and pharmacokinetic (PK)/pharmacodynamic (PD) modeling for ocular drug development. Following this, a new computer-driven framework for rational pharmaceutical formulation design is proposed, inspired by the potential of in silico explorations in understanding drug delivery details and facilitating drug formulation design. Lastly, to promote the paradigm shift, integrated in silico methodologies were highlighted, and discussions on data challenges, model practicality, personalized modeling, regulatory science, interdisciplinary collaboration, and talent training were conducted in detail with a view to achieving more efficient objective-oriented pharmaceutical formulation design.
Affiliation(s)
- Nannan Wang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Yunsen Zhang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Wei Wang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Zhuyifan Ye
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Hongyu Chen
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China; Faculty of Science and Technology (FST), University of Macau, Macau, China
| | - Guanghui Hu
- Faculty of Science and Technology (FST), University of Macau, Macau, China
| | - Defang Ouyang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China; Department of Public Health and Medicinal Administration, Faculty of Health Sciences (FHS), University of Macau, Macau, China.
| |
|
22
|
Schoenmaker L, Béquignon OJM, Jespers W, van Westen GJP. UnCorrupt SMILES: a novel approach to de novo design. J Cheminform 2023; 15:22. [PMID: 36788579 PMCID: PMC9926805 DOI: 10.1186/s13321-023-00696-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/13/2022] [Accepted: 02/06/2023] [Indexed: 02/16/2023] Open
Abstract
Generative deep learning models have emerged as a powerful approach for de novo drug design as they aid researchers in finding new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs that sequence-based de novo generators produce cannot be progressed due to errors. Here, we propose to fix these invalid outputs post hoc. In similar tasks, transformer models from the field of natural language processing have been shown to be very effective. Therefore, here this type of model was trained to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study has found that the percentage of invalid outputs from these specific generative models ranges between 4 and 89%, with different models having different error-type distributions. Post hoc correction of SMILES was shown to increase model validity. The SMILES corrector trained with one error per input alters 60-90% of invalid generator outputs and fixes 35-80% of them. However, a higher error detection and performance was obtained for transformer models trained with multiple errors per input. In this case, the best model was able to correct 60-95% of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the correct molecules from the de novo generators based on novelty and similarity. Additionally, the SMILES corrector can be used to expand the amount of interesting new molecules within the targeted chemical space. Introducing different errors into existing molecules yields novel analogs with a uniqueness of 39% and a novelty of approximately 20%. 
The results of this research demonstrate that SMILES correction is a viable post hoc extension and can enhance the search for better drug candidates.
Affiliation(s)
- Linde Schoenmaker
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| | - Olivier J. M. Béquignon
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| | - Willem Jespers
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| | - Gerard J. P. van Westen
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| |
|
23
|
Noguchi S, Inoue J. Exploration of Chemical Space Guided by PixelCNN for Fragment-Based De Novo Drug Discovery. J Chem Inf Model 2022; 62:5988-6001. [PMID: 36454646 DOI: 10.1021/acs.jcim.2c01345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
We report a novel framework for achieving fragment-based molecular design using pixel convolutional neural network (PixelCNN) combined with the simplified molecular input line entry system (SMILES) as molecular representation. While a widely used recurrent neural network (RNN) assumes monotonically decaying correlations in strings, PixelCNN captures a periodicity among characters of SMILES. Thus, PixelCNN provides us with a novel solution for the analysis of chemical space by extracting the periodicity of molecular structures that will be buried in SMILES. Moreover, this characteristic enables us to generate molecules by combining several simple building blocks, such as a benzene ring and side-chain structures, which contributes to the effective exploration of chemical space by step-by-step searching for molecules from a target fragment. In conclusion, PixelCNN could be a powerful approach focusing on the periodicity of molecules to explore chemical space for the fragment-based molecular design.
Affiliation(s)
- Satoshi Noguchi
- Department of Advanced Interdisciplinary Studies, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Junya Inoue
- Institute for Industrial Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-0082, Japan; Department of Materials Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113-8656, Japan; Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| |
|
24
|
Lin Y, Zhang Y, Wang D, Yang B, Shen YQ. Computer especially AI-assisted drug virtual screening and design in traditional Chinese medicine. PHYTOMEDICINE : INTERNATIONAL JOURNAL OF PHYTOTHERAPY AND PHYTOPHARMACOLOGY 2022; 107:154481. [PMID: 36215788 DOI: 10.1016/j.phymed.2022.154481] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/21/2022] [Revised: 09/14/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Traditional Chinese medicine (TCM) is a significant part of global pharmaceutical science, and the abundant molecular compounds it contains are a valuable potential source for designing and screening new drugs. However, because of the vast, unestimated number of natural compounds and the diversity of related drug discovery problems, such as precise screening of compounds and evaluation of efficacy, physicochemical properties, and pharmacokinetics, it is arduous for researchers to design or screen suitable compounds with traditional methods. The recent rapid development of computer technology, especially artificial intelligence (AI), and its innovations in virtual screening have increased the efficiency and accuracy of discovering new drugs. PURPOSE This study systematically reviewed the application of computational approaches and AI to drug virtual screening and design in TCM and presented a perspective on computer-aided TCM development. STUDY DESIGN We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and then selected the most representative articles for our research. METHODS The systematic review was performed following the PRISMA guidelines. The databases PubMed, EMBASE, Web of Science, and CNKI were searched for publications that focused on computer-aided drug virtual screening and design in TCM. RESULT In total, 42 articles were included in the literature review. These studies were of great significance to the treatment and cost control of many challenging diseases such as COVID-19, diabetes, and Alzheimer's disease (AD). Computational approaches and AI were widely used for virtual screening in TCM development, including structure-based virtual screening (SBVS) and ligand-based virtual screening (LBVS). Computational technologies were also extensively applied to absorption, distribution, metabolism, excretion and toxicity (ADMET) prediction of candidate drugs and to new drug design, both crucial steps in drug discovery. CONCLUSIONS Computational and AI applications play an important role in drug virtual screening and design in the field of TCM, with broad application prospects.
Affiliation(s)
- Yumeng Lin
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
| | - You Zhang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
| | - Dongyang Wang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
| | - Bowen Yang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
| | - Ying-Qiang Shen
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
| |
|