101
Patrick Walters W. Comparing classification models-a practical tutorial. J Comput Aided Mol Des 2021; 36:381-389. [PMID: 34549368 DOI: 10.1007/s10822-021-00417-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/27/2021] [Accepted: 08/18/2021] [Indexed: 01/17/2023]
Abstract
While machine learning models have become a mainstay in Cheminformatics, the field has yet to agree on standards for model evaluation and comparison. In many cases, authors compare methods by performing multiple folds of cross-validation and reporting the mean value for an evaluation metric such as the area under the receiver operating characteristic curve. These comparisons of mean values often lack statistical rigor and can lead to inaccurate conclusions. In the interest of encouraging best practices, this tutorial provides an example of how multiple methods can be compared in a statistically rigorous fashion.
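As a rough illustration of the point made in this abstract, the sketch below compares two methods on matched cross-validation folds with a paired test rather than by their mean scores alone. It is a minimal sketch under stated assumptions and not necessarily the procedure used in the tutorial: the exact sign-flip permutation test is a technique chosen here for illustration, and the function name `paired_sign_flip_test` is hypothetical. Scores from cross-validation folds share training data, so the resulting p-value should be read as approximate.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    scores_a and scores_b are per-fold metric values (for example AUC)
    for two methods evaluated on the SAME folds. Returns the mean
    difference (a minus b) and the two-sided p-value.
    Hypothetical helper for illustration; enumerates 2**n sign patterns,
    so it is only practical for a modest number of folds.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same folds")
    if not scores_a:
        raise ValueError("at least one fold is required")

    # Per-fold differences are the unit of analysis in a paired test.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    # Under the null hypothesis the sign of each difference is arbitrary,
    # so every sign pattern is equally likely.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if permuted >= observed - 1e-12:
            extreme += 1

    return mean(diffs), extreme / total
```

With only a few folds the smallest achievable p-value is large (2 / 2**n), which is one reason a difference in mean scores over a handful of folds is weak evidence on its own.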
102
Fialková V, Zhao J, Papadopoulos K, Engkvist O, Bjerrum EJ, Kogej T, Patronov A. LibINVENT: Reaction-based Generative Scaffold Decoration for in Silico Library Design. J Chem Inf Model 2021; 62:2046-2063. [PMID: 34460269 DOI: 10.1021/acs.jcim.1c00469] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 11/28/2022]
Abstract
Because of the strong relationship between the desired molecular activity and its structural core, the screening of focused, core-sharing chemical libraries is a key step in lead optimization. Despite the plethora of current research focused on in silico methods for molecule generation, to our knowledge, no tool capable of designing such libraries has been proposed. In this work, we present a novel tool for de novo drug design called LibINVENT. It is capable of rapidly proposing chemical libraries of compounds sharing the same core while maximizing a range of desirable properties. To further help the process of designing focused libraries, the user can list specific chemical reactions that can be used for the library creation. LibINVENT is therefore a flexible tool for generating virtual chemical libraries for lead optimization in a broad range of scenarios. Additionally, the shared core ensures that the compounds in the library are similar, possess desirable properties, and can also be synthesized under the same or similar conditions. The LibINVENT code is freely available in our public repository at https://github.com/MolecularAI/Lib-INVENT. The code necessary for data preprocessing is further available at: https://github.com/MolecularAI/Lib-INVENT-dataset.
Affiliation(s)
- Vendy Fialková
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden
- Jiaxi Zhao
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, Uppsala 75237, Sweden
- Kostas Papadopoulos
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 41756, Sweden
- Thierry Kogej
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden
- Atanas Patronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden
103
Gallego V, Naveiro R, Roca C, Ríos Insua D, Campillo NE. AI in drug development: a multidisciplinary perspective. Mol Divers 2021; 25:1461-1479. [PMID: 34251580 PMCID: PMC8342381 DOI: 10.1007/s11030-021-10266-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/21/2021] [Accepted: 06/29/2021] [Indexed: 01/09/2023]
Abstract
The introduction of a new drug to the commercial market follows a long and complex process that typically spans several years and entails large monetary costs due to a high attrition rate. Because of this, there is an urgent need to improve this process using innovative technologies such as artificial intelligence (AI). Different AI tools are being applied to support all four steps of the drug development process (basic research for drug discovery; pre-clinical phase; clinical phase; and postmarketing). Some of the main tasks where AI has proven useful include identifying molecular targets, searching for hit and lead compounds, synthesising drug-like compounds and predicting ADME-Tox. This review, on the one hand, brings a mathematical view of some of the key AI methods used in drug development closer to medicinal chemists and, on the other hand, brings the drug development process and the use of different models closer to mathematicians. Emphasis is placed on two aspects not mentioned in similar surveys, namely, Bayesian approaches and their applications to molecular modelling, and the eventual use of the methods to actually support decisions, promoting a perfect synergy.
Affiliation(s)
- Víctor Gallego
- Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Roi Naveiro
- Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Carlos Roca
- AItenea Biotech S.L. Parque Científico de Madrid, Faraday, 7, 28049, Madrid, Spain
- David Ríos Insua
- ICMAT-CSIC and Dept. of Statistics and OR, U. Compl. Madrid, Madrid, Spain
- Nuria E Campillo
- CIB-Margarita Salas (CSIC), Ramiro de Maeztu, 9, 28040, Madrid, Spain.
104
Baum ZJ, Yu X, Ayala PY, Zhao Y, Watkins SP, Zhou Q. Artificial Intelligence in Chemistry: Current Trends and Future Directions. J Chem Inf Model 2021; 61:3197-3212. [PMID: 34264069 DOI: 10.1021/acs.jcim.1c00619] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Indexed: 02/07/2023]
Abstract
The application of artificial intelligence (AI) to chemistry has grown tremendously in recent years. In this Review, we studied the growth and distribution of AI-related chemistry publications in the last two decades using the CAS Content Collection. The volume of both journal and patent publications has increased dramatically, especially since 2015. Study of the distribution of publications over various chemistry research areas revealed that analytical chemistry and biochemistry are integrating AI to the greatest extent and with the highest growth rates. We also investigated trends in interdisciplinary research and identified frequently occurring combinations of research areas in publications. Furthermore, topic analyses were conducted for journal and patent publications to illustrate emerging associations of AI with certain chemistry research topics. Notable publications in various chemistry disciplines were then evaluated and presented to highlight emerging use cases. Finally, the occurrence of different classes of substances and their roles in AI-related chemistry research were quantified, further detailing the popularity of AI adoption in the life sciences and analytical chemistry. In summary, this Review offers a broad overview of how AI has progressed in various fields of chemistry and aims to provide an understanding of its future directions.
Affiliation(s)
- Zachary J Baum
- Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, Ohio 43210, United States
- Xiang Yu
- Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, Ohio 43210, United States
- Philippe Y Ayala
- Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, Ohio 43210, United States
- Yanan Zhao
- Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, Ohio 43210, United States
- Steven P Watkins
- Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, Ohio 43210, United States
- Qiongqiong Zhou
- Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, Ohio 43210, United States
105
Systematic risk identification and assessment using a new risk map in pharmaceutical R&D. Drug Discov Today 2021; 26:2786-2793. [PMID: 34229082 DOI: 10.1016/j.drudis.2021.06.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/15/2020] [Revised: 05/21/2021] [Accepted: 06/29/2021] [Indexed: 11/20/2022]
Abstract
Delivering transformative therapies to patients while maintaining growth in the pharmaceutical industry requires an efficient use of research and development (R&D) resources and technologies to develop high-impact new molecular entities (NMEs). However, increasing global R&D competition in the pharmaceutical industry, growing impact of generics and biosimilars, more stringent regulatory requirements, as well as cost-constrained reimbursement frameworks challenge current business models of leading pharmaceutical companies. Big data-based analytics and artificial intelligence (AI) approaches have disrupted various industries and are having an increasing impact in the biopharmaceutical industry, with the promise to improve and accelerate biopharmaceutical R&D processes. Here, we systematically analyze, identify, assess, and categorize key risks across the drug discovery and development value chain using a new risk map approach, providing a comprehensive risk-reward analysis for pharmaceutical R&D.
106
Liu Z, Roberts RA, Lal-Nag M, Chen X, Huang R, Tong W. AI-based language models powering drug discovery and development. Drug Discov Today 2021; 26:2593-2607. [PMID: 34216835 PMCID: PMC8604259 DOI: 10.1016/j.drudis.2021.06.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 09/21/2020] [Revised: 04/28/2021] [Accepted: 06/25/2021] [Indexed: 02/08/2023]
Abstract
The discovery and development of new medicines is expensive, time-consuming, and often inefficient, with many failures along the way. Powered by artificial intelligence (AI), language models (LMs) have changed the landscape of natural language processing (NLP), offering possibilities to transform treatment development more effectively. Here, we summarize advances in AI-powered LMs and their potential to aid drug discovery and development. We highlight opportunities for AI-powered LMs in target identification, clinical design, regulatory decision-making, and pharmacovigilance. We specifically emphasize the potential role of AI-powered LMs in developing new treatment strategies for coronavirus disease 2019 (COVID-19), including drug repurposing, which can be extrapolated to other infectious diseases that have the potential to cause pandemics. Finally, we set out the remaining challenges and propose possible solutions for improvement.
Affiliation(s)
- Zhichao Liu
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA.
- Ruth A Roberts
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA; ApconiX, BioHub at Alderley Park, Alderley Edge SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
- Madhu Lal-Nag
- Office of Translational Sciences, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD 20993, USA
- Xi Chen
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA
- Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
- Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA.
107
Kingdon ADH, Alderwick LJ. Structure-based in silico approaches for drug discovery against Mycobacterium tuberculosis. Comput Struct Biotechnol J 2021; 19:3708-3719. [PMID: 34285773 PMCID: PMC8258792 DOI: 10.1016/j.csbj.2021.06.034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/23/2021] [Revised: 06/22/2021] [Accepted: 06/22/2021] [Indexed: 12/12/2022]
Abstract
Mycobacterium tuberculosis is the causative agent of TB and was estimated to cause 1.4 million deaths in 2019, alongside 10 million new infections. Drug resistance is a growing issue, with multi-drug resistant infections representing 3.3% of all new infections; hence, novel antimycobacterial drugs are urgently required to combat this growing health emergency. Alongside this, increased knowledge of gene essentiality in the pathogenic organism and larger compound databases can aid in the discovery of new drug compounds. The number of protein structures, X-ray based and modelled, is increasing and now accounts for more than 80% of all predicted M. tuberculosis proteins, allowing novel targets to be investigated. This review will focus on structure-based in silico approaches for drug discovery, covering a range of complexities and computational demands, with associated antimycobacterial examples. This includes molecular docking, molecular dynamics simulations, ensemble docking and free energy calculations. Applications of machine learning to each of these approaches will be discussed. Experimental validation of computational hits is an essential component, which is unfortunately missing from many current studies. The future outlook for these approaches will also be discussed.
Key Words
- CV, collective variable
- Docking
- Drug discovery
- In silico
- LIE, Linear Interaction Energy
- MD, Molecular Dynamic
- MDR, multi-drug resistant
- MMPB(GB)SA, Molecular Mechanics with Poisson Boltzmann (or generalised Born) and Surface Area solvation
- Machine learning
- Mt, Mycobacterium tuberculosis
- Mycobacterium tuberculosis
- PTC, peptidyl transferase centre
- RMSD, root-mean square-deviation
- Tuberculosis, TB
- cMD, Classical Molecular Dynamic
- cryo-EM, cryogenic electron microscopy
- ns, nanosecond
Affiliation(s)
- Alexander D H Kingdon
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Luke J Alderwick
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
108
Singh N, Villoutreix BO. Resources and computational strategies to advance small molecule SARS-CoV-2 discovery: Lessons from the pandemic and preparing for future health crises. Comput Struct Biotechnol J 2021; 19:2537-2548. [PMID: 33936562 PMCID: PMC8074526 DOI: 10.1016/j.csbj.2021.04.059] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/04/2021] [Revised: 04/22/2021] [Accepted: 04/24/2021] [Indexed: 12/11/2022]
Abstract
There is an urgent need to identify new therapies that prevent SARS-CoV-2 infection and improve the outcome of COVID-19 patients. This pandemic has thus spurred intensive research in most scientific areas and in a short period of time, several vaccines have been developed. But, while the race to find vaccines for COVID-19 has dominated the headlines, other types of therapeutic agents are being developed. In this mini-review, we report several databases and online tools that could assist the discovery of anti-SARS-CoV-2 small chemical compounds and peptides. We then give examples of studies that combined in silico and in vitro screening, either for drug repositioning purposes or to search for novel bioactive compounds. Finally, we question the overall lack of discussion and plan observed in academic research in many countries during this crisis and suggest that there is room for improvement.
Affiliation(s)
- Natesh Singh
- Université de Paris, Inserm UMR 1141 NeuroDiderot, Robert-Debré Hospital, 75019 Paris, France
- Bruno O. Villoutreix
- Université de Paris, Inserm UMR 1141 NeuroDiderot, Robert-Debré Hospital, 75019 Paris, France
109
Kimber TB, Chen Y, Volkamer A. Deep Learning in Virtual Screening: Recent Applications and Developments. Int J Mol Sci 2021; 22:4435. [PMID: 33922714 PMCID: PMC8123040 DOI: 10.3390/ijms22094435] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 03/18/2021] [Revised: 04/13/2021] [Accepted: 04/14/2021] [Indexed: 01/03/2023]
Abstract
Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has gained tremendous impact in rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques, and frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including the current challenges and emerging problems, is examined and discussed.
Affiliation(s)
- Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany; (T.B.K.); (Y.C.)
110
Abstract
Introduction: Artificial Intelligence (AI) has become a component of our everyday lives, with applications ranging from recommendations on what to buy to the analysis of radiology images. Many of the techniques originally developed for other fields such as language translation and computer vision are now being applied in drug discovery. AI has enabled multiple aspects of drug discovery including the analysis of high content screening data, and the design and synthesis of new molecules. Areas covered: This perspective provides an overview of the application of AI in several areas relevant to drug discovery including property prediction, molecule generation, image analysis, and organic synthesis planning. Expert opinion: While a variety of machine learning methods are now being routinely used to predict biological activity and ADME properties, methods of representing molecules continue to evolve. Molecule generation methods are relatively new and unproven but hold the potential to access new, unexplored areas of chemical space. The application of AI in drug discovery will continue to benefit from dedicated research, as well as AI developments in other fields. With this pairing of algorithmic advancements and high-quality data, the impact of AI in drug discovery will continue to grow in the coming years.
Affiliation(s)
- Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
111
Jiménez-Luna J, Grisoni F, Weskamp N, Schneider G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin Drug Discov 2021; 16:949-959. [PMID: 33779453 DOI: 10.1080/17460441.2021.1909567] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Indexed: 02/06/2023]
Abstract
Introduction: Artificial intelligence (AI) has inspired computer-aided drug discovery. The widespread adoption of machine learning, in particular deep learning, in multiple scientific disciplines, and the advances in computing hardware and software, among other factors, continue to fuel this development. Much of the initial skepticism regarding applications of AI in pharmaceutical discovery has started to vanish, consequently benefitting medicinal chemistry. Areas covered: The current status of AI in chemoinformatics is reviewed. The topics discussed herein include quantitative structure-activity/property relationship and structure-based modeling, de novo molecular design, and chemical synthesis prediction. Advantages and limitations of current deep learning applications are highlighted, together with a perspective on next-generation AI for drug discovery. Expert opinion: Deep learning-based approaches have only begun to address some fundamental problems in drug discovery. Certain methodological advances, such as message-passing models, spatial-symmetry-preserving networks, hybrid de novo design, and other innovative machine learning paradigms, will likely become commonplace and help address some of the most challenging questions. Open data sharing and model development will play a central role in the advancement of drug discovery with AI.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Francesca Grisoni
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Nils Weskamp
- Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riss, Germany
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
112
Bender A, Cortes-Ciriano I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discov Today 2021; 26:1040-1052. [PMID: 33508423 PMCID: PMC8132984 DOI: 10.1016/j.drudis.2020.11.037] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 11/04/2020] [Revised: 11/07/2020] [Accepted: 11/30/2020] [Indexed: 12/11/2022]
Abstract
'Artificial Intelligence' (AI) has recently had a profound impact on areas such as image and speech recognition, and this progress has already translated into practical applications. However, in the drug discovery field, such advances remain scarce, and one of the reasons is intrinsic to the data used. In this review, we discuss aspects of, and differences in, data from different domains, namely the image, speech, chemical, and biological domains, the amounts of data available, and how relevant they are to drug discovery. Improvements in the future are needed with respect to our understanding of biological systems, and the subsequent generation of practically relevant data in sufficient quantities, to truly advance the field of AI in drug discovery, to enable the discovery of novel chemistry, with novel modes of action, which shows desirable efficacy and safety in the clinic.
Affiliation(s)
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK; Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK.
- Isidro Cortes-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK.