1
Manchev YT, Burn MJ, Popelier PLA. Ichor: A Python library for computational chemistry data management and machine learning force field development. J Comput Chem 2024; 45:2912-2928. [PMID: 39215569 DOI: 10.1002/jcc.27477] [Received: 05/19/2024] [Revised: 07/09/2024] [Accepted: 07/18/2024] [Indexed: 09/04/2024]
Abstract
We present ichor, an open-source Python library that simplifies data management in computational chemistry and streamlines machine learning force field development. Ichor implements many easily extensible file management tools, in addition to a lazy file reading system, allowing efficient management of hundreds of thousands of computational chemistry files. Data from calculations can be readily stored into databases for easy sharing and post-processing. Raw data can be directly processed by ichor to create machine learning-ready datasets. In addition to powerful data-related capabilities, ichor provides interfaces to popular workload management software employed by High Performance Computing clusters, making for effortless submission of thousands of separate calculations with only a single line of Python code. Furthermore, a simple-to-use command line interface has been implemented through a series of menu systems to further increase accessibility and efficiency of common important ichor tasks. Finally, ichor implements general tools for visualization and analysis of datasets and tools for measuring machine-learning model quality both on test set data and in simulations. With the current functionalities, ichor can serve as an end-to-end data procurement, data management, and analysis solution for machine-learning force-field development.
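The lazy file reading credited above for handling hundreds of thousands of files follows a general pattern that can be sketched in Python. This is not ichor's API. The `LazyFile` class and the toy `parse_key_values` parser are hypothetical names used only for illustration. The file is opened and parsed the first time its contents are requested, and the result is then cached. Creating wrapper objects for a large directory of outputs therefore costs almost nothing until data is actually needed.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional


def parse_key_values(text: str) -> Dict[str, str]:
    """Toy parser for 'key = value' lines; a stand-in for a real output-file parser."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            result[key.strip()] = value.strip()
    return result


class LazyFile:
    """Hypothetical lazily read file wrapper (illustrative only, not ichor's API)."""

    def __init__(self, path: Path, parse: Callable[[str], Dict[str, str]]) -> None:
        self.path = Path(path)
        self._parse = parse
        self._data: Optional[Dict[str, str]] = None  # nothing is read at construction
        self.reads = 0  # number of times the file was actually read from disk

    @property
    def data(self) -> Dict[str, str]:
        # Read and parse on first access only; later accesses reuse the cached result.
        if self._data is None:
            self._data = self._parse(self.path.read_text())
            self.reads += 1
        return self._data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "calc.out"
        out.write_text("energy = -76.4\nmethod = B3LYP\n")  # toy file contents
        f = LazyFile(out, parse_key_values)
        print(f.reads)           # 0: constructing the wrapper did not touch the file
        print(f.data["energy"])  # -76.4 (first access triggers the read)
        print(f.reads)           # 1
```

Because the cache lives on each instance, memory is only spent on files whose `data` has been accessed. A production tool would also need to decide what happens when a file changes on disk after it has been cached.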
Affiliation(s)
- Yulian T Manchev
- Department of Chemistry, The University of Manchester, Manchester, UK
- Matthew J Burn
- Department of Chemistry, The University of Manchester, Manchester, UK
- Paul L A Popelier
- Department of Chemistry, The University of Manchester, Manchester, UK

2
Sharma R, Oyagawa CRM, Abbasi H, Dragunow M, Conole D. Phenotypic approaches for CNS drugs. Trends Pharmacol Sci 2024:S0165-6147(24)00188-3. [PMID: 39438155 DOI: 10.1016/j.tips.2024.09.003] [Received: 06/06/2024] [Revised: 08/09/2024] [Accepted: 09/19/2024] [Indexed: 10/25/2024]
Abstract
Central nervous system (CNS) drug development is plagued by a high clinical failure rate. Phenotypic assays promote clinical translation of drugs by reducing complex brain diseases to measurable, clinically valid phenotypes. We critique recent platforms integrating patient-derived brain cells, which most accurately recapitulate CNS disease phenotypes, with higher throughput models, including immortalized cells, to balance validity and scalability. These platforms were screened with conventional commercial chemogenomic compound libraries. We explore emerging library curation strategies to improve hit rate and quality, and screening novel fragment libraries as alternatives, for more tractable drug target deconvolution. The clinically relevant models used in these platforms could harbor important, unidentified drug targets, so we review evolving agnostic target deconvolution approaches, including chemical proteomics and artificial intelligence (AI), which aid in phenotypic screening hit mechanism elucidation, thereby facilitating rational hit-to-drug optimization.
Affiliation(s)
- Raahul Sharma
- Centre for Brain Research, Faculty of Medical and Health Sciences, University of Auckland, 85 Park Road, Grafton, Auckland 1023, New Zealand; Auckland Cancer Society Research Centre, Faculty of Medical and Health Sciences, University of Auckland, 85 Park Road, Grafton, Auckland 1023, New Zealand
- Caitlin R M Oyagawa
- Centre for Brain Research, Faculty of Medical and Health Sciences, University of Auckland, 85 Park Road, Grafton, Auckland 1023, New Zealand
- Hamid Abbasi
- Auckland Bioengineering Institute, The University of Auckland, 70 Symonds Street, Auckland, 1010, New Zealand
- Michael Dragunow
- Centre for Brain Research, Faculty of Medical and Health Sciences, University of Auckland, 85 Park Road, Grafton, Auckland 1023, New Zealand
- Daniel Conole
- Auckland Cancer Society Research Centre, Faculty of Medical and Health Sciences, University of Auckland, 85 Park Road, Grafton, Auckland 1023, New Zealand

3
Nordquist EB, Zhao M, Kumar A, MacKerell AD. Combined Physics- and Machine-Learning-Based Method to Identify Druggable Binding Sites Using SILCS-Hotspots. J Chem Inf Model 2024; 64:7743-7757. [PMID: 39283165 PMCID: PMC11473228 DOI: 10.1021/acs.jcim.4c01189] [Indexed: 10/15/2024]
Abstract
Identifying druggable binding sites on proteins is an important and challenging problem, particularly for cryptic, allosteric binding sites that may not be obvious from X-ray, cryo-EM, or predicted structures. The Site-Identification by Ligand Competitive Saturation (SILCS) method accounts for the flexibility of the target protein using all-atom molecular simulations that include various small molecule solutes in aqueous solution. During the simulations, the combination of protein flexibility and comprehensive sampling of the water and solute spatial distributions can identify buried binding pockets absent in experimentally determined structures. Previously, we reported a method for leveraging the information in the SILCS sampling to identify binding sites (termed Hotspots) of small mono- or bicyclic compounds, a subset of which coincide with known binding sites of drug-like molecules. Here, we build on that physics-based approach and present a machine learning (ML) model for ranking the Hotspots according to the likelihood they can accommodate drug-like molecules (e.g., molecular weight >200 Da). In the independent validation set, which includes various enzymes and receptors, our model recalls 67% and 89% of experimentally validated ligand binding sites in the top 10 and 20 ranked Hotspots, respectively. Furthermore, we show that the model's output Decision Function is a useful metric to predict binding sites and their potential druggability in new targets. Given the utility of the SILCS method for ligand discovery and optimization, the tools presented represent an important advancement in the identification of orthosteric and allosteric binding sites and the discovery of drug-like molecules targeting those sites.
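The validation figures quoted above are recall-at-k values: 67% and 89% of experimentally validated sites are recovered within the top 10 and top 20 Hotspots. The sketch below shows how such a metric can be computed. It assumes that each target supplies a ranked list of Hotspot identifiers and a set of identifiers already matched to validated sites, and that recall is pooled over all sites. The abstract does not say how SILCS-Hotspots matches Hotspots to sites, and the function name and toy data are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]], known: Dict[str, Set[str]], k: int) -> float:
    """Fraction of known binding sites found among the top-k ranked Hotspots.

    ranked: target -> Hotspot ids sorted by decreasing model score
    known:  target -> ids of Hotspots that coincide with validated sites
    Recall is pooled over all sites of all targets (an assumption of this sketch).
    """
    found = 0
    total = 0
    for target, sites in known.items():
        top_k = set(ranked.get(target, [])[:k])  # targets with no ranking contribute misses
        found += len(sites & top_k)
        total += len(sites)
    return found / total if total else 0.0


if __name__ == "__main__":
    # Toy data: two hypothetical targets, one validated site each.
    ranked = {"A": ["h1", "h2", "h3"], "B": ["h4", "h5", "h6"]}
    known = {"A": {"h2"}, "B": {"h6"}}
    print(recall_at_k(ranked, known, 2))  # 0.5: only target A's site is in its top 2
    print(recall_at_k(ranked, known, 3))  # 1.0
```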
Affiliation(s)
- Erik B. Nordquist
- Computer Aided Drug Design Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Baltimore, Maryland 21201, United States
- Mingtian Zhao
- Computer Aided Drug Design Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Baltimore, Maryland 21201, United States
- Anmol Kumar
- Computer Aided Drug Design Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Baltimore, Maryland 21201, United States
- Alexander D. MacKerell
- Computer Aided Drug Design Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Baltimore, Maryland 21201, United States

4
Shahbazi F, Esfahani MN, Keshmiri A, Jabbari M. Assessment of machine learning models trained by molecular dynamics simulations results for inferring ethanol adsorption on an aluminium surface. Sci Rep 2024; 14:20437. [PMID: 39227616 PMCID: PMC11372171 DOI: 10.1038/s41598-024-71007-z] [Received: 01/19/2024] [Accepted: 08/23/2024] [Indexed: 09/05/2024]
Abstract
Molecular dynamics (MD) simulations can reduce our need for experimental tests and provide detailed insight into the chemical reactions and binding kinetics. There are two challenges in dealing with MD simulations: one is the time and length scale limitations, and the other is efficiently processing the massive amount of data resulting from the MD simulations and generating the proper reaction rates. In this work, we evaluated the use of regression machine learning (ML) methods to solve these two challenges by developing a framework for ethanol adsorption on an Aluminium (Al) slab. This framework comprises three main stages: first, an all-atom molecular dynamics model; second, ML regression models; and third, validation and testing. In stage one, the adsorption of ethanol molecules on the Al surface for various temperatures, velocities and concentrations is simulated using the large-scale atomic/molecular massively parallel simulator (LAMMPS) and ReaxFF. The outcome of stage one is utilised for training, testing, and validating the predictive models in stages two and three. We developed and evaluated 28 different ML models for predicting the number of adsorbed molecules over time, including linear regression, support vector machine (SVM), decision trees, ensemble, Gaussian process regression (GPR), neural network (NN) and Bayesian hyper-parameter optimisation models. Based on the results, the Bayesian-based GPR showed the highest accuracy and the lowest training time. The developed model can predict the number of adsorbed molecules for new cases within seconds, while MD simulations take a few weeks. This adsorption rate can then be used in macroscale simulations to tackle the time and length scale limitations. The proposed numerical framework has the potential to be generalised and, therefore, contribute to future low-cost binding reaction estimations, providing a valuable tool for industry and experimentalists.
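Gaussian process regression (GPR) was the best-performing model in this study. As a reminder of what a GPR prediction involves, the sketch below computes the posterior mean k(x*, X) (K + σ²I)⁻¹ y in plain Python, using a squared-exponential kernel and scalar inputs. It is a generic illustration under stated assumptions: a zero prior mean, fixed hyperparameters, and a single input feature. It does not reproduce the authors' input features or their Bayesian hyperparameter optimisation, and all names and values are hypothetical.

```python
import math
from typing import List, Sequence


def rbf(a: float, b: float, length_scale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = m, for a symmetric positive-definite matrix m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def cholesky_solve(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior_mean(x_train: Sequence[float], y_train: Sequence[float],
                      x_test: Sequence[float], length_scale: float = 1.0,
                      variance: float = 1.0, noise: float = 1e-6) -> List[float]:
    """Zero-mean GP posterior mean: k(x*, X) (K + noise * I)^-1 y."""
    n = len(x_train)
    # Training covariance with the noise term added on the diagonal.
    k_train = [[rbf(x_train[i], x_train[j], length_scale, variance)
                + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = cholesky_solve(cholesky(k_train), y_train)  # alpha = (K + noise*I)^-1 y
    return [sum(rbf(xs, x_train[i], length_scale, variance) * alpha[i] for i in range(n))
            for xs in x_test]
```

In practice, the targets would be centred (or a mean function fitted), and the length scale, variance and noise would be tuned, for example by maximising the marginal likelihood. An established library would normally replace a hand-written solver.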
Affiliation(s)
- Fatemeh Shahbazi
- Warwick Manufacturing Group (WMG), University of Warwick, Coventry, CV4 7AL, UK
- School of Engineering, University of Manchester, Manchester, M13 9PL, UK
- Amir Keshmiri
- School of Engineering, University of Manchester, Manchester, M13 9PL, UK
- Masoud Jabbari
- School of Mechanical Engineering, University of Leeds, Leeds, LS2 9JT, UK

5
Singh S, Kaur N, Gehlot A. Application of artificial intelligence in drug design: A review. Comput Biol Med 2024; 179:108810. [PMID: 38991316 DOI: 10.1016/j.compbiomed.2024.108810] [Received: 03/18/2024] [Revised: 05/31/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024]
Abstract
Artificial intelligence (AI) is a field of computer science that involves acquiring information, developing rule bases, and mimicking human behaviour. The fundamental concept behind AI is to create intelligent computer systems that can operate with minimal or no human intervention. These rule-based systems are developed using various machine learning and deep learning models, enabling them to solve complex problems. AI is integrated with these models to learn, understand, and analyse provided data. The rapid advancement of AI is reshaping numerous industries, with the pharmaceutical sector experiencing a notable transformation. AI is increasingly being employed to automate, optimize, and personalize various facets of the pharmaceutical industry, particularly in pharmacological research. Traditional drug development methods are known for being time-consuming, expensive, and less efficient, often taking around a decade and costing billions of dollars. The integration of AI techniques addresses these challenges by enabling the examination of compounds with desired properties from a vast pool of input drugs. Furthermore, AI plays a crucial role in drug screening by predicting toxicity, bioactivity, ADME properties (absorption, distribution, metabolism, and excretion), physicochemical properties, and more. AI enhances the drug design process by improving the efficiency and accuracy of predicting drug behaviour, interactions, and properties. These approaches further improve the precision of drug discovery processes and decrease clinical trial costs, leading to the development of more effective drugs.
Affiliation(s)
- Simrandeep Singh
- Department of Electronics & Communication Engineering, UCRD, Chandigarh University, Gharuan, Punjab, India
- Navjot Kaur
- Department of Pharmacognosy, Amar Shaheed Baba Ajit Singh Jujhar Singh Memorial College of Pharmacy, Bela, Ropar, India
- Anita Gehlot
- Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India

6
Ghosh S, Zhao X, Alim M, Brudno M, Bhat M. Artificial intelligence applied to 'omics data in liver disease: towards a personalised approach for diagnosis, prognosis and treatment. Gut 2024:gutjnl-2023-331740. [PMID: 39174307 DOI: 10.1136/gutjnl-2023-331740] [Received: 04/15/2024] [Accepted: 07/24/2024] [Indexed: 08/24/2024]
Abstract
Advancements in omics technologies and artificial intelligence (AI) methodologies are fuelling our progress towards personalised diagnosis, prognosis and treatment strategies in hepatology. This review provides a comprehensive overview of the current landscape of AI methods used for analysis of omics data in liver diseases. We present an overview of the prevalence of different omics levels across various liver diseases, as well as categorise the AI methodology used across the studies. Specifically, we highlight the predominance of transcriptomic and genomic profiling and the relatively sparse exploration of other levels such as the proteome and methylome, which represent untapped potential for novel insights. Publicly available database initiatives such as The Cancer Genome Atlas and The International Cancer Genome Consortium have paved the way for advancements in the diagnosis and treatment of hepatocellular carcinoma. However, the same availability of large omics datasets remains limited for other liver diseases. Furthermore, the application of sophisticated AI methods to handle the complexities of multiomics datasets requires substantial data to train and validate the models and faces challenges in achieving bias-free results with clinical utility. Strategies to address the paucity of data and capitalise on opportunities are discussed. Given the substantial global burden of chronic liver diseases, it is imperative that multicentre collaborations be established to generate large-scale omics data for early disease recognition and intervention. Exploring advanced AI methods is also necessary to maximise the potential of these datasets and improve early detection and personalised treatment strategies.
Affiliation(s)
- Soumita Ghosh
- Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Xun Zhao
- Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
- Mouaid Alim
- Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Michael Brudno
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Vector Institute of Artificial Intelligence, Toronto, Ontario, Canada
- Mamatha Bhat
- Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Division of Gastroenterology, University of Toronto Faculty of Medicine, Toronto, Ontario, Canada
- Toronto General Hospital Research Institute, University Health Network, Toronto, Ontario, Canada

7
Heinzelmann G, Huggins DJ, Gilson MK. BAT2: an Open-Source Tool for Flexible, Automated, and Low Cost Absolute Binding Free Energy Calculations. J Chem Theory Comput 2024; 20:6518-6530. [PMID: 39088306 PMCID: PMC11325538 DOI: 10.1021/acs.jctc.4c00205] [Received: 02/19/2024] [Revised: 07/19/2024] [Accepted: 07/23/2024] [Indexed: 08/03/2024]
Abstract
Absolute binding free energy (ABFE) calculations with all-atom molecular dynamics (MD) have the potential to greatly reduce costs in the first stages of drug discovery. Here, we introduce BAT2, the new version of the Binding Affinity Tool (BAT.py), designed to combine full automation of ABFE calculations with high-performance MD simulations, making it a potential tool for virtual screening. We describe and test several changes and new features that were incorporated into the code, such as relative restraints between the protein and the ligand instead of using fixed dummy atoms, support for the OpenMM simulation engine, a merged approach to the application/release of restraints, support for cobinders and proteins with multiple chains, and many others. We also reduced the simulation times for each ABFE calculation, assessing the effect on the expected robustness and accuracy of the calculations.
Affiliation(s)
- Germano Heinzelmann
- Departamento de Fisica, Universidade Federal de Santa Catarina, Florianopolis 88040-970, Brasil
- David J. Huggins
- Department of Physiology and Biophysics, Weill Cornell Medical College of Cornell University, New York, New York 10065, United States
- Sanders Tri-Institutional Therapeutics Discovery Institute, 1230 York Avenue, Box 122, New York, New York 10065, United States
- Michael K. Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego 92093, United States

8
Thayer KM, Stetson S, Caballero F, Chiu C, Han ISM. Navigating the complexity of p53-DNA binding: implications for cancer therapy. Biophys Rev 2024; 16:479-496. [PMID: 39309126 PMCID: PMC11415564 DOI: 10.1007/s12551-024-01207-4] [Received: 02/02/2024] [Accepted: 06/21/2024] [Indexed: 09/25/2024]
Abstract
The tumor suppressor protein p53, a transcription factor playing a key role in cancer prevention, interacts with DNA as its primary means of determining cell fate in the event of DNA damage. When it becomes mutated, it opens damaged cells to the possibility of reproducing unchecked, which can lead to the formation of cancerous tumors. Despite its critical role, therapies at the molecular level to restore p53 native function remain elusive, due to its complex nature. Nevertheless, considerable information has been amassed, and new means of investigating the problem have become available.
Objectives: We consider structural, biophysical, and bioinformatic insights and their implications for the role of direct and indirect readout and how they contribute to binding site recognition, particularly those of low consensus. We then pivot to consider advances in computational approaches to drug discovery.
Materials and methods: We have conducted a review of recent literature pertinent to the p53 protein.
Results: Considerable literature corroborates the idea that p53 is a complex allosteric protein that discriminates its binding sites not only via consensus sequence through direct H-bond contacts, but also via a complex combination of factors involving the flexibility of the binding site. New computational methods have emerged capable of capturing such information, which can then be utilized as input to machine learning algorithms towards the goal of more intelligent and efficient de novo allosteric drug design.
Conclusions: Recent improvements in machine learning coupled with graph theory and sector analysis hold promise for advances to more intelligently design allosteric effectors that may be able to restore native p53-DNA binding activity to mutant proteins.
Clinical relevance: The ideas brought to light by this review constitute a significant advance that can be applied to ongoing biophysical studies of drugs for p53, paving the way for the continued development of new methodologies for allosteric drugs. Our discoveries hold promise to provide molecular therapeutics that restore p53 native activity, thereby offering new insights for cancer therapies.
Graphical Abstract: Structural representation of the p53 DBD (PDB ID 1TUP). DNA consensus sequence is shown in gray, and the protein is shown in blue. Red beads indicate hotspot residue mutations, green beads represent DNA-interacting residues, and yellow beads represent both.
Affiliation(s)
- Kelly M. Thayer
- College of Integrative Sciences, Wesleyan University, Middletown, CT 06457 USA
- Department of Chemistry, Wesleyan University, Middletown, CT 06457 USA
- Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT 06457 USA
- Molecular Biophysics Program, Wesleyan University, Middletown, CT 06457 USA
- Sean Stetson
- Department of Chemistry, Wesleyan University, Middletown, CT 06457 USA
- Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT 06457 USA
- Fernando Caballero
- College of Integrative Sciences, Wesleyan University, Middletown, CT 06457 USA
- Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT 06457 USA
- Christopher Chiu
- Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT 06457 USA
- In Sub Mark Han
- Molecular Biophysics Program, Wesleyan University, Middletown, CT 06457 USA

9
Iqbal U, Davies T, Perez P. A Review of Recent Hardware and Software Advances in GPU-Accelerated Edge-Computing Single-Board Computers (SBCs) for Computer Vision. Sensors (Basel) 2024; 24:4830. [PMID: 39123877 PMCID: PMC11314838 DOI: 10.3390/s24154830] [Received: 05/23/2024] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 08/12/2024]
Abstract
Computer Vision (CV) has become increasingly important for Single-Board Computers (SBCs) due to their widespread deployment in addressing real-world problems. Specifically, in the context of smart cities, there is an emerging trend of developing end-to-end video analytics solutions designed to address urban challenges such as traffic management, disaster response, and waste management. However, deploying CV solutions on SBCs presents several pressing challenges (e.g., limited computation power, inefficient energy management, and real-time processing needs) hindering their use at scale. Graphical Processing Units (GPUs) and software-level developments have emerged recently in addressing these challenges to enable the elevated performance of SBCs; however, it is still an active area of research. There is a gap in the literature for a comprehensive review of such recent and rapidly evolving advancements on both software and hardware fronts. The presented review provides a detailed overview of the existing GPU-accelerated edge-computing SBCs and software advancements including algorithm optimization techniques, packages, development frameworks, and hardware deployment specific packages. This review provides a subjective comparative analysis based on critical factors to help applied Artificial Intelligence (AI) researchers in demonstrating the existing state of the art and selecting the best suited combinations for their specific use-case. At the end, the paper also discusses potential limitations of the existing SBCs and highlights the future research directions in this domain.
Affiliation(s)
- Umair Iqbal
- SMART Infrastructure Facility, University of Wollongong, Wollongong, NSW 2522, Australia
- Tim Davies
- SMART Infrastructure Facility, University of Wollongong, Wollongong, NSW 2522, Australia
- Pascal Perez
- Australian Urban Research Infrastructure Network (AURIN), University of Melbourne, Melbourne, VIC 3052, Australia

10
Jørgensen FK, Delcey MG, Hedegård ED. Perspective: multi-configurational methods in bio-inorganic chemistry. Phys Chem Chem Phys 2024; 26:17443-17455. [PMID: 38868993 DOI: 10.1039/d4cp01297f] [Indexed: 06/14/2024]
Abstract
Transition metal ions play crucial roles in the structure and function of numerous proteins, contributing to essential biological processes such as catalysis, electron transfer, and oxygen binding. However, accurately modeling the electronic structure and properties of metalloproteins poses significant challenges due to the complex nature of their electronic configurations and strong correlation effects. Multiconfigurational quantum chemistry methods are, in principle, the most appropriate tools for addressing these challenges, offering the capability to capture the inherent multi-reference character and strong electron correlation present in bio-inorganic systems. Yet their computational cost has long hindered wider adoption, making methods such as density functional theory (DFT) the method of choice. However, advancements over the past decade have substantially alleviated this limitation, rendering multiconfigurational quantum chemistry methods more accessible and applicable to a wider range of bio-inorganic systems. In this perspective, we discuss some of these developments and how they have already been used to answer some of the most important questions in bio-inorganic chemistry. We also comment on ongoing developments in the field and how the future of the field may evolve.
Affiliation(s)
- Frederik K Jørgensen
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Mickaël G Delcey
- Department of Chemistry, Lund University, Naturvetarvägen 14, 221 00 Lund, Sweden
- Erik D Hedegård
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Department of Chemistry, Lund University, Naturvetarvägen 14, 221 00 Lund, Sweden

11
Herre C, Ho A, Eisenbraun B, Vincent J, Nicholson T, Boutsioukis G, Meyer PA, Ottaviano M, Krause KL, Key J, Sliz P. Introduction of the Capsules environment to support further growth of the SBGrid structural biology software collection. Acta Crystallogr D Struct Biol 2024; 80:439-450. [PMID: 38832828 PMCID: PMC11154594 DOI: 10.1107/s2059798324004881] [Received: 03/01/2024] [Accepted: 05/23/2024] [Indexed: 06/06/2024]
Abstract
The expansive scientific software ecosystem, characterized by millions of titles across various platforms and formats, poses significant challenges in maintaining reproducibility and provenance in scientific research. The diversity of independently developed applications, evolving versions and heterogeneous components highlights the need for rigorous methodologies to navigate these complexities. In response to these challenges, the SBGrid team builds, installs and configures over 530 specialized software applications for use in the on-premises and cloud-based computing environments of SBGrid Consortium members. To address the intricacies of supporting this diverse application collection, the team has developed the Capsule Software Execution Environment, generally referred to as Capsules. Capsules rely on a collection of programmatically generated bash scripts that work together to isolate the runtime environment of one application from all other applications, thereby providing a transparent cross-platform solution without requiring specialized tools or elevated account privileges for researchers. Capsules facilitate modular, secure software distribution while maintaining a centralized, conflict-free environment. The SBGrid platform, which combines Capsules with the SBGrid collection of structural biology applications, aligns with FAIR goals by enhancing the findability, accessibility, interoperability and reusability of scientific software, ensuring seamless functionality across diverse computing environments. Its adaptability enables application beyond structural biology into other scientific fields.
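The abstract describes Capsules only as programmatically generated bash scripts that isolate each application's runtime environment. The general idea can be sketched as a Python function that emits one wrapper script per application. The `<prefix>/<app>/<version>` layout, the choice of environment variables, and the `make_wrapper` name are all assumptions for illustration. They do not reflect SBGrid's actual scripts.

```python
import shlex
from typing import Dict, Optional


def make_wrapper(app: str, version: str, prefix: str,
                 extra_env: Optional[Dict[str, str]] = None) -> str:
    """Return the text of a bash wrapper that runs one application in its own environment.

    Hypothetical layout: executables in <prefix>/<app>/<version>/bin,
    shared libraries in <prefix>/<app>/<version>/lib.
    """
    root = f"{prefix}/{app}/{version}"
    env = {
        # Only this application's bin directory plus system defaults, so another
        # installed application's binaries cannot shadow this one's.
        "PATH": f"{root}/bin:/usr/bin:/bin",
        # Replace (rather than append to) the library search path for the same reason.
        "LD_LIBRARY_PATH": f"{root}/lib",
    }
    env.update(extra_env or {})
    lines = ["#!/bin/bash", f"# Generated wrapper for {app} {version} (illustrative sketch)"]
    for name, value in env.items():
        # shlex.quote protects values containing spaces or shell metacharacters.
        lines.append(f"export {name}={shlex.quote(value)}")
    # exec replaces the wrapper process; "$@" forwards the user's arguments unchanged.
    lines.append(f'exec {shlex.quote(root + "/bin/" + app)} "$@"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_wrapper("myapp", "2.1", "/programs", {"MYAPP_DATA": "/programs/myapp/2.1/share"}))
```

The exports happen inside the wrapper's own process, so the user's interactive shell is never modified. This is one way to keep applications apart without elevated privileges.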
Affiliation(s)
- Carol Herre
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Alex Ho
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Ben Eisenbraun
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- James Vincent
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Thomas Nicholson
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Peter A. Meyer
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Michelle Ottaviano
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Kurt L. Krause
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Jason Key
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Piotr Sliz
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Department of Pediatrics, Boston Children’s Hospital, Boston, Massachusetts, USA

12
Schmidt B, Hildebrandt A. From GPUs to AI and quantum: three waves of acceleration in bioinformatics. Drug Discov Today 2024; 29:103990. [PMID: 38663581 DOI: 10.1016/j.drudis.2024.103990] [Received: 12/13/2023] [Revised: 04/05/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on the effective use of these powerful technologies. Furthermore, progress in computational techniques and architectures continues to be highly dynamic, involving novel deep neural network models and artificial intelligence (AI) accelerators, and potentially quantum processing units in the future. These are expected to be disruptive for the life sciences as a whole and for drug discovery in particular. Here, we identify three waves of acceleration and their applications in a bioinformatics context: (i) GPU computing, (ii) AI and (iii) next-generation quantum computers.
Affiliation(s)
- Bertil Schmidt
- Institut für Informatik, Johannes Gutenberg University, Mainz, Germany.
13
Korolev V, Mitrofanov A. The carbon footprint of predicting CO2 storage capacity in metal-organic frameworks within neural networks. iScience 2024; 27:109644. [PMID: 38628964 PMCID: PMC11019266 DOI: 10.1016/j.isci.2024.109644] [Received: 01/10/2024] [Revised: 02/28/2024] [Accepted: 03/27/2024] [Indexed: 04/19/2024]
Abstract
While artificial intelligence drives remarkable progress in natural sciences, its broader societal implications are mostly disregarded. In this study, we evaluate environmental impacts of deep learning in materials science through extensive benchmarking. In particular, a set of diverse neural networks is trained for a given supervised learning task to assess greenhouse gas (GHG) emissions during training and inference phases. A chronological perspective showed diminishing returns, manifesting themselves as a 28% decrease in mean absolute error and nearly a 15,000% increase in the carbon footprint of model training in 2016-2022. By means of up-to-date graphics processing units, it is possible to partially offset the immense growth of GHG emissions. Nonetheless, the practice of employing energy-efficient hardware is overlooked by the materials informatics community, as follows from a literature analysis in the field. On the basis of our findings, we encourage researchers to report GHG emissions together with standard performance metrics.
Affiliation(s)
- Vadim Korolev
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow 119192, Russia
- Artem Mitrofanov
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow 119192, Russia
14
Suo X, Tang W, Li Z. Motion Capture Technology in Sports Scenarios: A Survey. Sensors (Basel) 2024; 24:2947. [PMID: 38733052 PMCID: PMC11086331 DOI: 10.3390/s24092947] [Received: 03/11/2024] [Revised: 04/26/2024] [Accepted: 05/04/2024] [Indexed: 05/13/2024]
Abstract
Motion capture technology plays a crucial role in optimizing athletes' skills, techniques, and strategies by providing detailed feedback on motion data. This article presents a comprehensive survey aimed at guiding researchers in selecting the most suitable motion capture technology for sports science investigations. By comparing and analyzing the characteristics and applications of different motion capture technologies in sports scenarios, it is observed that cinematography motion capture technology remains the gold standard in biomechanical analysis and continues to dominate sports research applications. Wearable sensor-based motion capture technology has gained significant traction in specialized areas such as winter sports, owing to its reliable system performance. Computer vision-based motion capture technology has made significant advancements in recognition accuracy and system reliability, enabling its application in various sports scenarios, from single-person technique analysis to multi-person tactical analysis. Moreover, the emerging field of multimodal motion capture technology, which harmonizes data from various sources with the integration of artificial intelligence, has proven to be a robust research method for complex scenarios. A comprehensive review of the literature from the past 10 years underscores the increasing significance of motion capture technology in sports, with a notable shift from laboratory research to practical training applications on sports fields. Future developments in this field should prioritize research and technological advancements that cater to practical sports scenarios, addressing challenges such as occlusion, outdoor capture, and real-time feedback.
Affiliation(s)
- Xiang Suo
- School of Athletic Performance, Shanghai University of Sport, Shanghai 200438, China
- Weidi Tang
- School of Exercise and Health, Shanghai University of Sport, Shanghai 200438, China
- Zhen Li
- School of Athletic Performance, Shanghai University of Sport, Shanghai 200438, China
15
Chen M, Jiang X, Zhang L, Chen X, Wen Y, Gu Z, Li X, Zheng M. The emergence of machine learning force fields in drug design. Med Res Rev 2024; 44:1147-1182. [PMID: 38173298 DOI: 10.1002/med.22008] [Received: 08/19/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
In the field of molecular simulation for drug design, traditional molecular mechanics force fields and quantum chemical theories have been instrumental but limited in terms of scalability and computational efficiency. To overcome these limitations, machine learning force fields (MLFFs) have emerged as a powerful tool capable of balancing accuracy with efficiency. MLFFs rely on the relationship between molecular structures and potential energy, bypassing the need for a preconceived notion of interaction representations. Their accuracy depends on the machine learning models used, and the quality and volume of training data sets. With recent advances in equivariant neural networks and high-quality datasets, MLFFs have significantly improved their performance. This review explores MLFFs, emphasizing their potential in drug design. It elucidates MLFF principles, provides development and validation guidelines, and highlights successful MLFF implementations. It also addresses potential challenges in developing and applying MLFFs. The review concludes by illuminating the path ahead for MLFFs, outlining the challenges to be overcome and the opportunities to be harnessed. This inspires researchers to embrace MLFFs in their investigations as a new tool to perform molecular simulations in drug design.
Affiliation(s)
- Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Xinyu Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Lehan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoxu Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Yiming Wen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Zhiyong Gu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
16
Lam JH, Nakano A, Katritch V. Scalable computation of anisotropic vibrations for large macromolecular assemblies. Nat Commun 2024; 15:3479. [PMID: 38658556 PMCID: PMC11043083 DOI: 10.1038/s41467-024-47685-8] [Received: 08/31/2023] [Accepted: 04/02/2024] [Indexed: 04/26/2024]
Abstract
Normal Mode Analysis (NMA) is a standard approach to elucidate the anisotropic vibrations of macromolecules in their folded states, where low-frequency collective motions can reveal rearrangements of domains and changes in the exposed surface of macromolecules. Recent advances in structural biology have enabled the resolution of megascale macromolecules with millions of atoms. However, the calculation of their vibrational modes remains elusive due to the prohibitive cost associated with constructing and diagonalizing the underlying eigenproblem, and the current approaches to NMA are not readily adaptable for efficient parallel computing on graphics processing units (GPUs). Here, we present an eigenproblem construction and diagonalization approach that implements level-structure bandwidth-reducing algorithms to transform the sparse computation in NMA to a globally-sparse-yet-locally-dense computation, allowing batched tensor products to be most efficiently executed on GPU. We map, optimize, and compare several low-complexity Krylov-subspace eigensolvers, supplemented by techniques such as Chebyshev filtering, sum decomposition, external explicit deflation and shift-and-inverse, to allow fast GPU-resident calculations. The method allows accurate calculation of the first 1000 vibrational modes of some of the largest structures in the PDB (>2.4 million atoms) at least 250 times faster than existing methods.
Affiliation(s)
- Jordy Homing Lam
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Bridge Institute and Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
- Center for New Technologies in Drug Discovery and Development, University of Southern California, Los Angeles, CA, USA
- Aiichiro Nakano
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA.
- Department of Computer Science, University of Southern California, Los Angeles, CA, USA.
- Vsevolod Katritch
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Bridge Institute and Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA.
- Center for New Technologies in Drug Discovery and Development, University of Southern California, Los Angeles, CA, USA.
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA.
17
Nandi S, Bhaduri S, Das D, Ghosh P, Mandal M, Mitra P. Deciphering the Lexicon of Protein Targets: A Review on Multifaceted Drug Discovery in the Era of Artificial Intelligence. Mol Pharm 2024; 21:1563-1590. [PMID: 38466810 DOI: 10.1021/acs.molpharmaceut.3c01161] [Indexed: 03/13/2024]
Abstract
Understanding protein sequence and structure is essential for understanding protein-protein interactions (PPIs), which are essential for many biological processes and diseases. Targeting protein binding hot spots, which regulate signaling and growth, with rational drug design is promising. Rational drug design uses structural data and computational tools to study protein binding sites and protein interfaces to design inhibitors that can change these interactions, thereby potentially leading to therapeutic approaches. Artificial intelligence (AI), such as machine learning (ML) and deep learning (DL), has advanced drug discovery and design by providing computational resources and methods. Quantum chemistry is essential for drug reactivity, toxicology, drug screening, and quantitative structure-activity relationship (QSAR) properties. This review discusses the methodologies and challenges of identifying and characterizing hot spots and binding sites. It also explores the strategies and applications of artificial-intelligence-based rational drug design technologies that target proteins and protein-protein interaction (PPI) binding hot spots. It provides valuable insights for drug design with therapeutic implications. We have also demonstrated the pathological conditions of heat shock protein 27 (HSP27) and matrix metalloproteinases (MMP2 and MMP9) and designed inhibitors of these proteins using the drug discovery paradigm in a case study on the discovery of drug molecules for cancer treatment. Additionally, the implications of benzothiazole derivatives for anticancer drug design and discovery are deliberated.
Affiliation(s)
- Suvendu Nandi
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Soumyadeep Bhaduri
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Debraj Das
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Priya Ghosh
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Mahitosh Mandal
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
18
Chen X, Huang L. Computational model for drug research. Brief Bioinform 2024; 25:bbae158. [PMID: 38581423 PMCID: PMC10998638 DOI: 10.1093/bib/bbae158] [Received: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/08/2024]
Abstract
This special issue focuses on computational models for drug research, covering drug bioactivity prediction, drug-related interaction prediction, modelling for immunotherapy and modelling for the treatment of specific diseases, as conveyed by the following six research and four review articles. Notably, these 10 papers describe a wide variety of in-depth drug research from the computational perspective and may represent a snapshot of the wide research landscape.
Affiliation(s)
- Xing Chen
- School of Science, Jiangnan University, Wuxi, 214122, China
- Li Huang
- The Future Laboratory, Tsinghua University, Beijing, 100084, China
19
Lee XL, Chang JC, Ye XY, Chang CY. Field-programmable gate array and deep neural network-accelerated spatial-spectral interferometry for rapid optical dispersion analysis. Opt Lett 2024; 49:1289-1292. [PMID: 38426995 DOI: 10.1364/ol.510618] [Received: 10/31/2023] [Accepted: 01/24/2024] [Indexed: 03/02/2024]
Abstract
Spatial-spectral interferometry (SSI) is a technique used to reconstruct the electric field of an ultrafast laser. By analyzing the spectral phase distribution, SSI provides valuable information about the optical dispersion affecting the spectral phase, which is related to the energy distribution of the laser pulses. SSI is a single-shot measurement process and has a low laser power requirement. However, the reconstruction algorithm involves numerous Fourier transform and filtering operations, which limits the applicability of SSI for real-time dispersion analysis. To address this issue, this Letter proposes a field-programmable gate array (FPGA)-based deep neural network to accelerate the spectral phase reconstruction and dispersion estimation process. The results show that the analysis time is improved from 124 to 9.27 ms, which represents a 13.4-fold improvement over the standard Fourier transform-based reconstruction algorithm.
20
Petracchi B, Torti E, Marenzi E, Leporati F. Acceleration of Hyperspectral Skin Cancer Image Classification through Parallel Machine-Learning Methods. Sensors (Basel) 2024; 24:1399. [PMID: 38474935 DOI: 10.3390/s24051399] [Received: 12/06/2023] [Revised: 01/29/2024] [Accepted: 02/16/2024] [Indexed: 03/14/2024]
Abstract
Hyperspectral imaging (HSI) has become a very compelling technique in different scientific areas; indeed, many researchers use it in the fields of remote sensing, agriculture, forensics, and medicine. In the latter, HSI plays a crucial role as a diagnostic support and for surgery guidance. However, the computational effort required to process hyperspectral data is not trivial. Furthermore, the demand for detecting diseases in a short time is undeniable. In this paper, we take up this challenge by parallelizing three of the most intensively used machine-learning methods, namely the Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGB) algorithms, using the Compute Unified Device Architecture (CUDA) to accelerate the classification of hyperspectral skin cancer images. All three have shown good performance in HS image classification, in particular when the size of the dataset is limited, as demonstrated in the literature. We illustrate the parallelization techniques adopted for each approach, highlighting the suitability of Graphical Processing Units (GPUs) to this aim. Experimental results show that parallel SVM and XGB algorithms significantly improve the classification times in comparison with their serial counterparts.
Affiliation(s)
- Bernardo Petracchi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, I-27100 Pavia, Italy
- Emanuele Torti
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, I-27100 Pavia, Italy
- Elisa Marenzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, I-27100 Pavia, Italy
- Francesco Leporati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, I-27100 Pavia, Italy
21
Langs G. Artificial intelligence in medical imaging is a tool for clinical routine and scientific discovery. Semin Arthritis Rheum 2024; 64S:152321. [PMID: 38007360 DOI: 10.1016/j.semarthrit.2023.152321] [Received: 11/06/2023] [Accepted: 11/09/2023] [Indexed: 11/27/2023]
Abstract
The emergence of powerful machine learning methodology together with an increasing amount of data collected during clinical routine have fostered a growing role of artificial intelligence (AI) in medicine. Algorithms have become part of clinical care enhancing image reconstruction, detecting cancer or predicting individual risk to support treatment decisions and patient management. The entry into clinical care is determined by technological feasibility, integration into effective workflows, and immediacy of benefits. At the same time, research is advancing the integration of imaging data and other modalities such as genomics, and the linking of observations made at large scale with the understanding of underlying biological processes. AI will have impact in imaging and precision medicine not only because of the successful application of techniques established in other domains, but primarily because of the effective joint development of new technology and corresponding advance of diagnosis and care.
Affiliation(s)
- Georg Langs
- Computational Imaging Research Lab, Christian Doppler Laboratory for Machine Learning Driven Precision Imaging, Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Spitalgasse 23, Vienna 1090, Austria.
22
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
23
Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, Glatt SJ. A primer on the use of machine learning to distil knowledge from data in biological psychiatry. Mol Psychiatry 2024; 29:387-401. [PMID: 38177352 PMCID: PMC11228968 DOI: 10.1038/s41380-023-02334-2] [Received: 07/12/2022] [Revised: 09/21/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]
Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Affiliation(s)
- Thomas P Quinn
- Applied Artificial Intelligence Institute (A2I2), Burwood, VIC, 3125, Australia
- Jonathan L Hess
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Victoria S Marshe
- Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Michelle M Barnett
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
- Anne-Christin Hauschild
- Department of Medical Informatics, Medical University Center Göttingen, Göttingen, Lower Saxony, 37075, Germany
- Malgorzata Maciukiewicz
- Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland
- Department of Rheumatology and Immunology, University Hospital Bern, Bern, 3010, Switzerland
- Department for Biomedical Research (DBMR), University of Bern, Bern, 3010, Switzerland
- Samar S M Elsheikh
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Xiaoyu Men
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Yannis J Trakadis
- Department of Human Genetics, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Michael S Breen
- Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Eric J Barnett
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Yanli Zhang-James
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Mehmet Eren Ahsen
- Department of Business Administration, Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Department of Biomedical and Translational Sciences, Carle-Illinois School of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Junfang Chen
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Jiahui Hou
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Asif Salekin
- Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244, USA
- Ping-I Lin
- Discipline of Psychiatry and Mental Health, University of New South Wales, Sydney, NSW, 2052, Australia
- Mental Health Research Unit, South Western Sydney Local Health District, Liverpool, NSW, 2170, Australia
- Andreas Meyer-Lindenberg
- Clinical Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Isabelle Bichindaritz
- Biomedical and Health Informatics/Computer Science Department, State University of New York at Oswego, Oswego, NY, 13126, USA
- Intelligent Bio Systems Lab, State University of New York at Oswego, Oswego, NY, 13126, USA
- Stephen V Faraone
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Murray J Cairns
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
- Gaurav Pandey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Daniel J Müller
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Würzburg, 97080, Germany
- Stephen J Glatt
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Public Health and Preventive Medicine, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
24
Cha Y, Kagalwala MN, Ross J. Navigating the Frontiers of Machine Learning in Neurodegenerative Disease Therapeutics. Pharmaceuticals (Basel) 2024; 17:158. [PMID: 38399373 PMCID: PMC10891920 DOI: 10.3390/ph17020158] [Received: 12/30/2023] [Revised: 01/16/2024] [Accepted: 01/23/2024] [Indexed: 02/25/2024]
Abstract
Recent advances in machine learning hold tremendous potential for enhancing the way we develop new medicines. Over the years, machine learning has been adopted in nearly all facets of drug discovery, including patient stratification, lead discovery, biomarker development, and clinical trial design. In this review, we discuss the latest developments linking machine learning and CNS drug discovery. While machine learning has aided our understanding of chronic diseases like Alzheimer's disease and Parkinson's disease, only modestly effective therapies currently exist. We highlight promising new efforts led by academia and emerging biotech companies to leverage machine learning for exploring new therapies. These approaches aim not only to accelerate drug development but also to improve the detection and treatment of neurodegenerative diseases.
Affiliation(s)
- Jermaine Ross
- Alleo Labs, San Francisco, CA 94105, USA; (Y.C.); (M.N.K.)
25
Popov KI, Wellnitz J, Maxfield T, Tropsha A. HIt Discovery using docking ENriched by GEnerative Modeling (HIDDEN GEM): A novel computational workflow for accelerated virtual screening of ultra-large chemical libraries. Mol Inform 2024; 43:e202300207. [PMID: 37802967 PMCID: PMC11156482 DOI: 10.1002/minf.202300207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 10/03/2023] [Accepted: 10/06/2023] [Indexed: 10/08/2023]
Abstract
The recent rapid expansion of make-on-demand, purchasable chemical libraries comprising tens of billions or even trillions of molecules has challenged the efficient application of traditional structure-based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using Docking ENriched by GEnerative Modeling) that greatly accelerates virtual screening. This workflow uniquely integrates machine learning, generative chemistry, massive chemical similarity searching and molecular docking of small, selected libraries at the beginning and the end of the workflow. For each target, HIDDEN GEM nominates a small number of top-scoring virtual hits prioritized from ultra-large chemical libraries. We have benchmarked HIDDEN GEM by conducting virtual screening campaigns for 16 diverse protein targets using the Enamine REAL Space library comprising 37 billion molecules. We show that HIDDEN GEM yields the highest enrichment factors compared with state-of-the-art accelerated virtual screening methods, while requiring the least computational resources. HIDDEN GEM can be executed with any docking software and employed by users with limited computational resources.
Affiliation(s)
- Konstantin I. Popov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- These authors contributed equally: Konstantin I. Popov, James Wellnitz, Travis Maxfield
- James Wellnitz
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Travis Maxfield
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
26
Vafaei Sadr A, Bülow R, von Stillfried S, Schmitz NEJ, Pilva P, Hölscher DL, Ha PP, Schweiker M, Boor P. Operational greenhouse-gas emissions of deep learning in digital pathology: a modelling study. Lancet Digit Health 2024; 6:e58-e69. [PMID: 37996339 PMCID: PMC10728828 DOI: 10.1016/s2589-7500(23)00219-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 10/04/2023] [Accepted: 10/16/2023] [Indexed: 11/25/2023]
Abstract
BACKGROUND Deep learning is a promising way to improve health care. Image-processing medical disciplines, such as pathology, are expected to be transformed by deep learning. The first clinically applicable deep-learning diagnostic support tools are already available in cancer pathology, and their number is increasing. However, data on the environmental sustainability of these tools are scarce. We aimed to conduct an environmental-sustainability analysis of a theoretical implementation of deep learning in patient-care pathology. METHODS For this modelling study, we first assembled and calculated relevant data and parameters of a digital-pathology workflow. Data were breast and prostate specimens from the university clinic at the Institute of Pathology of the Rheinisch-Westfälische Technische Hochschule Aachen (Aachen, Germany), for which commercial deep-learning tools were already available. Only specimens collected between Jan 1 and Dec 31, 2019 were used, to avoid potential biases due to the COVID-19 pandemic. Our final selection was based on 2 representative weeks outside holidays, covering different types of specimens. To calculate carbon dioxide (CO2) or CO2 equivalent (CO2 eq) emissions of deep learning in pathology, we gathered data on the exact numbers and sizes of whole-slide images (WSIs), which were generated by scanning histopathology samples of prostate and breast specimens. We also evaluated different data input scenarios (including all slide tiles, only tiles containing tissue, or only tiles containing regions of interest). To convert estimated energy consumption from kWh to CO2 eq, we used the internet protocol address of the computational server and the Electricity Maps database to obtain information on the sources of the local electricity grid (ie, renewable vs non-renewable), and estimated the number of trees and proportion of the local and world's forests needed to sequester the CO2 eq emissions. 
We calculated the computational requirements and CO2 eq emissions of 30 deep-learning models that varied in task and size. The first scenario represented the use of one commercially available deep-learning model for one task in one case (1-task), the second scenario considered two deep-learning models for two tasks per case (2-task), the third scenario represented a future, potentially automated workflow that could handle 7 tasks per case (7-task), and the fourth scenario represented the use of a single potential, large, computer-vision model that could conduct multiple tasks (multitask). We also compared the performance (ie, accuracy) and CO2 eq emissions of different deep-learning models for the classification of renal cell carcinoma on WSIs, also from Rheinisch-Westfälische Technische Hochschule Aachen. We also tested other approaches to reducing CO2 eq emissions, including model pruning and an alternative method for histopathology analysis (pathomics). FINDINGS The pathology database contained 35 552 specimens (237 179 slides), 6420 of which were prostate specimens (10 115 slides) and 11 801 of which were breast specimens (19 763 slides). We selected and subsequently digitised 140 slides from eight breast-cancer cases and 223 slides from five prostate-cancer cases. Applying large deep-learning models on all WSI tiles of prostate and breast pathology cases would result in yearly CO2 eq emissions of 7·65 metric tons (t; 95% CI 7·62-7·68) with the use of a single deep-learning model per case; yearly CO2 eq emissions were up to 100·56 t (100·21-100·99) with the use of seven deep-learning models per case. CO2 eq emissions for different deep-learning model scenarios, data inputs, and deep-learning model sizes for all slides varied from 3·61 t (3·59-3·63) to 2795·30 t (1177·51-6482·13). 
For the estimated number of overall pathology cases worldwide, the yearly CO2 eq emissions varied, reaching up to 16 megatons (Mt) of CO2 eq, requiring up to 86 590 km2 (0·22%) of world forest to sequester the CO2 eq emissions. Use of the 7-task scenario and small deep-learning models on slides containing tissue only could substantially reduce CO2 eq emissions worldwide by up to 141 times (0·1 Mt, 95% CI 0·1-0·1). Considering the local environment in Aachen, Germany, the maximum CO2 eq emission from the use of deep learning in digital pathology only would require 32·8% (95% CI 13·8-76·6) of the local forest to sequester the CO2 eq emissions. A single pathomics run on a tissue could provide information that was comparable to or even better than the output of multitask deep-learning models, but with 147 times reduced CO2 eq emissions. INTERPRETATION Our findings suggest that widespread use of deep learning in pathology might have considerable global-warming potential. The medical community, policy decision makers, and the public should be aware of this potential and encourage the use of CO2 eq emissions reduction strategies where possible. FUNDING German Research Foundation, European Research Council, German Federal Ministry of Education and Research, Health, Economic Affairs and Climate Action, and the Innovation Fund of the Federal Joint Committee.
Affiliation(s)
- Alireza Vafaei Sadr
- Institute of Pathology, University Hospital Aachen, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany; Department of Public Health Sciences, College of Medicine, Pennsylvania State University, Hershey, PA, USA
- Roman Bülow
- Institute of Pathology, University Hospital Aachen, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- Saskia von Stillfried
- Institute of Pathology, University Hospital Aachen, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- Nikolas E J Schmitz
- Institute of Pathology, University Hospital Aachen, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- Pourya Pilva
- Institute of Pathology, University Hospital Aachen, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- David L Hölscher
- Institute of Pathology, University Hospital Aachen, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- Peiman Pilehchi Ha
- Healthy Living Spaces Lab, Institute for Occupational, Social and Environmental Medicine, Medical Faculty, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- Marcel Schweiker
- Healthy Living Spaces Lab, Institute for Occupational, Social and Environmental Medicine, Medical Faculty, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- Peter Boor
- Institute of Pathology, University Hospital Aachen, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany; Department of Nephrology and Immunology, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
27
Fu WT, Zhu QK, Li N, Wang YQ, Deng SL, Chen HP, Shen J, Meng LY, Bian Z. Clinically Oriented CBCT Periapical Lesion Evaluation via 3D CNN Algorithm. J Dent Res 2024; 103:5-12. [PMID: 37968798 DOI: 10.1177/00220345231201793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2023]
Abstract
Apical periodontitis (AP) is one of the most prevalent disorders in dentistry. However, it can be underdiagnosed in asymptomatic patients. In addition, the perioperative evaluation of 3-dimensional (3D) lesion volume is of great clinical relevance, but the required slice-by-slice manual delineation method is time- and labor-intensive. Here, to quickly and accurately detect and segment periapical lesions (PALs) associated with AP on cone beam computed tomography (CBCT) images, we proposed and geographically validated a novel 3D deep convolutional neural network algorithm, named PAL-Net. On the internal 5-fold cross-validation set, our PAL-Net achieved an area under the receiver operating characteristic curve (AUC) of 0.98. The algorithm also improved the diagnostic performance of dentists with varying levels of experience, as evidenced by their enhanced average AUC values (junior dentists: 0.89-0.94; senior dentists: 0.91-0.93), and significantly reduced the diagnostic time (junior dentists: 69.3 min faster; senior dentists: 32.4 min faster). Moreover, our PAL-Net achieved an average Dice similarity coefficient over 0.87 (0.85-0.88), which is superior or comparable to that of other existing state-of-the-art PAL segmentation algorithms. Furthermore, we validated the generalizability of the PAL-Net system using multiple external data sets from Central, East, and North China, showing that our PAL-Net has strong robustness. Our PAL-Net can help improve the diagnostic performance and speed of dentists working from CBCT images, provide clinically relevant volume information to dentists, and can potentially be applied in dental clinics, especially those without expert-level dentists or radiologists.
Affiliation(s)
- W T Fu
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, China
- Department of Cariology and Endodontics, School and Hospital of Stomatology, Wuhan University, Wuhan, China
- Q K Zhu
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
- N Li
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, China
- Department of Cariology and Endodontics, School and Hospital of Stomatology, Wuhan University, Wuhan, China
- Y Q Wang
- Department of Gynecology, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, China
- S L Deng
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Hangzhou, China
- H P Chen
- Xiangyang Stomatological Hospital; Affiliated Stomatological Hospital of Hubei University of Arts and Science, Xiangyang, China
- J Shen
- Department of International VIP Dental Clinic, Tianjin Stomatological Hospital, School of Medicine, Nankai University, Tianjin, China
- L Y Meng
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, China
- Department of Cariology and Endodontics, School and Hospital of Stomatology, Wuhan University, Wuhan, China
- Z Bian
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, China
- Department of Cariology and Endodontics, School and Hospital of Stomatology, Wuhan University, Wuhan, China
28
Wang YZ, Juroch K, Birch DG. Deep Learning-Assisted Measurements of Photoreceptor Ellipsoid Zone Area and Outer Segment Volume as Biomarkers for Retinitis Pigmentosa. Bioengineering (Basel) 2023; 10:1394. [PMID: 38135984 PMCID: PMC10740805 DOI: 10.3390/bioengineering10121394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/13/2023] [Accepted: 11/29/2023] [Indexed: 12/24/2023]
Abstract
The manual segmentation of retinal layers from OCT scan images is time-consuming and costly. The deep learning approach has potential for the automatic delineation of retinal layers to significantly reduce the burden of human graders. In this study, we compared deep learning model (DLM) segmentation with manual correction (DLM-MC) to conventional manual grading (MG) for the measurements of the photoreceptor ellipsoid zone (EZ) area and outer segment (OS) volume in retinitis pigmentosa (RP) to assess whether DLM-MC can be a new gold standard for retinal layer segmentation and for the measurement of retinal layer metrics. Ninety-six high-speed 9 mm 31-line volume scans obtained from 48 patients with RPGR-associated XLRP were selected based on the following criteria: the presence of an EZ band within the scan limit and a detectable EZ in at least three B-scans in a volume scan. All the B-scan images in each volume scan were manually segmented for the EZ and proximal retinal pigment epithelium (pRPE) by two experienced human graders to serve as the ground truth for comparison. The test volume scans were also segmented by a DLM and then manually corrected for EZ and pRPE by the same two graders to obtain DLM-MC segmentation. The EZ area and OS volume were determined by interpolating the discrete two-dimensional B-scan EZ-pRPE layer over the scan area. Dice similarity, Bland-Altman analysis, correlation, and linear regression analyses were conducted to assess the agreement between DLM-MC and MG for the EZ area and OS volume measurements. For the EZ area, the overall mean dice score (SD) between DLM-MC and MG was 0.8524 (0.0821), which was comparable to 0.8417 (0.1111) between two MGs. For the EZ area > 1 mm2, the average dice score increased to 0.8799 (0.0614). 
When comparing DLM-MC to MG, the Bland-Altman plots revealed a mean difference (SE) of 0.0132 (0.0953) mm2 and a coefficient of repeatability (CoR) of 1.8303 mm2 for the EZ area and a mean difference (SE) of 0.0080 (0.0020) mm3 and a CoR of 0.0381 mm3 for the OS volume. The correlation coefficients (95% CI) were 0.9928 (0.9892-0.9952) and 0.9938 (0.9906-0.9958) for the EZ area and OS volume, respectively. The linear regression slopes (95% CI) were 0.9598 (0.9399-0.9797) and 1.0104 (0.9909-1.0298), respectively. The results from this study suggest that the manual correction of deep learning model segmentation can generate EZ area and OS volume measurements in excellent agreement with those of conventional manual grading in RP. Because DLM-MC is more efficient for retinal layer segmentation from OCT scan images, it has the potential to reduce the burden of human graders in obtaining quantitative measurements of biomarkers for assessing disease progression and treatment outcomes in RP.
Affiliation(s)
- Yi-Zhong Wang
- Retina Foundation of the Southwest, 9600 North Central Expressway, Suite 200, Dallas, TX 75231, USA; (K.J.); (D.G.B.)
- Department of Ophthalmology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Katherine Juroch
- Retina Foundation of the Southwest, 9600 North Central Expressway, Suite 200, Dallas, TX 75231, USA; (K.J.); (D.G.B.)
- David Geoffrey Birch
- Retina Foundation of the Southwest, 9600 North Central Expressway, Suite 200, Dallas, TX 75231, USA; (K.J.); (D.G.B.)
- Department of Ophthalmology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
29
Handa K, Thomas MC, Kageyama M, Iijima T, Bender A. On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data. J Cheminform 2023; 15:112. [PMID: 37990215 PMCID: PMC10664602 DOI: 10.1186/s13321-023-00781-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 11/10/2023] [Indexed: 11/23/2023]
Abstract
While a multitude of deep generative models have recently emerged, there exists no best practice for their practically relevant validation. On the one hand, novel de novo-generated molecules cannot be refuted by retrospective validation (so that this type of validation is biased); but on the other hand, prospective validation is expensive and then often biased by the human selection process. In this case study, we frame retrospective validation as the ability to mimic human drug design, by answering the following question: Can a generative model trained on early-stage project compounds generate middle/late-stage compounds de novo? To this end, we used experimental data that contains the elapsed time of a synthetic expansion following hit identification from five public (where the time series was pre-processed to better reflect realistic synthetic expansions) and six in-house project datasets, and used REINVENT as a widely adopted RNN-based generative model. After splitting the dataset and training REINVENT on early-stage compounds, we found that rediscovery of middle/late-stage compounds was much higher in public projects (at 1.60%, 0.64%, and 0.21% of the top 100, 500, and 5000 scored generated compounds) than in in-house projects (where the values were 0.00%, 0.03%, and 0.04%, respectively). Similarly, average single nearest neighbour similarity between early- and middle/late-stage compounds in public projects was higher between active compounds than inactive compounds; however, for in-house projects the converse was true, which makes rediscovery (if so desired) more difficult. We hence show that the generative model recovers very few middle/late-stage compounds from real-world drug discovery projects, highlighting the fundamental difference between purely algorithmic design and drug discovery as a real-world process. 
Evaluating de novo compound design approaches appears, based on the current study, difficult or even impossible to do retrospectively. Scientific Contribution: This contribution hence illustrates aspects of evaluating the performance of generative models in a real-world setting which have not been extensively described previously and which will hopefully contribute to their future development.
Affiliation(s)
- Koichi Handa
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-Shi, Tokyo, 191-8512, Japan
- Morgan C Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Michiharu Kageyama
- Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-Shi, Tokyo, 191-8512, Japan
- Takeshi Iijima
- Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-Shi, Tokyo, 191-8512, Japan
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
30
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
- Katherine R Duncan
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Somayah S Elsayed
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Neha Garg
- School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
- Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Nathaniel I Martin
- Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
- David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Barbara R Terlouw
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Friederike Biermann
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Marina Gorostiola González
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- ONCODE institute, Leiden, The Netherlands
- Eric J N Helfrich
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Florian Huber
- Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
- Stefan Leopold-Messer
- Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
- Tristan de Rond
- School of Chemical Sciences, University of Auckland, Auckland, New Zealand
- Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
- Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
- Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
- Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
- Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
- Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
- Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
- Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
- Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for infection research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
- Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for infection research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
- Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Biology, Leiden University, Leiden, The Netherlands
31
Schrier J, Norquist AJ, Buonassisi T, Brgoch J. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science. J Am Chem Soc 2023; 145:21699-21716. [PMID: 37754929 DOI: 10.1021/jacs.3c04783] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 09/28/2023]
Abstract
Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.
Affiliation(s)
- Joshua Schrier
- Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
- Alexander J Norquist
- Department of Chemistry, Haverford College, Haverford, Pennsylvania 19041, United States
- Tonio Buonassisi
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jakoah Brgoch
- Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204, United States
32
Turon G, Hlozek J, Woodland JG, Kumar A, Chibale K, Duran-Frigola M. First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa. Nat Commun 2023; 14:5736. [PMID: 37714843 PMCID: PMC10504240 DOI: 10.1038/s41467-023-41512-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/11/2023] [Accepted: 09/06/2023] [Indexed: 09/17/2023]
Abstract
Streamlined data-driven drug discovery remains challenging, especially in resource-limited settings. We present ZairaChem, an artificial intelligence (AI)- and machine learning (ML)-based tool for quantitative structure-activity/property relationship (QSAR/QSPR) modelling. ZairaChem is fully automated, requires low computational resources and works across a broad spectrum of datasets. We describe an end-to-end implementation at the H3D Centre, the leading integrated drug discovery unit in Africa, at which no prior AI/ML capabilities were available. By leveraging in-house data collected over a decade, we have developed a virtual screening cascade for malaria and tuberculosis drug discovery comprising 15 models for key decision-making assays ranging from whole-cell phenotypic screening and cytotoxicity to aqueous solubility, permeability, microsomal metabolic stability, cytochrome inhibition, and cardiotoxicity. We show how computational profiling of compounds, prior to synthesis and testing, can inform progression of frontrunner compounds at H3D. This project is a first-of-its-kind deployment at scale of AI/ML tools in a research centre operating in a low-resource setting.
Affiliation(s)
- Gemma Turon
- Ersilia Open Source Initiative, Cambridge, UK
- Jason Hlozek
- Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- John G Woodland
- Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Ankur Kumar
- Ersilia Open Source Initiative, Cambridge, UK
- Kelly Chibale
- Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
33
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
34
Qureshi R, Irfan M, Gondal TM, Khan S, Wu J, Hadi MU, Heymach J, Le X, Yan H, Alam T. AI in drug discovery and its clinical relevance. Heliyon 2023; 9:e17575. [PMID: 37396052 PMCID: PMC10302550 DOI: 10.1016/j.heliyon.2023.e17575] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Received: 01/10/2023] [Revised: 06/17/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023]
Abstract
The COVID-19 pandemic has emphasized the need for novel drug discovery processes. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is a long, complex, and expensive process, with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up the drug discovery pipeline and prevent failures in it. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. How contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and analysis of drug responses is also explored. Finally, recent developments and investments in AI-based start-up companies for biotechnology and drug design, and their current progress, hopes, and promotions, are discussed in this article.
Affiliation(s)
- Rizwan Qureshi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- Muhammad Irfan
- Faculty of Electrical Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Swabi, Pakistan
- Sheheryar Khan
- School of Professional Education & Executive Development, The Hong Kong Polytechnic University, Hong Kong
- Jia Wu
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- John Heymach
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Xiuning Le
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
35
Ahmed F, Samantasinghar A, Manzoor Soomro A, Kim S, Hyun Choi K. A systematic review of computational approaches to understand cancer biology for informed drug repurposing. J Biomed Inform 2023; 142:104373. [PMID: 37120047 DOI: 10.1016/j.jbi.2023.104373] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 02/14/2023] [Revised: 03/25/2023] [Accepted: 04/23/2023] [Indexed: 05/01/2023]
Abstract
Cancer is the second leading cause of death globally, trailing only heart disease. In the United States alone, 1.9 million new cancer cases and 609,360 deaths were recorded for 2022. Unfortunately, the success rate for new cancer drug development remains less than 10%, making the disease particularly challenging. This low success rate is largely attributed to the complex and poorly understood nature of cancer etiology. Therefore, it is critical to find alternative approaches to understanding cancer biology and developing effective treatments. One such approach is drug repurposing, which offers a shorter drug development timeline and lower costs while increasing the likelihood of success. In this review, we provide a comprehensive analysis of computational approaches for understanding cancer biology, including systems biology, multi-omics, and pathway analysis. Additionally, we examine the use of these methods for drug repurposing in cancer, including the databases and tools that are used for cancer research. Finally, we present case studies of drug repurposing, discussing their limitations and offering recommendations for future research in this area.
Affiliation(s)
- Faheem Ahmed
- Department of Mechatronics Engineering, Jeju National University, Republic of Korea
- Sejong Kim
- Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea
- Kyung Hyun Choi
- Department of Mechatronics Engineering, Jeju National University, Republic of Korea
36
Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature 2023; 616:673-685. [PMID: 37100941 DOI: 10.1038/s41586-023-05905-z] [Citation(s) in RCA: 184] [Impact Index Per Article: 184.0] [Received: 04/21/2022] [Accepted: 03/01/2023] [Indexed: 04/28/2023]
Abstract
Computer-aided drug discovery has been around for decades, although the past few years have seen a tectonic shift towards embracing computational technologies in both academia and pharma. This shift is largely defined by the flood of data on ligand properties and binding to therapeutic targets and their 3D structures, abundant computing capacities and the advent of on-demand virtual libraries of drug-like small molecules in their billions. Taking full advantage of these resources requires fast computational methods for effective ligand screening. This includes structure-based virtual screening of gigascale chemical spaces, further facilitated by fast iterative screening approaches. Highly synergistic are developments in deep learning predictions of ligand properties and target activities in lieu of receptor structure. Here we review recent advances in ligand discovery technologies, their potential for reshaping the whole process of drug discovery and development, as well as the challenges they encounter. We also discuss how the rapid identification of highly diverse, potent, target-selective and drug-like ligands to protein targets can democratize the drug discovery process, presenting new opportunities for the cost-effective development of safer and more effective small-molecule treatments.
Affiliation(s)
- Anastasiia V Sadybekov
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
- Vsevolod Katritch
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA
37
Seo S, Lim J, Kim WY. Molecular Generative Model via Retrosynthetically Prepared Chemical Building Block Assembly. Adv Sci (Weinh) 2023; 10:e2206674. [PMID: 36596675 PMCID: PMC10015872 DOI: 10.1002/advs.202206674] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/14/2022] [Indexed: 06/17/2023]
Abstract
Deep generative models are attracting attention as a smart molecular design strategy. However, previous models often render molecules with low synthesizability, hindering their real-world applications. Here, a novel graph-based conditional generative model is proposed that makes molecules by tailoring retrosynthetically prepared chemical building blocks, in an auto-regressive fashion, until target properties are achieved. This strategy improves the synthesizability and property control of the resulting molecules and also helps the model learn how to select appropriate building blocks and bind them together to achieve target properties. By applying a negative sampling method to the selection of building blocks, this model overcomes a critical limitation of previous fragment-based models, which can only use molecules from the training set during generation. As a result, the model works equally well with unseen building blocks without sacrificing computational efficiency. It is demonstrated that the model can generate potential inhibitors with high docking scores against the 3CL protease of SARS-CoV-2.
Affiliation(s)
- Seonghwan Seo
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jaechang Lim
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Woo Youn Kim
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
38
A Novel Method for Fast Generation of 3D Objects from Multiple Depth Sensors. Journal of Artificial Intelligence and Soft Computing Research 2023. [DOI: 10.2478/jaiscr-2023-0009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Scanning real 3D objects faces many technical challenges. Stationary solutions allow for accurate scanning; however, they usually require special and expensive equipment. Competitive mobile solutions (handheld scanners, LiDARs on vehicles, etc.) do not allow for accurate and fast mapping of the surface of the scanned object. The article proposes an end-to-end automated solution that enables the use of widely available mobile and stationary scanners. The system generates a full 3D model of the object based on multiple depth sensors. For this purpose, the scanned object is marked with markers. Marker types and positions are automatically detected and mapped to a template mesh. A reference template is automatically selected for the scanned object and is then transformed according to the data from the scanners using a non-rigid transformation. The solution allows for fast scanning of complex objects of varied sizes, producing a set of training data for segmentation and classification systems of 3D scenes. The main advantage of the proposed solution is its efficiency, which enables real-time scanning and the generation of a mesh with a regular structure; this is critical for training data for machine learning algorithms. The source code is available at https://github.com/SATOffice/improved_scanner3D.
39
Firouzi R, Ashouri M. Identification of Potential Anti-COVID-19 Drug Leads from Medicinal Plants through Virtual High-Throughput Screening. ChemistrySelect 2023. [DOI: 10.1002/slct.202203865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Affiliation(s)
- Rohoullah Firouzi
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
- Mitra Ashouri
- Department of Physical Chemistry, School of Chemistry, College of Science, University of Tehran, Tehran, Iran
40
Yu Y, Xu S, He R, Liang G. Application of Molecular Simulation Methods in Food Science: Status and Prospects. J Agric Food Chem 2023; 71:2684-2703. [PMID: 36719790 DOI: 10.1021/acs.jafc.2c06789] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 06/18/2023]
Abstract
Molecular simulation methods, such as molecular docking, molecular dynamics (MD) simulation, and quantum chemical (QC) calculation, have become popular as characterization and/or virtual screening tools because they can visually display interaction details that in vitro experiments cannot capture and can quickly screen bioactive compounds from large databases with millions of molecules. Currently, interdisciplinary research has expanded molecular simulation technology from computer-aided drug design (CADD) to food science. More food scientists are supporting their hypotheses/results with this technology. To better understand the use of molecular simulation methods, it is necessary to systematically summarize their latest applications and usage trends in food science research. However, review articles of this type are rare. To bridge this gap, we have comprehensively summarized the principles, combined usage, and applications of molecular simulation methods in food science. We also analyzed their limitations and future trends and offered valuable strategies, drawing on the latest technologies, to help food scientists use molecular simulation methods.
Affiliation(s)
- Yuandong Yu
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Shiqi Xu
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Ran He
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
41
Zhou Y, Zhao W, Feng Y, Niu X, Dong Y, Chen Y. Artificial Intelligence-Assisted Digital Immunoassay Based on a Programmable-Particle-Decoding Technique for Multitarget Ultrasensitive Detection. Anal Chem 2023; 95:1589-1598. [PMID: 36571573 DOI: 10.1021/acs.analchem.2c04703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Abstract
The development of multitarget ultrasensitive immunoassays is significant for fields such as medical research, clinical diagnosis, and food safety inspection. In this study, a digital immunoassay system based on an artificial intelligence (AI)-assisted programmable-particle-decoding technique (APT) was developed to perform multitarget ultrasensitive detection. Multiple targets were encoded by programmable polystyrene (PS) microspheres with different characteristics (particle size and number), and the resulting visible signals were recorded under an optical microscope after the immune reaction. The images were then analyzed using a customized AI-based computer vision technique to decode the intrinsic properties of the PS microspheres and to reveal the types and concentrations of targets. Our strategy successfully detected multiple inflammatory markers in clinical serum, as well as antibiotics, over a broad detection range from pg/mL to μg/mL without extra signal amplification or conversion. The AI-based digital immunoassay system shows great potential for the next generation of multitarget detection in disease screening of candidate patients.
Affiliation(s)
- Yang Zhou
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- College of Engineering, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- Weiqi Zhao
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- Yaoze Feng
- College of Engineering, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- Xiaohu Niu
- College of Engineering, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- Yongzhen Dong
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- Yiping Chen
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen 518120, Guangdong, China
42
Sahoo SS, Kobow K, Zhang J, Buchhalter J, Dayyani M, Upadhyaya DP, Prantzalos K, Bhattacharjee M, Blumcke I, Wiebe S, Lhatoo SD. Ontology-based feature engineering in machine learning workflows for heterogeneous epilepsy patient records. Sci Rep 2022; 12:19430. [PMID: 36371527 PMCID: PMC9653502 DOI: 10.1038/s41598-022-23101-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Accepted: 10/25/2022] [Indexed: 11/13/2022]
Abstract
Biomedical ontologies are widely used to harmonize heterogeneous data and integrate large volumes of clinical data from multiple sources. This study analyzed the utility of ontologies beyond their traditional roles, that is, in addressing the challenging and currently underserved field of feature engineering in machine learning workflows. Machine learning workflows are increasingly used to analyze medical records with heterogeneous phenotypic, genotypic, and related medical terms to improve patient care. We performed a retrospective study using neuropathology reports from the German Neuropathology Reference Center for Epilepsy Surgery at Erlangen, Germany. This cohort included 312 patients who underwent epilepsy surgery and were labeled with one or more diagnoses, including dual pathology, hippocampal sclerosis, malformation of cortical dysplasia, tumor, encephalitis, and gliosis. We modeled the diagnosis terms together with their microscopy, immunohistochemistry, anatomy, etiologies, and imaging findings using the description logic-based Web Ontology Language (OWL) in the Epilepsy and Seizure Ontology (EpSO). Three tree-based machine learning models were used to classify the neuropathology reports into one or more diagnosis classes with and without ontology-based feature engineering. To avoid overfitting, we used five-fold cross-validation with a fixed number of repetitions, leaving out one subset of data for testing, and we used recall, balanced accuracy, and Hamming loss as performance metrics for the multi-label classification task. The epilepsy ontology-based feature engineering approach improved the performance of all three learning models, with improvements of 35.7%, 54.5%, and 33.3% in the logistic regression, random forest, and gradient tree boosting models, respectively.
The run-time performance of all three models improved significantly with ontology-based feature engineering, with the gradient tree boosting model showing a 93.8% reduction in the time required for training and testing. Although all three models showed overall improved performance across the three performance metrics using ontology-based feature engineering, the rate of improvement was not consistent across all input features. To analyze this variation in performance, we computed feature importance scores and found that microscopy had the highest importance score across the three models, followed by imaging, immunohistochemistry, and anatomy in decreasing order of importance. This study showed that ontologies have an important role in feature engineering, making heterogeneous clinical data accessible to machine learning models and improving the performance of these models in multilabel multiclass classification tasks.
Affiliation(s)
- Satya S Sahoo
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Katja Kobow
- Institute of Neuropathology, Erlangen, Germany
- Jianzhe Zhang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Jeffrey Buchhalter
- Department of Pediatrics, University of Calgary School of Medicine, Calgary, Canada
- Mojtaba Dayyani
- Department of Neurology, University of Texas Health Sciences Center, Texas, USA
- Dipak P Upadhyaya
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Katrina Prantzalos
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Samuel Wiebe
- Department of Pediatrics, University of Calgary School of Medicine, Calgary, Canada
- Samden D Lhatoo
- Department of Neurology, University of Texas Health Sciences Center, Texas, USA
| |
43
Zhang Y, Luo M, Wu P, Wu S, Lee TY, Bai C. Application of Computational Biology and Artificial Intelligence in Drug Design. Int J Mol Sci 2022; 23:13568. [PMID: 36362355 PMCID: PMC9658956 DOI: 10.3390/ijms232113568] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/08/2022] [Revised: 10/29/2022] [Accepted: 11/03/2022] [Indexed: 08/24/2023]
Abstract
Traditional drug design requires a great deal of research time and development expense. Rapidly developing computational approaches, including computational biology, computer-aided drug design, and artificial intelligence, have the potential to accelerate drug discovery by minimizing time and financial cost. In recent years, computational approaches have been widely used to improve the efficacy and effectiveness of the drug discovery pipeline, leading to the marketing approval of many new drugs. The present review focuses on the applications of these indispensable computational approaches in aiding target identification, lead discovery, and lead optimization. Some challenges of using these approaches for drug design are also discussed. Moreover, we propose a methodology for integrating various computational techniques into new drug discovery and design.
Affiliation(s)
- Yue Zhang
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Warshel Institute for Computational Biology, Shenzhen 518172, China
- Mengqi Luo
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Peng Wu
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518055, China
- Song Wu
- South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Tzong-Yi Lee
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Warshel Institute for Computational Biology, Shenzhen 518172, China
- Chen Bai
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Warshel Institute for Computational Biology, Shenzhen 518172, China
44
Fraiwan M, Al-Kofahi N, Ibnian A, Hanatleh O. Detection of developmental dysplasia of the hip in X-ray images using deep transfer learning. BMC Med Inform Decis Mak 2022; 22:216. [PMID: 35964072 PMCID: PMC9375244 DOI: 10.1186/s12911-022-01957-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2022] [Accepted: 07/30/2022] [Indexed: 01/14/2023]
Abstract
Background: Developmental dysplasia of the hip (DDH) is a relatively common disorder in newborns, with a reported prevalence of 1–5 per 1000 births. It can lead to developmental abnormalities in the form of mechanical difficulties and displacement of the joint (i.e., subluxation or dysplasia). An early diagnosis in the first few months after birth can drastically improve healing, render surgical intervention unnecessary, and reduce bracing time. Pelvic X-ray inspection represents the gold standard for DDH diagnosis. Recent advances in deep learning have enabled many image-based medical decision-making applications. The present study employs deep transfer learning to detect DDH in pelvic X-ray images without the need for explicit measurements.
Methods: Pelvic anteroposterior X-ray images from 354 subjects (120 DDH and 234 normal) were collected locally at two hospitals in northern Jordan. A system that accepts these images as input and classifies them as DDH or normal was developed using thirteen deep transfer learning models. Various performance metrics were evaluated, in addition to the overfitting/underfitting behavior and the training times.
Results: The highest mean DDH detection accuracy, 96.3%, was achieved using the DarkNet53 model, although other models achieved comparable results. A common theme across all the models was extremely high sensitivity (i.e., recall) at the expense of specificity. The F1 score, precision, recall, and specificity for DarkNet53 were 95%, 90.6%, 100%, and 94.3%, respectively.
Conclusions: Our automated method appears to be a highly accurate approach to DDH screening and diagnosis. Moreover, the performance evaluation shows that it is possible to further improve the system by expanding the dataset to include more X-ray images.
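Because the F1 score is the harmonic mean of precision and recall, the DarkNet53 figures reported above can be cross-checked with a few lines of arithmetic. This is a consistency check written for illustration, not code from the study; `f1_score` here is a small hypothetical helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision (90.6%) and recall (100%) reported for DarkNet53 in the abstract.
f1 = f1_score(0.906, 1.0)
print(f"F1 = {f1:.4f}")  # F1 = 0.9507, i.e. about 95%
```

The result agrees with the reported 95% F1 score. With recall at 100%, any loss in F1 comes entirely from precision, which matches the abstract's observation that the models favored sensitivity over specificity.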
Affiliation(s)
- Mohammad Fraiwan
- Department of Computer Engineering, Jordan University of Science and Technology, Irbid, Jordan
- Noran Al-Kofahi
- Department of Internal Medicine, Jordan University of Science and Technology, Irbid, Jordan
- Ali Ibnian
- Department of Internal Medicine, Jordan University of Science and Technology, Irbid, Jordan
- Omar Hanatleh
- Department of Internal Medicine, Jordan University of Science and Technology, Irbid, Jordan
45
Pandey M, Radaeva M, Mslati H, Garland O, Fernandez M, Ester M, Cherkasov A. Ligand Binding Prediction Using Protein Structure Graphs and Residual Graph Attention Networks. Molecules 2022; 27:molecules27165114. [PMID: 36014351 PMCID: PMC9416537 DOI: 10.3390/molecules27165114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 08/03/2022] [Accepted: 08/09/2022] [Indexed: 11/25/2022]
Abstract
Computational prediction of ligand–target interactions is a crucial part of modern drug discovery, as it helps to bypass the high costs and labor demands of in vitro and in vivo screening. As bioactivity data accumulate, they provide opportunities to develop deep learning (DL) models with increasing predictive power. Conventionally, such models were limited either to very simplified representations of proteins or to ineffective voxelization of their 3D structures. Herein, we present the PSG-BAR (Protein Structure Graph-Binding Affinity Regression) approach, which utilizes 3D structural information of the proteins along with 2D graph representations of ligands. The method also introduces attention scores to selectively weight the protein regions that are most important for ligand binding. The developed approach demonstrates state-of-the-art performance on several binding affinity benchmarking datasets. The attention-based pooling of protein graphs enables the identification of surface residues as critical residues for protein–ligand binding. Finally, we validate our model predictions against an experimental assay on a viral main protease (Mpro), the hallmark target of the SARS-CoV-2 coronavirus.
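The abstract states that attention scores weight protein regions and that attention-based pooling of the protein graph highlights critical residues, but it does not give the exact formulation. As a hedged sketch, the code below implements a generic softmax attention pooling over per-residue (node) feature vectors, using a single scoring vector in the spirit of global attention pooling layers found in graph-learning libraries. The function names, the scoring vector `w`, and the toy features are assumptions for illustration, not the PSG-BAR architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(node_feats: List[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """Pool per-residue (node) feature vectors into one graph-level vector.

    Each node i gets a scalar score s_i = h_i . w (w would be learned in a
    real model); softmax turns the scores into weights a_i that sum to 1;
    the graph embedding is sum_i a_i * h_i. The weights double as
    per-node importance scores.
    """
    scores = [sum(h_k * w_k for h_k, w_k in zip(h, w)) for h in node_feats]
    alphas = softmax(scores)
    dim = len(node_feats[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, node_feats)) for d in range(dim)]
    return pooled, alphas


if __name__ == "__main__":
    # Three hypothetical residues with 2-D features; w favours feature 0.
    feats = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    pooled, alphas = attention_pool(feats, [1.0, 0.0])
    print("attention weights:", [round(a, 3) for a in alphas])
    print("pooled embedding:", [round(x, 3) for x in pooled])
```

Because the weights `alphas` sum to 1, they can be read as per-residue importance scores, which is the kind of interpretability the abstract attributes to attention-based pooling. With `w` set to zero, every node receives the same weight and the pooling reduces to a plain mean.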
Affiliation(s)
- Mohit Pandey
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
- Mariia Radaeva
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
- Hazem Mslati
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
- Olivia Garland
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
- Michael Fernandez
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
- Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Artem Cherkasov
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
46
Ton AT, Pandey M, Smith JR, Ban F, Fernandez M, Cherkasov A. Targeting SARS-CoV-2 Papain-Like Protease in the Post-Vaccine Era. Trends Pharmacol Sci 2022; 43:906-919. [PMID: 36114026 PMCID: PMC9399131 DOI: 10.1016/j.tips.2022.08.008] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 05/17/2022] [Revised: 08/10/2022] [Accepted: 08/19/2022] [Indexed: 11/29/2022]
Abstract
While vaccines remain at the forefront of global healthcare responses, pioneering therapeutics against SARS-CoV-2 are expected to fill the gaps left by waning immunity. The rapid development and approval of orally available direct-acting antivirals targeting crucial SARS-CoV-2 proteins marked the beginning of the era of small-molecule drugs for COVID-19. In that regard, the papain-like protease (PLpro) can be considered a major SARS-CoV-2 therapeutic target owing to its dual biological role in suppressing host innate immune responses and ensuring viral replication. Here, we summarize the challenges of targeting PLpro and review innovative, early-stage PLpro-specific small molecules. We propose that state-of-the-art computer-aided drug design (CADD) methodologies will play a critical role in the discovery of PLpro-targeting compounds as a novel class of COVID-19 drugs.
Affiliation(s)
- Anh-Tien Ton
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
- Mohit Pandey
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
- Jason R Smith
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada; Department of Chemistry, Simon Fraser University, Burnaby, Canada
- Fuqiang Ban
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
- Michael Fernandez
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada