1
Snyder SH, Vignaux PA, Ozalp MK, Gerlach J, Puhl AC, Lane TR, Corbett J, Urbina F, Ekins S. The Goldilocks paradigm: comparing classical machine learning, large language models, and few-shot learning for drug discovery applications. Commun Chem 2024; 7:134. [PMID: 38866916 PMCID: PMC11169557 DOI: 10.1038/s42004-024-01220-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 06/04/2024] [Indexed: 06/14/2024] Open
Abstract
Recent advances in machine learning (ML) have led to newer model architectures, including transformers (large language models, LLMs), which show state-of-the-art results in text generation and image analysis, as well as few-shot learning (FSLC) models, which offer predictive power with extremely small datasets. These new architectures may offer promise, yet the 'no free lunch' theorem suggests that no single model algorithm can outperform at all possible tasks. Here, we explore the capabilities of classical (SVR), FSLC, and transformer models (MolBART) over a range of dataset tasks and show a 'goldilocks zone' for each model type, in which dataset size and feature distribution (i.e. dataset "diversity") determine the optimal algorithm strategy. When datasets are small (<50 molecules), FSLC tend to outperform both classical ML and transformers. When datasets are small-to-medium sized (50-240 molecules) and diverse, transformers outperform both classical models and few-shot learning. Finally, when datasets are larger and of sufficient size, classical models perform the best, suggesting that the optimal model to choose likely depends on the dataset available, its size and diversity. These findings may help to answer the perennial question of which ML algorithm to use when faced with a new dataset.
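The size and diversity thresholds stated in this abstract can be written down as a small lookup. The following is only a sketch of that rule of thumb, not the authors' code: the function name, the boolean `is_diverse` flag, and the choice to return `None` for the one case the abstract does not cover (50-240 molecules, not diverse) are all assumptions made for illustration.

```python
from typing import Optional


def suggest_model_family(n_molecules: int, is_diverse: bool) -> Optional[str]:
    """Return the model family the abstract reports as strongest, or None if unstated.

    Hypothetical helper: thresholds (<50, 50-240, >240) come from the abstract;
    how "diversity" is measured is not specified there, so it is passed in as a flag.
    """
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    if n_molecules < 50:
        return "few-shot learning"
    if n_molecules <= 240:
        # The abstract only makes a claim for diverse datasets in this range.
        return "transformer" if is_diverse else None
    return "classical ML"


print(suggest_model_family(30, is_diverse=False))
print(suggest_model_family(120, is_diverse=True))
print(suggest_model_family(5000, is_diverse=True))
```

Returning `None` instead of guessing keeps the sketch faithful to what the abstract actually states.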
Affiliation(s)
- Scott H Snyder
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Patricia A Vignaux
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Mustafa Kemal Ozalp
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Jacob Gerlach
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Ana C Puhl
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Thomas R Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- John Corbett
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
2
Puhl AC, Lewicki SA, Gao ZG, Pramanik A, Makarov V, Ekins S, Jacobson KA. Machine learning-aided search for ligands of P2Y6 and other P2Y receptors. Purinergic Signal 2024:10.1007/s11302-024-10003-4. [PMID: 38526670 DOI: 10.1007/s11302-024-10003-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 03/12/2024] [Indexed: 03/27/2024] Open
Abstract
The P2Y6 receptor, activated by uridine diphosphate (UDP), is a target for antagonists in inflammatory, neurodegenerative, and metabolic disorders, yet few potent and selective antagonists are known to date. This prompted us to use machine learning as a novel approach to aid ligand discovery, with pharmacological evaluation at three P2YR subtypes: initially P2Y6 and subsequently P2Y1 and P2Y14. Relying on extensive published data for P2Y6R agonists, we generated and validated an array of classification machine learning models using the following algorithms: deep learning (DL), AdaBoost classifier (ada), Bernoulli naive Bayes (bnb), k-nearest neighbors (kNN) classifier, logistic regression (lreg), random forest classifier (rf), support vector classification (SVC), and XGBoost (XGB) classifier. The common consensus was applied to the selection of 21 diverse structures. Compounds were screened using human P2Y6R-induced functional calcium transients in transfected 1321N1 astrocytoma cells and fluorescent binding inhibition at the closely related hP2Y14R expressed in CHO cells. The hit compound ABBV-744, an experimental anticancer drug with a 6-methyl-7-oxo-6,7-dihydro-1H-pyrrolo[2,3-c]pyridine scaffold, had multifaceted interactions with the P2YR family: hP2Y6R inhibition in a non-surmountable fashion, suggesting noncompetitive antagonism, and hP2Y1R enhancement, but not hP2Y14R binding inhibition. Other machine learning-selected compounds were either weak (experimental anti-asthmatic drug AZD5423 with a phenyl-1H-indazole scaffold) or inactive in inhibiting the hP2Y6R. Experimental drugs TAK-593 and GSK1070916 (100 µM) inhibited P2Y14R fluorescent binding by 50% and 38%, respectively, and all other compounds by <20%. Thus, machine learning has led the way toward revealing previously unknown modulators of several P2YR subtypes that have varied effects.
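The abstract does not say how the "common consensus" across the eight classifiers was computed. One common reading is a simple majority vote, sketched below under that assumption; the function name, the `min_agreement` threshold, and the example votes are hypothetical and are not taken from the paper.

```python
def consensus_call(votes, min_agreement=0.5):
    """Return True when more than `min_agreement` of the models predict 'active'.

    Hypothetical majority-vote rule; the paper's actual consensus scheme
    is not described in the abstract.
    """
    votes = list(votes)
    if not votes:
        raise ValueError("at least one model vote is required")
    active_fraction = sum(1 for v in votes if v) / len(votes)
    return active_fraction > min_agreement


# Made-up votes keyed by the algorithm abbreviations used in the abstract.
model_votes = {
    "DL": True, "ada": True, "bnb": False, "kNN": True,
    "lreg": True, "rf": True, "SVC": False, "XGB": True,
}
print(consensus_call(model_votes.values()))
```

A strict `>` means a tie does not count as consensus, which is one of several defensible choices.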
Affiliation(s)
- Ana C Puhl
  - Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Sarah A Lewicki
  - Molecular Recognition Section, Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Zhan-Guo Gao
  - Molecular Recognition Section, Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Asmita Pramanik
  - Molecular Recognition Section, Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Vadim Makarov
  - Research Center of Biotechnology RAS, Leninsky Prospekt 33-2, 119071, Moscow, Russian Federation
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Kenneth A Jacobson
  - Molecular Recognition Section, Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
3
Karampuri A, Perugu S. A breast cancer-specific combinational QSAR model development using machine learning and deep learning approaches. Frontiers in Bioinformatics 2024; 3:1328262. [PMID: 38288043 PMCID: PMC10822965 DOI: 10.3389/fbinf.2023.1328262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 12/21/2023] [Indexed: 01/31/2024] Open
Abstract
Breast cancer is the most prevalent and heterogeneous form of cancer affecting women worldwide. Various therapeutic strategies are in practice based on the extent of disease spread, such as surgery, chemotherapy, radiotherapy, and immunotherapy. Combinational therapy is another strategy that has proven to be effective in controlling cancer progression. It involves administering an anchor drug, a well-established primary therapeutic agent with known efficacy for specific targets, together with a library drug, a supplementary drug that enhances the efficacy of the anchor drug and broadens the therapeutic approach. Our work focused on harnessing regression-based machine learning (ML) and deep learning (DL) algorithms to develop a structure-activity relationship between the molecular descriptors of drug pairs and their combined biological activity through a QSAR (quantitative structure-activity relationship) model. Eleven widely used machine learning and deep learning algorithms were used to develop QSAR models. A total of 52 breast cancer cell lines, 25 anchor drugs, and 51 library drugs were considered in developing the QSAR model. Deep neural networks (DNNs) achieved an R² (coefficient of determination) of 0.94 with an RMSE (root mean square error) of 0.255, making them the most effective algorithm for developing a structure-activity relationship with strong generalization capabilities. In conclusion, applying combinational therapy alongside ML and DL techniques represents a promising approach to combating breast cancer.
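For readers less familiar with the two metrics quoted in this abstract, the sketch below shows the standard definitions of the coefficient of determination and the root mean square error in plain Python. The function names and the toy values are illustrative assumptions; they are not the study's data or code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy values only, chosen to exercise the functions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

RMSE is in the units of the modelled response, so the reported 0.255 is only interpretable relative to the scale of the activity values used in the study.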
Affiliation(s)
- Shyam Perugu
  - Department of Biotechnology, National Institute of Technology, Warangal, India
4
Faris A, Cacciatore I, Ibrahim IM, Al Mughram MH, Hadni H, Tabti K, Elhallaoui M. In silico computational drug discovery: a Monte Carlo approach for developing a novel JAK3 inhibitors. J Biomol Struct Dyn 2023:1-23. [PMID: 37861428 DOI: 10.1080/07391102.2023.2270709] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2023] [Accepted: 10/08/2023] [Indexed: 10/21/2023]
Abstract
Inhibition of Janus kinase 3 (JAK3), a member of the JAK family of tyrosine kinases, remains an essential area of research for developing treatments for autoimmune diseases, particularly cancer and rheumatoid arthritis. The recent discovery of a new JAK3 protein, PDB ID: 4Z16, offers exciting possibilities for developing inhibitors capable of forming a covalent bond with the Cys909 residue, thereby contributing to JAK3 inhibition. A powerful prediction model was constructed and validated using Monte Carlo methods, employing various internal and external techniques. This approach resulted in the prediction of eleven new molecules, which were subsequently filtered to identify six compounds exhibiting potent pIC50 values. These candidates were then subjected to ADMET analysis, molecular docking (including reversible-reversible docking with tofacitinib, an FDA-approved drug, and reversible-irreversible docking for the newly designed compounds), molecular dynamics (MD) analysis for 300 ns, and calculation of free binding energy. The results suggested that these compounds hold promise as JAK3 inhibitors. In summary, the new compounds have exhibited favorable outcomes compared to other compounds across various modeling approaches. The collective findings from these investigations provide valuable insights into the potential therapeutic applications of covalent JAK3 inhibitors, offering a promising direction for the development of novel treatments for autoimmune disorders. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Abdelmoujoud Faris
  - LIMAS, Department of Chemical Sciences, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
- Ivana Cacciatore
  - Department of Pharmacy, University 'G. d'Annunzio' of Chieti-Pescara, Italy
- Ibrahim M Ibrahim
  - Biophysics Department, Faculty of Science, Cairo University, Giza, Egypt
- Mohammed H Al Mughram
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Khalid University, Abha, Saudi Arabia
- Hanine Hadni
  - LIMAS, Department of Chemical Sciences, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
- Kamal Tabti
  - Molecular Chemistry and Natural Substances Laboratory, Moulay Ismail University, Faculty of Science, Meknes, Morocco
- Menana Elhallaoui
  - LIMAS, Department of Chemical Sciences, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
5
Deng J, Yang Z, Wang H, Ojima I, Samaras D, Wang F. A systematic study of key elements underlying molecular property prediction. Nat Commun 2023; 14:6395. [PMID: 37833262 PMCID: PMC10575948 DOI: 10.1038/s41467-023-41948-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/24/2022] [Accepted: 09/18/2023] [Indexed: 10/15/2023] Open
Abstract
Artificial intelligence (AI) has been widely applied in drug discovery, with molecular property prediction as a major task. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate the predictive power in low-data and high-data space, a series of descriptor datasets of varying sizes are also assembled to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4,200 models on SMILES sequences and 8,400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction in most datasets. Besides, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore potential causes of why representation learning models can fail and show that dataset size is essential for representation learning models to excel.
Affiliation(s)
- Jianyuan Deng
  - Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA
- Zhibo Yang
  - Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
- Hehe Wang
  - Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
- Iwao Ojima
  - Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
- Dimitris Samaras
  - Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
- Fusheng Wang
  - Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA
  - Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
6
Habiballah S, Reisfeld B. Adapting physiologically-based pharmacokinetic models for machine learning applications. Sci Rep 2023; 13:14934. [PMID: 37696914 PMCID: PMC10495394 DOI: 10.1038/s41598-023-42165-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 09/06/2023] [Indexed: 09/13/2023] Open
Abstract
Both machine learning and physiologically-based pharmacokinetic models are becoming essential components of the drug development process. Integrating the predictive capabilities of physiologically-based pharmacokinetic (PBPK) models within machine learning (ML) pipelines could offer significant benefits in improving the accuracy and scope of drug screening and evaluation procedures. Here, we describe the development and testing of a self-contained machine learning module capable of faithfully recapitulating summary pharmacokinetic (PK) parameters produced by a full PBPK model, given a set of input drug-specific and regimen-specific information. Because of its widespread use in characterizing the disposition of orally administered drugs, the PBPK model chosen to demonstrate the methodology was an open-source implementation of a state-of-the-art compartmental and transit model called OpenCAT. The model was tested for drug formulations spanning a large range of solubility and absorption characteristics, and was evaluated for concordance against predictions of OpenCAT and relevant experimental data. In general, the values predicted by the ML models were within 20% of those of the PBPK model across the range of drug and formulation properties. However, summary PK parameter predictions from both the ML model and full PBPK model were occasionally poor with respect to those derived from experiments, suggesting deficiencies in the underlying PBPK model.
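The "within 20%" concordance statement in this abstract can be made concrete as the fraction of surrogate predictions whose relative deviation from the reference model output stays inside a tolerance. The sketch below assumes that reading; the function name, the tolerance parameter, and the sample numbers are hypothetical and do not come from the paper or from OpenCAT.

```python
def fraction_within_tolerance(reference, predicted, tolerance=0.20):
    """Fraction of predictions with |pred - ref| <= tolerance * |ref|.

    Hypothetical concordance check; `reference` stands in for PBPK model
    outputs and `predicted` for ML surrogate outputs.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for ref, pred in zip(reference, predicted)
        if abs(pred - ref) <= tolerance * abs(ref)
    )
    return hits / len(reference)


# Made-up summary PK values (for example AUC in arbitrary units).
pbpk_values = [10.0, 100.0, 50.0]
ml_values = [11.0, 130.0, 45.0]
print(fraction_within_tolerance(pbpk_values, ml_values))
```

Note that agreement with the reference model says nothing about agreement with experiment, which is the caveat the abstract itself raises.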
Affiliation(s)
- Sohaib Habiballah
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, 80523-1301, USA
- Brad Reisfeld
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, 80523-1301, USA
  - School of Public Health, Colorado State University, Fort Collins, CO, 80523-1612, USA
7
Ashraf FB, Akter S, Mumu SH, Islam MU, Uddin J. Bio-activity prediction of drug candidate compounds targeting SARS-Cov-2 using machine learning approaches. PLoS One 2023; 18:e0288053. [PMID: 37669264 PMCID: PMC10479925 DOI: 10.1371/journal.pone.0288053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 06/18/2023] [Indexed: 09/07/2023] Open
Abstract
The SARS-CoV-2 3CLpro protein is one of the key therapeutic targets of interest for COVID-19 due to its critical role in viral replication, the availability of various high-quality protein crystal structures, and its use as a basis for computationally screening for compounds with improved inhibitory activity, bioavailability, and ADMETox properties. The ChEMBL and PubChem databases contain experimental data from screening small molecules against SARS-CoV-2 3CLpro, which expands the opportunity to learn the pattern and design a computational model that can predict the potency of any drug compound against coronavirus before in-vitro and in-vivo testing. In this study, utilizing several descriptors, we evaluated 27 machine learning classifiers. We also developed a neural network model that can correctly identify bioactive and inactive chemicals with 91% accuracy on ChEMBL data and 93% accuracy on combined ChEMBL and PubChem data. The F1-score for inactive and active compounds was 93% and 94%, respectively. SHAP (SHapley Additive exPlanations) was applied to an XGB classifier to find the important fingerprints from the PaDEL descriptors for this task. The results indicated that the PaDEL descriptors were effective in predicting bioactivity, the proposed neural network design was efficient, and the explanatory analysis through SHAP correctly identified the important fingerprints. In addition, we validated the effectiveness of our proposed model using a large dataset encompassing over 100,000 molecules. This research employed various molecular descriptors to discover the optimal one for this task. To evaluate the effectiveness of these possible medications against SARS-CoV-2, more in-vitro and in-vivo research is required.
Affiliation(s)
- Faisal Bin Ashraf
  - Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh
  - Department of Computer Science and Engineering, University of California, Riverside, California, United States of America
- Sanjida Akter
  - Department of Cell Molecular and Developmental Biology, University of California, Riverside, California, United States of America
- Sumona Hoque Mumu
  - School of Kinesiology, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Muhammad Usama Islam
  - School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Jasim Uddin
  - Department of Applied Computing and Engineering, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Wales, United Kingdom
8
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Affiliation(s)
- Alexander H Williams
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
- Chang-Guo Zhan
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
9
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023] Open
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most studies only report fragments of the wider range of bioactivities, with each model typically focusing on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
10
Carneiro J, Magalhães RP, de la Oliva Roque VM, Simões M, Pratas D, Sousa SF. TargIDe: a machine-learning workflow for target identification of molecules with antibiofilm activity against Pseudomonas aeruginosa. J Comput Aided Mol Des 2023; 37:265-278. [PMID: 37085636 DOI: 10.1007/s10822-023-00505-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/10/2023] [Accepted: 04/12/2023] [Indexed: 04/23/2023]
Abstract
Bacterial biofilms are a source of infectious human diseases and are heavily linked to antibiotic resistance. Pseudomonas aeruginosa is a multidrug-resistant bacterium widely present and implicated in several hospital-acquired infections. Over the last years, the development of new drugs able to inhibit Pseudomonas aeruginosa by interfering with its ability to form biofilms has become a promising strategy in drug discovery. Identifying molecules able to interfere with biofilm formation is difficult, but further developing these molecules by rationally improving their activity is particularly challenging, as it requires knowledge of the specific protein target that is inhibited. This work describes the development of a machine learning multitechnique consensus workflow to predict the protein targets of molecules with confirmed inhibitory activity against biofilm formation by Pseudomonas aeruginosa. It uses a specialized database containing all the known targets implicated in biofilm formation by Pseudomonas aeruginosa. The experimentally confirmed inhibitors available on ChEMBL, together with chemical descriptors, were used as the input features for a combination of nine different classification models, yielding a consensus method to predict the most likely target of a ligand. The implemented algorithm is freely available at https://github.com/BioSIM-Research-Group/TargIDe under licence GNU General Public Licence (GPL) version 3 and can easily be improved as more data become available.
Affiliation(s)
- João Carneiro
  - Interdisciplinary Centre of Marine and Environmental Research, CIIMAR, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, Porto, 4450-208, Portugal
- Rita P Magalhães
  - Faculty of Medicine, Associate Laboratory i4HB-Institute for Health and Bioeconomy, University of Porto, 4200-319, Porto, Portugal
  - Department of Biomedicine, Faculty of Medicine, UCIBIO-Applied Molecular Biosciences Unit, University of Porto, BioSIM, Porto, 4200-319, Portugal
- Victor M de la Oliva Roque
  - Faculty of Medicine, Associate Laboratory i4HB-Institute for Health and Bioeconomy, University of Porto, 4200-319, Porto, Portugal
  - Department of Biomedicine, Faculty of Medicine, UCIBIO-Applied Molecular Biosciences Unit, University of Porto, BioSIM, Porto, 4200-319, Portugal
- Manuel Simões
  - Faculty of Engineering, LEPABE Laboratory for Process Engineering, Environment, Biotechnology and Energy, University of Porto, Rua Dr. Roberto Frias, s/n, Porto, 4200-465, Portugal
  - Faculty of Engineering, ALiCE-Associate Laboratory in Chemical Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
- Diogo Pratas
  - Institute of Electronics and Informatics Engineering of Aveiro, IEETA, University of Aveiro, Aveiro, Portugal
  - Department of Electronics, Telecommunications and Informatics, DETI, University of Aveiro, Aveiro, Portugal
  - Department of Virology, DoV, University of Helsinki, Helsinki, Finland
- Sérgio F Sousa
  - Faculty of Medicine, Associate Laboratory i4HB-Institute for Health and Bioeconomy, University of Porto, 4200-319, Porto, Portugal
  - Department of Biomedicine, Faculty of Medicine, UCIBIO-Applied Molecular Biosciences Unit, University of Porto, BioSIM, Porto, 4200-319, Portugal
11
Dai M, Xiao G, Shao M, Zhang YS. The Synergy between Deep Learning and Organs-on-Chips for High-Throughput Drug Screening: A Review. Biosensors 2023; 13:389. [PMID: 36979601 PMCID: PMC10046732 DOI: 10.3390/bios13030389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 02/22/2023] [Accepted: 03/07/2023] [Indexed: 06/18/2023]
Abstract
Organs-on-chips (OoCs) are miniature microfluidic systems that have arguably become a class of advanced in vitro models. Deep learning, as an emerging topic in machine learning, has the ability to extract hidden statistical relationships from input data. Recently, these two areas have become integrated to achieve synergy for accelerating drug screening. This review provides a brief description of the basic concepts of deep learning used in OoCs and exemplifies successful use cases for different types of OoCs. These microfluidic chips have the potential to be assembled into highly potent human-on-chips with complex physiological or pathological functions. Finally, we discuss future perspectives and potential challenges in combining OoCs and deep learning for image processing and automation designs.
Affiliation(s)
- Manna Dai
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
  - Computing and Intelligence Department, Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Gao Xiao
  - College of Environment and Safety Engineering, Fuzhou University, Fuzhou 350108, China
  - Department of Biomedical Engineering, Tsinghua University, Beijing 100084, China
- Ming Shao
  - Department of Computer and Information Science, College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA 02747, USA
- Yu Shrike Zhang
  - Division of Engineering in Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Cambridge, MA 02139, USA
12
Vignaux PA, Lane TR, Urbina F, Gerlach J, Puhl AC, Snyder SH, Ekins S. Validation of Acetylcholinesterase Inhibition Machine Learning Models for Multiple Species. Chem Res Toxicol 2023; 36:188-201. [PMID: 36737043 PMCID: PMC9945174 DOI: 10.1021/acs.chemrestox.2c00283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Acetylcholinesterase (AChE) is an important enzyme and target for human therapeutics, environmental safety, and global food supply. Inhibitors of this enzyme are also used for pest elimination and can be misused for suicide or chemical warfare. Adverse effects of AChE pesticides on nontarget organisms, such as fish, amphibians, and humans, have also occurred as a result of biomagnification of these toxic compounds. We have exhaustively curated the public data for AChE inhibition and developed machine learning classification models for seven different species. Each set of models was built using up to nine different algorithms for each species and Morgan fingerprints (ECFP6) with an activity cutoff of 1 μM. The human (4075 compounds) and eel (5459 compounds) consensus models predicted AChE inhibition activity using external test sets from literature data with 81% and 82% accuracy, respectively, while the reciprocal cross (76% and 82% accuracy) was not species-specific. In addition, we also created machine learning regression models for human and eel AChE inhibition to return a predicted IC50 value for a queried molecule. We did observe an improved species specificity in the regression models, where a support vector regression model of human AChE inhibition (3652 compounds) predicted the IC50s of the human test set to a better extent than the eel regression model (4930 compounds) on the same test set, based on mean absolute percentage error (MAPE = 9.73% vs 13.4%). The predictive power of these models certainly benefits from increasing the chemical diversity of the training set, as evidenced by expanding our human classification model by incorporating data from the Tox21 library of compounds. Of the 10 compounds we tested that were predicted active by this expanded model, two showed >80% inhibition at 100 μM.
This machine learning approach therefore offers the ability to rapidly score massive libraries of molecules against the models for AChE inhibition that can then be selected for future in vitro testing to identify potential toxins. It also enabled us to create a public website, MegaAChE, for single-molecule predictions of AChE inhibition using these models at megaache.collaborationspharma.com.
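The abstract above compares regression models by mean absolute percentage error (MAPE) and labels compounds with a 1 μM activity cutoff. As a rough illustration only, the sketch below shows how those two quantities are conventionally computed. The function names, the sample values, and the choice of "less than or equal to" for the cutoff are assumptions made for this example; the abstract does not say on which scale (raw IC50 or a log-transformed value) the authors computed MAPE.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def is_active(ic50_um: float, cutoff_um: float = 1.0) -> bool:
    """Binary activity label from an IC50 in micromolar.

    Assumption: compounds at or below the cutoff count as active.
    """
    return ic50_um <= cutoff_um


# Hypothetical values, not data from the paper.
observed = [5.0, 6.0, 8.0]
predicted = [5.5, 6.0, 7.2]
print(round(mape(observed, predicted), 2))
print([is_active(v) for v in (0.2, 1.0, 3.5)])
```

A lower MAPE means predictions sit closer to the measured values in relative terms, which is the sense in which the abstract reports 9.73% as better than 13.4%.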
Affiliation(s)
- Patricia A Vignaux
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Thomas R Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Jacob Gerlach
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Ana C Puhl
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Scott H Snyder
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
|
13
|
Urbina F, Ekins S. The Commoditization of AI for Molecule Design. ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES 2022; 2:100031. [PMID: 36211981 PMCID: PMC9541920 DOI: 10.1016/j.ailsci.2022.100031] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we now work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and have been increasingly applied as scientists worked from home and were separated from the laboratory and their colleagues. This shift may be more permanent as the future of molecule design across different industries will increasingly require machine learning models for design and optimization of molecules as they become "designed by AI". AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective will briefly describe our personal opinions of how machine learning has evolved and is being applied to model different molecule properties whose utility crosses industries, which ultimately suggests the potential for tight integration of AI into equipment and automated experimental pipelines. It will also describe how many groups have implemented generative models, covering different architectures, for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has impacted and influenced our work. Finally, we will peer into the future and suggest some of the areas that represent the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.
Affiliation(s)
- Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
|
14
|
Lane TR, Urbina F, Zhang X, Fye M, Gerlach J, Wright SH, Ekins S. Machine Learning Models Identify New Inhibitors for Human OATP1B1. Mol Pharm 2022; 19:4320-4332. [PMID: 36269563 PMCID: PMC9873312 DOI: 10.1021/acs.molpharmaceut.2c00662] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/26/2023]
Abstract
The uptake transporter OATP1B1 (SLCO1B1) is largely localized to the sinusoidal membrane of hepatocytes and is a known victim of unwanted drug-drug interactions. Computational models are useful for identifying potential substrates and/or inhibitors of clinically relevant transporters. Our goal was to generate OATP1B1 in vitro inhibition data for [3H] estrone-3-sulfate (E3S) transport in CHO cells and use it to build machine learning models to facilitate a comparison of seven different classification models [Deep learning, Adaboosted decision trees, Bernoulli naïve bayes, k-nearest neighbors (knn), random forest, support vector classifier (SVC), logistic regression (lreg), and XGBoost (xgb)] using ECFP6 fingerprints to perform 5-fold, nested cross validation. In addition, we compared models using 3D pharmacophores, simple chemical descriptors alone or plus ECFP6, as well as ECFP4 and ECFP8 fingerprints. Several machine learning algorithms (SVC, lreg, xgb, and knn) had excellent nested cross validation statistics, particularly for accuracy, AUC, and specificity. An external test set containing 207 unique compounds not in the training set demonstrated that at every threshold SVC outperformed the other algorithms based on a rank normalized score. A prospective validation test set was chosen using prediction scores from the SVC models with ECFP fingerprints and was tested in vitro, with 15 of 19 compounds (84% accuracy) predicted as active (≥20% inhibition) showing inhibition. Of these compounds, six (abamectin, asiaticoside, berbamine, doramectin, mobocertinib, and umbralisib) appear to be novel inhibitors of OATP1B1 not previously reported. These validated machine learning models can now be used to make predictions for drug-drug interactions for human OATP1B1 alongside other machine learning models for important drug transporters in our MegaTrans software.
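The abstract above relies on 5-fold nested cross validation. For readers unfamiliar with the term, the following is a minimal sketch of how nested splits are generally laid out: an outer loop holds out a test fold for scoring, and an inner loop re-splits the remaining data for model or hyperparameter selection. It is not the authors' pipeline. The function names are hypothetical, folds are contiguous for simplicity, and a real workflow would normally shuffle and stratify by class.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def kfold_indices(indices: List[int], k: int) -> Iterator[Split]:
    """Yield (train, held_out) index lists for k contiguous folds."""
    n = len(indices)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


def nested_cv_splits(
    n_samples: int, outer_k: int = 5, inner_k: int = 5
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    The inner splits are drawn only from outer_train, so the outer test
    fold never influences model selection.
    """
    all_indices = list(range(n_samples))
    for outer_train, outer_test in kfold_indices(all_indices, outer_k):
        inner_splits = list(kfold_indices(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


# Hypothetical data set of 20 samples.
for outer_train, outer_test, inner_splits in nested_cv_splits(20):
    print(len(outer_train), len(outer_test), len(inner_splits))
```

Keeping selection inside the inner loop is what makes the outer-fold statistics a less optimistic estimate of performance than a single round of cross validation.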
Affiliation(s)
- Thomas R. Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| | - Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| | - Xiaohong Zhang
- Department of Physiology, College of Medicine, University of Arizona, Tucson, AZ, 85724, USA
| | - Margret Fye
- Department of Physiology, College of Medicine, University of Arizona, Tucson, AZ, 85724, USA
| | - Jacob Gerlach
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| | - Stephen H. Wright
- Department of Physiology, College of Medicine, University of Arizona, Tucson, AZ, 85724, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
|
15
|
De León G, Fröhlich E, Fink E, Di Pizio A, Salar-Behzadi S. Premexotac: Machine learning bitterants predictor for advancing pharmaceutical development. Int J Pharm 2022; 628:122263. [DOI: 10.1016/j.ijpharm.2022.122263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 09/27/2022] [Accepted: 09/29/2022] [Indexed: 10/31/2022]
|
16
|
Rank L, Puhl AC, Havener TM, Anderson E, Foil DH, Zorn KM, Monakhova N, Riabova O, Hickey AJ, Makarov V, Ekins S. Multiple approaches to repurposing drugs for neuroblastoma. Bioorg Med Chem 2022; 73:117043. [PMID: 36208544 PMCID: PMC9870653 DOI: 10.1016/j.bmc.2022.117043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/31/2022] [Revised: 09/27/2022] [Accepted: 09/28/2022] [Indexed: 01/26/2023]
Abstract
Neuroblastoma (NB) is the second leading extracranial solid tumor of early childhood with about two-thirds of cases presenting before the age of 5, and accounts for roughly 15 percent of all pediatric cancer fatalities in the United States. Treatments against NB are lacking, resulting in a low survival rate in high-risk patients. A repurposing approach using already approved or clinical stage compounds can be used for diseases for which the patient population is small and the commercial market limited. We have used Bayesian machine learning, in vitro cell assays, and combination analysis to identify molecules with potential use for NB. We demonstrated that pyronaridine (SH-SY5Y IC50 1.70 µM, SK-N-AS IC50 3.45 µM), BAY 11-7082 (SH-SY5Y IC50 0.85 µM, SK-N-AS IC50 1.23 µM), niclosamide (SH-SY5Y IC50 0.87 µM, SK-N-AS IC50 2.33 µM) and fingolimod (SH-SY5Y IC50 4.71 µM, SK-N-AS IC50 6.11 µM) showed cytotoxicity against NB. As several of the molecules are approved drugs in the US or elsewhere, they may be repurposed more readily for NB treatment. Pyronaridine was also tested in combinations in SH-SY5Y cells and demonstrated an antagonistic effect with either etoposide or crizotinib. In contrast, crizotinib and etoposide combined with each other had a synergistic effect in these cells. We have also described several analogs of pyronaridine to explore the structure-activity relationship against cell lines. We describe multiple molecules demonstrating cytotoxicity against NB, and the further evaluation of these molecules and combinations using other NB cell lines and in vivo models will be important in the future to assess translational potential.
Affiliation(s)
- Laura Rank
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
| | - Ana C Puhl
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA.
| | - Tammy M Havener
- UNC Catalyst for Rare Diseases, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, North Carolina, USA
| | - Edward Anderson
- UNC Catalyst for Rare Diseases, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, North Carolina, USA
| | - Daniel H Foil
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
| | - Kimberley M Zorn
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
| | | | - Olga Riabova
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
| | - Anthony J Hickey
- Research Center of Biotechnology RAS, 119071 Moscow, Russia; RTI International, Research Triangle Park, NC, USA
| | - Vadim Makarov
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA.
|
17
|
Blay V, Li X, Gerlach J, Urbina F, Ekins S. Combining DELs and machine learning for toxicology prediction. Drug Discov Today 2022; 27:103351. [PMID: 36096360 PMCID: PMC9995617 DOI: 10.1016/j.drudis.2022.103351] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/07/2022] [Revised: 07/31/2022] [Accepted: 09/06/2022] [Indexed: 01/12/2023]
Abstract
DNA-encoded libraries (DELs) allow starting chemical matter to be identified in drug discovery. The volume of experimental data generated also makes DELs an attractive resource for machine learning (ML). ML allows modeling complex relationships between compounds and numerical endpoints, such as the binding to a target measured by DELs. DELs could also empower other areas of drug discovery. Here, we propose that DELs and ML could be combined to model binding to off-targets, enabling better predictive toxicology. With enough data, ML models can make accurate predictions across a vast chemical space, and they can be reused and expanded across projects. Although there are limitations, more general toxicology models could be applied earlier during drug discovery, illuminating safety liabilities at a lower cost.
Affiliation(s)
- Vincent Blay
- Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA.
| | - Xiaoyu Li
- Department of Chemistry and State Key Laboratory of Synthetic Chemistry, The University of Hong Kong, Hong Kong Special Administrative Region
| | - Jacob Gerlach
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Fabio Urbina
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA.
|
18
|
Kong Y, Zhao X, Liu R, Yang Z, Yin H, Zhao B, Wang J, Qin B, Yan A. Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation. J Cheminform 2022; 14:52. [PMID: 35927691 PMCID: PMC9351086 DOI: 10.1186/s13321-022-00634-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 07/16/2022] [Indexed: 11/10/2022] Open
Abstract
Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task related representations, which completely gets rid of the rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of the GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrated pharmacophore information hierarchically into message-passing neural network (MPNN) architecture, specifically, in the way of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbed not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to MPNN architecture can generally help GNN models improve the predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.
Affiliation(s)
- Yue Kong
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.,Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Xiaoman Zhao
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Ruizi Liu
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Zhenwu Yang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Hongyan Yin
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.,Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bowen Zhao
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Jinling Wang
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bingjie Qin
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.
|
19
|
Li J, Chen J, Bai H, Wang H, Hao S, Ding Y, Peng B, Zhang J, Li L, Huang W. An Overview of Organs-on-Chips Based on Deep Learning. RESEARCH (WASHINGTON, D.C.) 2022; 2022:9869518. [PMID: 35136860 PMCID: PMC8795883 DOI: 10.34133/2022/9869518] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/24/2021] [Accepted: 12/08/2021] [Indexed: 12/15/2022]
Abstract
Microfluidic-based organs-on-chips (OoCs) are a rapidly developing technology in biomedical and chemical research and have emerged as one of the most advanced and promising in vitro models. The miniaturization, stimulated tissue mechanical forces, and microenvironment of OoCs offer unique properties for biomedical applications. However, the large amount of data generated by the high parallelization of OoC systems has grown far beyond the scope of manual analysis by researchers with biomedical backgrounds. Deep learning, an emerging area of research in the field of machine learning, can automatically mine the inherent characteristics and laws of "big data" and has achieved remarkable applications in computer vision, speech recognition, and natural language processing. The integration of deep learning in OoCs is an emerging field that holds enormous potential for drug development, disease modeling, and personalized medicine. This review briefly describes the basic concepts and mechanisms of microfluidics and deep learning and summarizes their successful integration. We then analyze the combination of OoCs and deep learning for image digitization, data analysis, and automation. Finally, the problems faced in current applications are discussed, and future perspectives and suggestions are provided to further strengthen this integration.
Affiliation(s)
- Jintao Li
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | - Jie Chen
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Electronics and Information Engineering, Anhui University, Hefei 230601, China
- 38th Research Institute of China Electronics Technology Group Corporation, Hefei 230088, China
| | - Hua Bai
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | - Haiwei Wang
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | - Shiping Hao
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | - Yang Ding
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | - Bo Peng
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | - Jing Zhang
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Lin Li
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Key Laboratory of Flexible Electronics (KLOFE) and Institute of Advanced Materials (IAM) Nanjing Tech University (NanjingTech), Nanjing 211800, China
| | - Wei Huang
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Key Laboratory of Flexible Electronics (KLOFE) and Institute of Advanced Materials (IAM) Nanjing Tech University (NanjingTech), Nanjing 211800, China
|
20
|
Gawriljuk VO, Foil DH, Puhl AC, Zorn KM, Lane TR, Riabova O, Makarov V, Godoy AS, Oliva G, Ekins S. Development of Machine Learning Models and the Discovery of a New Antiviral Compound against Yellow Fever Virus. J Chem Inf Model 2021; 61:3804-3813. [PMID: 34286575 DOI: 10.1021/acs.jcim.1c00460] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/14/2022]
Abstract
Yellow fever (YF) is an acute viral hemorrhagic disease transmitted by infected mosquitoes. Large epidemics of YF occur when the virus is introduced into heavily populated areas with high mosquito density and low vaccination coverage. The lack of a specific small molecule drug treatment against YF as well as for homologous infections, such as zika and dengue, highlights the importance of these flaviviruses as a public health concern. With the advancement in computer hardware and bioactivity data availability, new tools based on machine learning methods have been introduced into drug discovery, as a means to utilize the growing high throughput screening (HTS) data generated to reduce costs and increase the speed of drug development. The use of predictive machine learning models using previously published data from HTS campaigns or data available in public databases, can enable the selection of compounds with desirable bioactivity and absorption, distribution, metabolism, and excretion profiles. In this study, we have collated cell-based assay data for yellow fever virus from the literature and public databases. The data were used to build predictive models with several machine learning methods that could prioritize compounds for in vitro testing. Five molecules were prioritized and tested in vitro from which we have identified a new pyrazolesulfonamide derivative with EC50 3.2 μM and CC50 24 μM, which represents a new scaffold suitable for hit-to-lead optimization that can expand the available drug discovery candidates for YF.
Affiliation(s)
- Victor O Gawriljuk
- São Carlos Institute of Physics, University of São Paulo, Av. João Dagnone, 1100 - Santa Angelina, São Carlos, São Paulo 13563-120, Brazil
| | - Daniel H Foil
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Ana C Puhl
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Kimberley M Zorn
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Thomas R Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Olga Riabova
- Research Center of Biotechnology RAS, Leninsky Prospekt 33-2, 119071 Moscow, Russia
| | - Vadim Makarov
- Research Center of Biotechnology RAS, Leninsky Prospekt 33-2, 119071 Moscow, Russia
| | - Andre S Godoy
- São Carlos Institute of Physics, University of São Paulo, Av. João Dagnone, 1100 - Santa Angelina, São Carlos, São Paulo 13563-120, Brazil
| | - Glaucius Oliva
- São Carlos Institute of Physics, University of São Paulo, Av. João Dagnone, 1100 - Santa Angelina, São Carlos, São Paulo 13563-120, Brazil
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
|
21
|
Batra K, Zorn KM, Foil DH, Minerali E, Gawriljuk VO, Lane TR, Ekins S. Quantum Machine Learning Algorithms for Drug Discovery Applications. J Chem Inf Model 2021; 61:2641-2647. [PMID: 34032436 PMCID: PMC8254374 DOI: 10.1021/acs.jcim.1c00166] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Abstract
The growing quantity of public and private data sets focused on small molecules screened against biological targets or whole organisms provides a wealth of drug discovery relevant data. This is matched by the availability of machine learning algorithms such as Support Vector Machines (SVM) and Deep Neural Networks (DNN) that are computationally expensive to perform on very large data sets with thousands of molecular descriptors. Quantum computer (QC) algorithms have been proposed to offer an approach to accelerate quantum machine learning over classical computer (CC) algorithms, however with significant limitations. In the case of cheminformatics, which is widely used in drug discovery, one of the challenges to overcome is the need for compression of large numbers of molecular descriptors for use on a QC. Here, we show how to achieve compression with data sets using hundreds of molecules (SARS-CoV-2) to hundreds of thousands of molecules (whole cell screening data sets for plague and M. tuberculosis) with SVM and the data reuploading classifier (a DNN equivalent algorithm) on a QC benchmarked against CC and hybrid approaches. This study illustrates the steps needed to be "quantum computer ready" in order to apply quantum computing to drug discovery and to provide the foundation on which to build this field.
Affiliation(s)
- Kushal Batra
- Computer Science, NC State University, Raleigh, NC 27606, USA
| | - Kimberley M. Zorn
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Daniel H. Foil
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Eni Minerali
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Victor O. Gawriljuk
- São Carlos Institute of Physics, University of São Paulo, Av. João Dagnone, 1100 - Santa Angelina, São Carlos - SP, 13563-120, Brazil
| | - Thomas R. Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
|
22
|
Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022] Open
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease is based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, and the lack of efficacy of praziquantel against juvenile worms, highlights the urgency for new antischistosomal therapies. In this review we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, computer-aided and artificial intelligence-based computational methods. We highlight the current developments that may contribute to optimizing research outputs and lead to more effective drugs for this highly prevalent disease, in a more cost-effective drug discovery endeavor.
Affiliation(s)
- José T. Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Arthur C. Silva
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Rafael F. Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Barbara F. Gomes
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Lauro R. Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Jose Brandao-Neto
- Diamond Light Source Ltd., Didcot, United Kingdom
- Research Complex at Harwell, Didcot, United Kingdom
| | - Raymond J. Owens
- The Rosalind Franklin Institute, Harwell, United Kingdom
- Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicholas Furnham
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Bruno J. Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Floriano P. Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Carolina H. Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
|