1
Olubamiwa AO, Liao TJ, Zhao J, Dehanne P, Noban C, Angin Y, Barberan O, Chen M. Drug interaction with UDP-Glucuronosyltransferase (UGT) enzymes is a predictor of drug-induced liver injury. Hepatology 2024:01515467-990000000-00962. [PMID: 39024247 DOI: 10.1097/hep.0000000000001007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 06/24/2024] [Indexed: 07/20/2024]
Abstract
BACKGROUND AND AIMS: Drug-induced liver injury (DILI) frequently contributes to the attrition of new drug candidates and is a common cause for the withdrawal of approved drugs from the market. Although some noncytochrome P450 (non-CYP) metabolism enzymes have been implicated in DILI development, their association with DILI outcomes has not been systematically evaluated. APPROACH AND RESULTS: In this study, we analyzed a large data set comprising 317 drugs and their interactions in vitro with 42 non-CYP enzymes as substrates, inducers, and/or inhibitors retrieved from historical regulatory documents using multivariate logistic regression. We examined how these in vitro drug-enzyme interactions are correlated with the drugs' potential for DILI concern, as classified in the Liver Toxicity Knowledge Base database. Our study revealed that drugs that inhibit non-CYP enzymes are significantly associated with high DILI concern. In particular, interaction with UDP-glucuronosyltransferase (UGT) enzymes is an important predictor of DILI outcomes. Further analysis indicated that only pure UGT inhibitors and dual substrate inhibitors, but not pure UGT substrates, are significantly associated with high DILI concern. CONCLUSIONS: Drug interactions with UGT enzymes may independently predict DILI, and their combined use with the rule-of-two model further improves overall predictive performance. These findings could expand the currently available tools for assessing the potential for DILI in humans.
Affiliation(s)
- AyoOluwa O Olubamiwa
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (NCTR), U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Tsung-Jen Liao
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (NCTR), U.S. Food and Drug Administration, Jefferson, Arkansas, USA
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Jinwen Zhao
  - Department of Information Science, University of Arkansas at Little Rock, Arkansas, USA
- Patrice Dehanne
  - Life Sciences, Elsevier B.V., Radarweg, Amsterdam, Netherlands
- Catherine Noban
  - Life Sciences, Elsevier B.V., Radarweg, Amsterdam, Netherlands
- Yeliz Angin
  - Life Sciences, Elsevier B.V., Radarweg, Amsterdam, Netherlands
- Minjun Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (NCTR), U.S. Food and Drug Administration, Jefferson, Arkansas, USA
2
Christodoulou A, Katsarou MS, Emmanouil C, Gavrielatos M, Georgiou D, Tsolakou A, Papasavva M, Economou V, Nanou V, Nikolopoulos I, Daganou M, Argyraki A, Stefanidis E, Metaxas G, Panagiotou E, Michalopoulos I, Drakoulis N. A Machine Learning-Based Web Tool for the Severity Prediction of COVID-19. BIOTECH 2024; 13:22. [PMID: 39051337 PMCID: PMC11270362 DOI: 10.3390/biotech13030022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 06/13/2024] [Accepted: 06/20/2024] [Indexed: 07/27/2024] Open
Abstract
Predictive tools provide a unique opportunity to explain the observed differences in outcome between patients of the COVID-19 pandemic. The aim of this study was to associate individual demographic and clinical characteristics with disease severity in COVID-19 patients and to highlight the importance of machine learning (ML) in disease prognosis. The study enrolled 344 unvaccinated patients with confirmed SARS-CoV-2 infection. Data collected by integrating questionnaires and medical records were imported into various classification machine learning algorithms, and the algorithm and the hyperparameters with the greatest predictive ability were selected for use in a disease outcome prediction web tool. Of 111 independent features, age, sex, hypertension, obesity, and cancer comorbidity were found to be associated with severe COVID-19. Our prognostic tool can contribute to a successful therapeutic approach via personalized treatment. Although at the present time vaccination is not considered mandatory, this algorithm could encourage vulnerable groups to be vaccinated.
Affiliation(s)
- Avgi Christodoulou
  - Research Group of Clinical Pharmacology and Pharmacogenomics, Faculty of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, 15771 Athens, Greece
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Martha-Spyridoula Katsarou
  - Research Group of Clinical Pharmacology and Pharmacogenomics, Faculty of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, 15771 Athens, Greece
- Christina Emmanouil
  - Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Biology, National and Kapodistrian University of Athens, 15772 Athens, Greece
  - Institute for Bioinnovation, Biomedical Sciences Research Center ‘Alexander Fleming’, 16672 Vari, Greece
- Marios Gavrielatos
  - Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 16122 Athens, Greece
  - Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Dimitrios Georgiou
  - Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - School of Electrical and Computer Engineering, National and Technical University of Athens, 15773 Athens, Greece
- Annia Tsolakou
  - Research Group of Clinical Pharmacology and Pharmacogenomics, Faculty of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, 15771 Athens, Greece
- Maria Papasavva
  - Department of Pharmacy, School of Health Sciences, Frederick University, 1036 Nicosia, Cyprus
- Vasiliki Economou
  - Research Group of Clinical Pharmacology and Pharmacogenomics, Faculty of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, 15771 Athens, Greece
- Vasiliki Nanou
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Ioannis Nikolopoulos
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Maria Daganou
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Aikaterini Argyraki
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Evaggelos Stefanidis
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Gerasimos Metaxas
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Emmanouil Panagiotou
  - Sotiria Thoracic Diseases Hospital of Athens, 11527 Athens, Greece
- Ioannis Michalopoulos
  - Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
- Nikolaos Drakoulis
  - Research Group of Clinical Pharmacology and Pharmacogenomics, Faculty of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, 15771 Athens, Greece
3
Alhussaini AJ, Steele JD, Jawli A, Nabi G. Radiomics Machine Learning Analysis of Clear Cell Renal Cell Carcinoma for Tumour Grade Prediction Based on Intra-Tumoural Sub-Region Heterogeneity. Cancers (Basel) 2024; 16:1454. [PMID: 38672536 PMCID: PMC11048006 DOI: 10.3390/cancers16081454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/22/2024] [Accepted: 04/03/2024] [Indexed: 04/28/2024] Open
Abstract
BACKGROUND: Renal cancers are among the top ten causes of cancer-specific mortality, of which the clear cell renal cell carcinoma (ccRCC) subtype is responsible for most cases. The grading of ccRCC is important in determining tumour aggressiveness and clinical management. OBJECTIVES: The objectives of this research were to predict the WHO/ISUP grade of ccRCC pre-operatively and characterise the heterogeneity of tumour sub-regions using radiomics and machine learning (ML) models, including comparison with pre-operative biopsy-determined grading in a sub-group. METHODS: Data were obtained from multiple institutions across two countries, including 391 patients with pathologically proven ccRCC. For analysis, the data were separated into four cohorts. Cohorts 1 and 2 included data from the respective institutions from the two countries, cohort 3 was the combined data from both cohorts 1 and 2, and cohort 4 was a subset of cohort 1, for which both the biopsy and subsequent histology from resection (partial or total nephrectomy) were available. 3D image segmentation was carried out to derive a voxel of interest (VOI) mask. Radiomics features were then extracted from the contrast-enhanced images, and the data were normalised. The Pearson correlation coefficient and the XGBoost model were used to reduce the dimensionality of the features. Thereafter, 11 ML algorithms were implemented for the purpose of predicting the ccRCC grade and characterising the heterogeneity of sub-regions in the tumours. RESULTS: For cohort 1, the 50% tumour core and 25% tumour periphery exhibited the best performance, with an average AUC of 77.9% and 78.6%, respectively. The 50% tumour core presented the highest performance in cohorts 2 and 3, with average AUC values of 87.6% and 76.9%, respectively. With the 25% periphery, cohort 4 showed AUC values of 95.0% and 80.0% for grade prediction when using internal and external validation, respectively, while biopsy histology had an AUC of 31.0% for the classification with the final grade of resection histology as a reference standard. The CatBoost classifier was the best for each of the four cohorts, with an average AUC of 80.0%, 86.5%, 77.0% and 90.3% for cohorts 1, 2, 3 and 4, respectively. CONCLUSIONS: Radiomics signatures combined with ML have the potential to predict the WHO/ISUP grade of ccRCC with superior performance when compared to pre-operative biopsy. Moreover, tumour sub-regions contain useful information that should be analysed independently when determining the tumour grade. Therefore, it is possible to distinguish the grade of ccRCC pre-operatively to improve patient care and management.
Affiliation(s)
- Abeer J. Alhussaini
  - Division of Imaging Sciences and Technology, School of Medicine, Ninewells Hospital, University of Dundee, Dundee DD1 9SY, UK
  - Department of Clinical Radiology, Al-Amiri Hospital, Ministry of Health, Sulaibikhat 1300, Kuwait
- J. Douglas Steele
  - Division of Imaging Sciences and Technology, School of Medicine, Ninewells Hospital, University of Dundee, Dundee DD1 9SY, UK
- Adel Jawli
  - Division of Imaging Sciences and Technology, School of Medicine, Ninewells Hospital, University of Dundee, Dundee DD1 9SY, UK
  - Department of Clinical Radiology, Sheikh Jaber Al-Ahmad Al-Sabah Hospital, Ministry of Health, Sulaibikhat 1300, Kuwait
- Ghulam Nabi
  - Division of Imaging Sciences and Technology, School of Medicine, Ninewells Hospital, University of Dundee, Dundee DD1 9SY, UK
4
Grant NA, Donkor GY, Sontz JT, Soto W, Waters CM. Deployment of a Vibrio cholerae ordered transposon mutant library in a quorum-competent genetic background. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.10.31.564941. [PMID: 37961142 PMCID: PMC10634969 DOI: 10.1101/2023.10.31.564941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Vibrio cholerae, the causative agent of cholera, has sparked seven pandemics in recent centuries, with the current one being the most prolonged. V. cholerae's pathogenesis hinges on its ability to switch between low and high cell density gene regulatory states, enabling transmission between host and the environment. Previously, a transposon mutant library for V. cholerae was created to support investigations aimed toward uncovering the genetic determinants of its pathogenesis. However, subsequent sequencing uncovered a mutation in the gene luxO of the parent strain, rendering mutants unable to exhibit high cell density behaviors. In this study, we used chitin-independent natural transformation to move transposon insertions from these low cell density mutants into a wildtype genomic background. Library transfer was aided by a novel gDNA extraction we developed using thymol, which also showed high lysis-specificity for Vibrio. The resulting Grant Library comprises 3,102 unique transposon mutants, covering 79.8% of V. cholerae's open reading frames. Whole genome sequencing of randomly selected mutants demonstrates 100% precision in transposon transfer to cognate genomic positions of the recipient strain. Notably, in no instance did the luxO mutation transfer into the wildtype background. Our research uncovered density-dependent epistasis in growth on inosine, an immunomodulatory metabolite secreted by gut bacteria that is implicated in enhancing gut barrier functions. Additionally, Grant Library mutants retain the plasmid that enables rapid, scarless genomic editing. In summary, the Grant Library reintroduces organismal relevant genetic contexts absent in the low cell density locked library equivalent.
Affiliation(s)
- Nkrumah A. Grant
  - Department of Microbiology, University of Illinois Urbana-Champaign, Urbana, IL
  - Department of Microbiology, Genetics, and Immunology, Michigan State University, East Lansing, MI
  - BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, MI
- Jordan T. Sontz
  - MSU College of Osteopathic Medicine, Michigan State University, East Lansing, MI
- William Soto
  - Department of Biology, College of William and Mary, Williamsburg, VA
- Christopher M. Waters
  - Department of Microbiology, Genetics, and Immunology, Michigan State University, East Lansing, MI
  - BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, MI
  - MSU College of Osteopathic Medicine, Michigan State University, East Lansing, MI
5
Miao Y, Ma H, Huang J. Recent Advances in Toxicity Prediction: Applications of Deep Graph Learning. Chem Res Toxicol 2023; 36:1206-1226. [PMID: 37562046 DOI: 10.1021/acs.chemrestox.2c00384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
The development of new drugs is time-consuming and expensive, and as such, accurately predicting the potential toxicity of a drug candidate is crucial in ensuring its safety and efficacy. Recently, deep graph learning has become prevalent in this field due to its computational power and cost efficiency. Many novel deep graph learning methods aid toxicity prediction and further prompt drug development. This review aims to connect fundamental knowledge with burgeoning deep graph learning methods. We first summarize the essential components of deep graph learning models for toxicity prediction, including molecular descriptors, molecular representations, evaluation metrics, validation methods, and data sets. Furthermore, based on various graph-related representations of molecules, we introduce several representative studies and methods for toxicity prediction from the perspective of GNN architectures and graph pretrained models. Compared to other types of models, deep graph models not only advance in higher accuracy and efficiency but also provide more intuitive insights, which is significant in the development of model interpretation and generalization ability. The graph pretrained models are emerging as they can extract prominent features from large-scale unlabeled molecular graph data and improve the performance of downstream toxicity prediction tasks. We hope this survey can serve as a handbook for individuals interested in exploring deep graph learning for toxicity prediction.
Affiliation(s)
- Yuwei Miao
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Hehuan Ma
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Junzhou Huang
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
6
Sinha S, Dong T, Dimagli A, Vohra HA, Holmes C, Benedetto U, Angelini GD. Comparison of machine learning techniques in prediction of mortality following cardiac surgery: analysis of over 220 000 patients from a large national database. Eur J Cardiothorac Surg 2023; 63:ezad183. [PMID: 37154705 PMCID: PMC10275911 DOI: 10.1093/ejcts/ezad183] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/07/2022] [Revised: 04/19/2023] [Accepted: 05/05/2023] [Indexed: 05/10/2023] Open
Abstract
OBJECTIVES: To perform a systematic comparison of in-hospital mortality risk prediction after cardiac surgery between the predominant scoring system, the European System for Cardiac Operative Risk Evaluation (EuroSCORE) II; logistic regression (LR) retrained on the same variables; and alternative machine learning (ML) techniques: random forest (RF), neural networks (NN), XGBoost and weighted support vector machine. METHODS: Retrospective analyses of prospectively routinely collected data on adult patients undergoing cardiac surgery in the UK from January 2012 to March 2019. Data were temporally split 70:30 into training and validation subsets. Mortality prediction models were created using the 18 variables of EuroSCORE II. Comparisons of discrimination, calibration and clinical utility were then conducted. Changes in model performance, variable importance over time and hospital/operation-based model performance were also reviewed. RESULTS: Of the 227 087 adults who underwent cardiac surgery during the study period, there were 6258 deaths (2.76%). In the testing cohort, there was an improvement in discrimination [XGBoost (95% confidence interval (CI) area under the receiver operator curve (AUC), 0.834-0.834, F1 score, 0.276-0.280) and RF (95% CI AUC, 0.833-0.834, F1, 0.277-0.281)] compared with EuroSCORE II (95% CI AUC, 0.817-0.818, F1, 0.243-0.245). There was no significant improvement in calibration with ML and retrained-LR compared to EuroSCORE II. However, EuroSCORE II overestimated risk across all deciles of risk and over time. The calibration drift was lowest in NN, XGBoost and RF compared with EuroSCORE II. Decision curve analysis showed XGBoost and RF to have greater net benefit than EuroSCORE II. CONCLUSIONS: ML techniques showed some statistical improvements over retrained-LR and EuroSCORE II. The clinical impact of this improvement is modest at present. However, the incorporation of additional risk factors in future studies may improve upon these findings and warrants further study.
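As an illustration of the temporal 70:30 split and the AUC-based discrimination comparison mentioned in the abstract above, here is a minimal sketch. The record fields (`date`, `died`, `risk`), the toy data and both helper functions are hypothetical and are not taken from the study; the AUC uses the standard rank-based (Mann-Whitney) formulation rather than any particular library.

```python
from datetime import date


def temporal_split(records, train_fraction=0.7):
    """Sort records by date; the earliest fraction becomes the training set."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: ten operations, one per month, with a made-up risk score.
records = [
    {"date": date(2012, m, 1), "died": int(m in (3, 9)), "risk": 0.01 * m}
    for m in range(1, 11)
]
train, test = temporal_split(records)
test_auc = auc([r["died"] for r in test], [r["risk"] for r in test])
```

Splitting by time rather than at random means the validation subset contains only later cases than the training subset, which is one way to see how a model behaves on future patients.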
Affiliation(s)
- Shubhra Sinha
  - Division of Cardiac Surgery, Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, UK
- Tim Dong
  - Division of Cardiac Surgery, Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, UK
- Arnaldo Dimagli
  - Division of Cardiac Surgery, Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, UK
- Hunaid A Vohra
  - Division of Cardiac Surgery, Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, UK
- Chris Holmes
  - Alan Turing Institute, London, UK
  - Department of Statistics, University of Oxford, Oxford, UK
- Umberto Benedetto
  - Division of Cardiac Surgery, Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, UK
- Gianni D Angelini
  - Division of Cardiac Surgery, Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, UK
7
Meyer T, Ramirez C, Tamasi MJ, Gormley AJ. A User's Guide to Machine Learning for Polymeric Biomaterials. ACS POLYMERS AU 2023; 3:141-157. [PMID: 37065715 PMCID: PMC10103193 DOI: 10.1021/acspolymersau.2c00037] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/05/2022] [Revised: 10/27/2022] [Accepted: 10/27/2022] [Indexed: 11/18/2022]
Abstract
The development of novel biomaterials is a challenging process, complicated by a design space with high dimensionality. Requirements for performance in the complex biological environment lead to difficult a priori rational design choices and time-consuming empirical trial-and-error experimentation. Modern data science practices, especially artificial intelligence (AI)/machine learning (ML), offer the promise to help accelerate the identification and testing of next-generation biomaterials. However, it can be a daunting task for biomaterial scientists unfamiliar with modern ML techniques to begin incorporating these useful tools into their development pipeline. This Perspective lays the foundation for a basic understanding of ML while providing a step-by-step guide to new users on how to begin implementing these techniques. A tutorial Python script has been developed walking users through the application of an ML pipeline using data from a real biomaterial design challenge based on group's research. This tutorial provides an opportunity for readers to see and experiment with ML and its syntax in Python. The Google Colab notebook can be easily accessed and copied from the following URL: www.gormleylab.com/MLcolab.
Affiliation(s)
- Travis A. Meyer
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
- Cesar Ramirez
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
- Matthew J. Tamasi
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
- Adam J. Gormley
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
8
Schofield LN, Siegel RB, Loffland HL. Modeling climate‐driven range shifts in populations of two bird species limited by habitat independent of climate. Ecosphere 2023. [DOI: 10.1002/ecs2.4408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023] Open
9
PToPI: A Comprehensive Review, Analysis, and Knowledge Representation of Binary Classification Performance Measures/Metrics. SN COMPUTER SCIENCE 2023; 4:13. [PMID: 36267467 PMCID: PMC9569243 DOI: 10.1007/s42979-022-01409-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 09/13/2022] [Indexed: 11/06/2022]
Abstract
Although few performance evaluation instruments have been used conventionally in different machine learning-based classification problem domains, there are numerous ones defined in the literature. This study reviews and describes performance instruments via formally defined novel concepts and clarifies the terminology. The study first highlights the issues in performance evaluation via a survey of 78 mobile-malware classification studies and reviews terminology. Based on three research questions, it proposes novel concepts to identify characteristics, similarities, and differences of instruments that are categorized into 'performance measures' and 'performance metrics' in the classification context for the first time. The concepts reflecting the intrinsic properties of instruments such as canonical form, geometry, duality, complementation, dependency, and leveling, aim to reveal similarities and differences of numerous instruments, such as redundancy and ground-truth versus prediction focuses. As an application of knowledge representation, we introduced a new exploratory table called PToPI (Periodic Table of Performance Instruments) for 29 measures and 28 metrics (69 instruments including variant and parametric ones). Visualizing proposed concepts, PToPI provides a new relational structure for the instruments including graphical, probabilistic, and entropic ones to see their properties and dependencies all in one place. Applications of the exploratory table in six examples from different domains in the literature have shown that PToPI aids overall instrument analysis and selection of the proper performance metrics according to the specific requirements of a classification problem. We expect that the proposed concepts and PToPI will help researchers comprehend and use the instruments and follow a systematic approach to classification performance evaluation and publication.
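To make concrete what a binary classification performance instrument is, the sketch below derives a handful of widely used ones from the four cells of a confusion matrix. It is only an illustration: the function name, the example counts and the choice to return 0.0 when a denominator is zero are assumptions made here, and the sketch does not reproduce the PToPI categorisation into measures and metrics.

```python
import math


def binary_instruments(tp, fp, fn, tn):
    """Compute common instruments from confusion-matrix counts.

    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0      # positive predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0         # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0    # true negative rate
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0   # Matthews correlation coefficient
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for a 100-sample test set.
result = binary_instruments(tp=40, fp=10, fn=20, tn=30)
```

Precision and F1 depend on the predicted positives while recall and specificity depend on the ground-truth classes, so the same classifier can look quite different depending on which instrument is reported.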
10
Smorchkova AK, Khoruzhaya AN, Kremneva EI, Petryaikin AV. [Machine learning technologies in CT-based diagnostics and classification of intracranial hemorrhages]. ZHURNAL VOPROSY NEIROKHIRURGII IMENI N. N. BURDENKO 2023; 87:85-91. [PMID: 37011333 DOI: 10.17116/neiro20238702185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
Abstract
This review discusses the pooled experience of the creation, implementation and effectiveness of machine learning technologies in CT-based diagnosis of intracranial hemorrhages. The authors analyzed 21 original articles published between 2015 and 2022, found using the following keywords: «intracranial hemorrhage», «machine learning», «deep learning», «artificial intelligence». The review contains general data on the basic concepts of machine learning and also considers in more detail such aspects as the technical characteristics of the data sets used to create AI algorithms for a certain type of clinical task, and their possible impact on effectiveness and clinical experience.
Affiliation(s)
- A K Smorchkova
  - Moscow Research Practical Clinical Center for Diagnostics and Telemedicine Technologies, Moscow, Russia
- A N Khoruzhaya
  - Moscow Research Practical Clinical Center for Diagnostics and Telemedicine Technologies, Moscow, Russia
- E I Kremneva
  - Moscow Research Practical Clinical Center for Diagnostics and Telemedicine Technologies, Moscow, Russia
  - Neurology Research Center, Moscow, Russia
- A V Petryaikin
  - Moscow Research Practical Clinical Center for Diagnostics and Telemedicine Technologies, Moscow, Russia
11
Comparative Analysis of Performance Metrics for Machine Learning Classifiers with a Focus on Alzheimer's Disease Data. ACTA INFORMATICA PRAGENSIA 2022. [DOI: 10.18267/j.aip.198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022] Open
12
Yabuki A, Ikeno H, Dannoura M. A root auto tracing and analysis (ARATA): An automatic analysis software for detecting fine roots in images from flatbed optical scanners. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Arata Yabuki
  - Laboratory of Forest Utilization, Graduate School of Agriculture, Kyoto University, Kyoto, Japan
- Hidetoshi Ikeno
  - Faculty of Informatics, The University of Fukuchiyama, Kyoto, Japan
  - School of Human Science and Environment, University of Hyogo, Hyogo, Japan
- Masako Dannoura
  - Laboratory of Forest Utilization, Graduate School of Agriculture, Kyoto University, Kyoto, Japan
13
Alcantara L, Schenkel F, Lynch C, Oliveira Junior G, Baes C, Tulpan D. Machine learning classification of breeding protocol descriptions from Canadian Holsteins. J Dairy Sci 2022; 105:8177-8188. [DOI: 10.3168/jds.2021-21663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 06/08/2022] [Indexed: 11/19/2022]
14
Gonzalez-Martinez A, Pagán J, Sanz-García A, García-Azorín D, Rodriguez Vico JS, Jaimes A, Gómez García A, Díaz de Terán J, González-García N, Quintas S, Belascoaín R, Casas Limón J, Latorre G, Calle de Miguel C, Sierra Á, Guerrero-Peral ÁL, Trevino-Peinado C, Gago-Veiga AB. Machine-learning based approach to predict anti-CGRP response in patients with migraine: multicenter Spanish study. Eur J Neurol 2022; 29:3102-3111. [PMID: 35726393 DOI: 10.1111/ene.15458] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/25/2022] [Accepted: 06/13/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND: To date, several variables have been associated with anti-CGRP receptor or ligand-antibody response, with disparate results. Our objective is to determine whether machine learning (ML)-based models can predict 6-, 9- and 12-month response to anti-CGRP receptor or ligand therapies among migraine patients. METHODS: We performed a multicenter analysis of a prospectively collected data cohort of patients with migraine receiving anti-CGRP therapies. Demographic and clinical variables were collected. Response rate was defined as a reduction in the number of headache days per month at 6, 9 and 12 months in the 30% to 50% range (or at least 30%), in the 50% to 75% range (or at least 50%), or over 75%. A sequential forward feature selector was used for variable selection, and ML-based predictive models of response to anti-CGRP therapies at 6, 9 and 12 months, with model accuracy of not less than 70%, were generated. RESULTS: A total of 712 patients were included, 93% women, aged 48 years (SD=11.7). Eighty-three percent had chronic migraine. ML models using headache days/month, migraine days/month and HIT-6 variables yielded predictions with an F1 score range of 0.70-0.97 and an AUC (area under the receiver operating curve) score range of 0.87-0.98. SHAP (SHapley Additive exPlanations) summary plots and dependence plots were generated to evaluate the relevance of the factors associated with the prediction of the above-mentioned response rates. CONCLUSIONS: According to our study, ML models can predict anti-CGRP response at 6, 9 and 12 months. This study provides a predictive tool to be used in a real-world setting.
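The response definitions in the abstract above can be sketched as a small classification of patients by percentage reduction in monthly headache days. The 30%, 50% and 75% cut-offs come from the abstract; the function names, the category labels and the handling of values that fall exactly on a cut-off are assumptions made for illustration, not the authors' implementation.

```python
def percent_reduction(baseline_days, followup_days):
    """Percentage reduction in monthly headache days relative to baseline."""
    if baseline_days <= 0:
        raise ValueError("baseline_days must be positive")
    return 100.0 * (baseline_days - followup_days) / baseline_days


def response_category(baseline_days, followup_days):
    """Map a reduction onto the response ranges named in the abstract.

    Assumption: "over 75%" is strict, so exactly 75% falls in the 50-75% range.
    """
    reduction = percent_reduction(baseline_days, followup_days)
    if reduction > 75:
        return ">75%"
    if reduction >= 50:
        return "50-75%"
    if reduction >= 30:
        return "30-50%"
    return "<30%"


# Hypothetical patient: 20 headache days per month at baseline, 12 at follow-up.
example = response_category(20, 12)
```

Labels of this kind, computed separately at 6, 9 and 12 months, are the sort of target a classifier would be trained to predict from baseline variables.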
Affiliation(s)
- Alicia Gonzalez-Martinez
- Headache Unit, Neurology Department, Hospital Universitario de la Princesa & Instituto de Investigación Sanitaria La Princesa, Madrid, Spain
| | - Josué Pagán
- Universidad Politécnica de Madrid and Center for Computational Simulation of Universidad Politécnica de Madrid, Madrid, Spain
| | - Ancor Sanz-García
- Headache Unit, Neurology Department, Hospital Universitario de la Princesa & Instituto de Investigación Sanitaria La Princesa, Madrid, Spain
| | - David García-Azorín
- Headache Unit, Neurology Department, Department of Medicine, University of Valladolid, Hospital Clínico Universitario de Valladolid, Valladolid, Spain
| | | | - Alex Jaimes
- Headache Unit, Neurology Department, Fundación Jiménez Díaz, Madrid, Spain
| | | | - Javier Díaz de Terán
- Headache Unit, Neurology Department, Hospital Universitario La Paz, Madrid, Spain
| | - Nuria González-García
- Headache Unit, Neurology Department, Hospital Universitario Clínico San Carlos, Madrid, Spain
| | - Sonia Quintas
- Headache Unit, Neurology Department, Hospital Universitario de la Princesa & Instituto de Investigación Sanitaria La Princesa, Madrid, Spain
| | - Rocio Belascoaín
- Headache Unit, Neurology Department, Hospital Universitario de la Princesa & Instituto de Investigación Sanitaria La Princesa, Madrid, Spain
| | - Javier Casas Limón
- Headache Unit, Neurology Department, Hospital Universitario Fundación de Alcorcón, Alcorcón, Spain
| | - Germán Latorre
- Headache Unit, Neurology Department, Hospital Universitario de Fuenlabrada, Madrid, Spain
| | - Carlos Calle de Miguel
- Headache Unit, Neurology Department, Hospital Universitario de Fuenlabrada, Madrid, Spain
| | - Álvaro Sierra
- Headache Unit, Neurology Department, Department of Medicine, University of Valladolid, Hospital Clínico Universitario de Valladolid, Valladolid, Spain
| | - Ángel Luis Guerrero-Peral
- Headache Unit, Neurology Department, Department of Medicine, University of Valladolid, Hospital Clínico Universitario de Valladolid, Valladolid, Spain
| | | | - Ana Beatriz Gago-Veiga
- Headache Unit, Neurology Department, Hospital Universitario de la Princesa & Instituto de Investigación Sanitaria La Princesa, Madrid, Spain
|
15
|
Cagirici HB, Budak H, Sen TZ. G4Boost: a machine learning-based tool for quadruplex identification and stability prediction. BMC Bioinformatics 2022; 23:240. [PMID: 35717172 PMCID: PMC9206279 DOI: 10.1186/s12859-022-04782-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Accepted: 06/09/2022] [Indexed: 11/10/2022] Open
Abstract
Background G-quadruplexes (G4s), formed within guanine-rich nucleic acids, are secondary structures involved in important biological processes. Although every G4 motif has the potential to form a stable G4 structure, not every G4 motif would, and accurate energy-based methods are needed to assess their structural stability. Here, we present a decision tree-based prediction tool, G4Boost, to identify G4 motifs and predict their secondary structure folding probability and thermodynamic stability based on their sequences, nucleotide compositions, and estimated structural topologies.
Results G4Boost predicted the quadruplex folding state with an accuracy greater than 93% and an F1-score of 0.96, and the folding energy with an RMSE of 4.28 and R2 of 0.95, using only sequence-intrinsic features. G4Boost was successfully applied and validated to predict the stability of experimentally determined G4 structures, including for plants and humans. Conclusion G4Boost outperformed the three machine-learning based prediction tools, DeepG4, Quadron, and G4RNA Screener, in terms of both accuracy and F1-score, and can be highly useful for G4 prediction to understand gene regulation across species including plants and humans. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04782-z.
Affiliation(s)
- H Busra Cagirici
- US Department of Agriculture - Agricultural Research Service, Crop Improvement Genetics Research Unit, Western Regional Research Center, 800 Buchanan St, Albany, CA, 94710, USA
| | | | - Taner Z Sen
- US Department of Agriculture - Agricultural Research Service, Crop Improvement Genetics Research Unit, Western Regional Research Center, 800 Buchanan St, Albany, CA, 94710, USA.
|
16
|
Bender A, Schneider N, Segler M, Patrick Walters W, Engkvist O, Rodrigues T. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 2022; 6:428-442. [PMID: 37117429 DOI: 10.1038/s41570-022-00391-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
|
17
|
Khan A, Garner R, Rocca ML, Salehi S, Duncan D. A Novel Threshold-Based Segmentation Method for Quantification of COVID-19 Lung Abnormalities. SIGNAL, IMAGE AND VIDEO PROCESSING 2022; 17:907-914. [PMID: 35371333 PMCID: PMC8958480 DOI: 10.1007/s11760-022-02183-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/15/2021] [Revised: 11/23/2021] [Accepted: 02/17/2022] [Indexed: 06/14/2023]
Abstract
Since December 2019, the novel coronavirus disease 2019 (COVID-19) has claimed the lives of more than 3.75 million people worldwide. Consequently, methods for accurate COVID-19 diagnosis and classification are necessary to facilitate rapid patient care and terminate viral spread. Lung infection segmentations are useful to identify unique infection patterns that may support rapid diagnosis, severity assessment, and patient prognosis prediction, but manual segmentations are time-consuming and depend on radiologic expertise. Deep learning-based methods have been explored to reduce the burdens of segmentation; however, their accuracies are limited due to the lack of large, publicly available annotated datasets that are required to establish ground truths. For these reasons, we propose a semi-automatic, threshold-based segmentation method to generate region of interest (ROI) segmentations of infection visible on lung computed tomography (CT) scans. Infection masks are then used to calculate the percentage of lung abnormality (PLA) to determine COVID-19 severity and to analyze the disease progression in follow-up CTs. Compared with other COVID-19 ROI segmentation methods, on average, the proposed method achieved improved precision (47.49%) and specificity (98.40%) scores. Furthermore, the proposed method generated PLAs with a difference of ±3.89% from the ground-truth PLAs. The improved ROI segmentation results suggest that the proposed method has potential to assist radiologists in assessing infection severity and analyzing disease progression in follow-up CTs.
Affiliation(s)
- Azrin Khan
- Laboratory of Neuro Imaging, Keck School of Medicine of USC, USC Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA USA
- Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA USA
| | - Rachael Garner
- Laboratory of Neuro Imaging, Keck School of Medicine of USC, USC Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA USA
| | - Marianna La Rocca
- Laboratory of Neuro Imaging, Keck School of Medicine of USC, USC Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA USA
- Dipartimento Interateneo di Fisica, Universitá degli Studi di Bari Aldo Moro, Bari, Italy
| | - Sana Salehi
- Laboratory of Neuro Imaging, Keck School of Medicine of USC, USC Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA USA
| | - Dominique Duncan
- Laboratory of Neuro Imaging, Keck School of Medicine of USC, USC Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA USA
|
18
|
Canbek G, Taskaya Temizel T, Sagiroglu S. BenchMetrics: a systematic benchmarking method for binary classification performance metrics. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06103-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
19
|
Jain S, Talley DC, Baljinnyam B, Choe J, Hanson Q, Zhu W, Xu M, Chen CZ, Zheng W, Hu X, Shen M, Rai G, Hall MD, Simeonov A, Zakharov AV. Hybrid In Silico Approach Reveals Novel Inhibitors of Multiple SARS-CoV-2 Variants. ACS Pharmacol Transl Sci 2021; 4:1675-1688. [PMID: 34608449 PMCID: PMC8482323 DOI: 10.1021/acsptsci.1c00176] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Indexed: 11/30/2022]
Abstract
The National Center for Advancing Translational Sciences (NCATS) has been actively generating SARS-CoV-2 high-throughput screening data and disseminates it through the OpenData Portal (https://opendata.ncats.nih.gov/covid19/). Here, we provide a hybrid approach that utilizes NCATS screening data from the SARS-CoV-2 cytopathic effect reduction assay to build predictive models, using both machine learning and pharmacophore-based modeling. Optimized models were used to perform two iterative rounds of virtual screening to predict small molecules active against SARS-CoV-2. Experimental testing with live virus provided 100 (∼16% of predicted hits) active compounds (efficacy > 30%, IC50 ≤ 15 μM). Systematic clustering analysis of active compounds revealed three promising chemotypes which have not been previously identified as inhibitors of SARS-CoV-2 infection. Further investigation resulted in the identification of allosteric binders to host receptor angiotensin-converting enzyme 2; these compounds were then shown to inhibit the entry of pseudoparticles bearing spike protein of wild-type SARS-CoV-2, as well as South African B.1.351 and UK B.1.1.7 variants.
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Daniel C. Talley
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Bolormaa Baljinnyam
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Jun Choe
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Quinlin Hanson
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Wei Zhu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Miao Xu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Catherine Z. Chen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Wei Zheng
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Xin Hu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Min Shen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Ganesha Rai
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Matthew D. Hall
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
|
20
|
Aleksić S, Seeliger D, Brown JB. ADMET Predictability at Boehringer Ingelheim: State-of-the-Art, and Do Bigger Datasets or Algorithms Make a Difference? Mol Inform 2021; 41:e2100113. [PMID: 34473408 DOI: 10.1002/minf.202100113] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 08/21/2021] [Indexed: 11/08/2022]
Abstract
Computational methods assisting drug discovery and development are routine in the pharmaceutical industry. Digital recording of ADMET assays has provided a rich source of data for development of predictive models. Despite the accumulation of data and the public availability of advanced modeling algorithms, the utility of prediction in ADMET research is not clear. Here, we present a critical evaluation of the relationships between data volume, modeling algorithm, chemical representation and grouping, and temporal aspect (time sequence of assays) using an in-house ADMET database. We find no large difference between prediction algorithms nor any systemic and substantial gain from increasingly large datasets. Temporal-based data enlargement led to performance improvement in only a limited number of assays, and with fractional improvement at best. Assays that are well-, intermediately-, or poorly-suited for ADMET predictions and reasons for such behavior are systematically identified, generating realistic expectations for areas in which computational models can be used to guide decision making in molecular design and development.
Affiliation(s)
- Stevan Aleksić
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397, Biberach, Germany
| | - Daniel Seeliger
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397, Biberach, Germany
| | - J B Brown
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397, Biberach, Germany
|
21
|
Sowa P, Kiszkiel Ł, Laskowski PP, Alimowski M, Szczerbiński Ł, Paniczko M, Moniuszko-Malinowska A, Kamiński K. COVID-19 Vaccine Hesitancy in Poland-Multifactorial Impact Trajectories. Vaccines (Basel) 2021; 9:876. [PMID: 34452001 PMCID: PMC8402463 DOI: 10.3390/vaccines9080876] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2021] [Revised: 08/01/2021] [Accepted: 08/03/2021] [Indexed: 11/16/2022] Open
Abstract
Since the World Health Organization declared the SARS-CoV-2 pandemic, work on the development of vaccines has been stimulated. Now that vaccines are commonly available, a major problem is persistent vaccine hesitancy in many European countries. The main goal of our study was to understand the multidimensional factors inducing this phenomenon in Poland. Our study was carried out at the peak of the third wave of the pandemic, with record rates of daily cases and deaths associated with COVID-19. The results indicate that vaccine hesitancy/acceptability should always be considered in an interdisciplinary manner and according to identified factors where most negative attitudes could be altered. Our analyses included the assessment of a representative quota sample of adult Poles (N = 1000). The vaccine hesitancy in the studied group reached 49.2%. We performed stepwise logistic regression modeling to analyze variables set into six trajectories (groups) predicting the willingness to vaccinate. Apart from typical socio-demographic and economic determinants, we identified the fear of vaccines' side effects, beliefs in conspiracy theories and physical fitness. We were also able to establish the order of importance of factors used in a full model of all impact trajectories.
Affiliation(s)
- Paweł Sowa
- Department of Population Medicine and Lifestyle Diseases Prevention, Medical University of Bialystok, Waszyngtona 13A, 15-089 Białystok, Poland; (P.S.); (M.P.)
| | - Łukasz Kiszkiel
- Society and Cognition Unit, University of Bialystok, 15-403 Bialystok, Poland; (Ł.K.); (P.P.L.)
| | - Piotr Paweł Laskowski
- Society and Cognition Unit, University of Bialystok, 15-403 Bialystok, Poland; (Ł.K.); (P.P.L.)
| | - Maciej Alimowski
- Doctoral School of Social Sciences, University of Bialystok, 15-403 Bialystok, Poland;
| | - Łukasz Szczerbiński
- Department of Endocrinology, Diabetology and Internal Medicine, Medical University of Bialystok, 15-276 Bialystok, Poland;
- Clinical Research Centre, Medical University of Bialystok, 15-276 Białystok, Poland
| | - Marlena Paniczko
- Department of Population Medicine and Lifestyle Diseases Prevention, Medical University of Bialystok, Waszyngtona 13A, 15-089 Białystok, Poland; (P.S.); (M.P.)
| | - Anna Moniuszko-Malinowska
- Department of Infectious Diseases and Neuroinfections, Medical University of Bialystok, 15-089 Białystok, Poland;
| | - Karol Kamiński
- Department of Population Medicine and Lifestyle Diseases Prevention, Medical University of Bialystok, Waszyngtona 13A, 15-089 Białystok, Poland; (P.S.); (M.P.)
- Department of Cardiology, University Hospital of Bialystok, 15-276 Białystok, Poland
|
22
|
Development of Machine Learning Models for Accurately Predicting and Ranking the Activity of Lead Molecules to Inhibit PRC2 Dependent Cancer. Pharmaceuticals (Basel) 2021; 14:ph14070699. [PMID: 34358125 PMCID: PMC8308948 DOI: 10.3390/ph14070699] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2021] [Revised: 07/14/2021] [Accepted: 07/14/2021] [Indexed: 12/22/2022] Open
Abstract
Disruption of epigenetic processes to eradicate tumor cells is among the most promising interventions for cancer control. EZH2 (Enhancer of zeste homolog 2), a catalytic component of polycomb repressive complex 2 (PRC2), methylates lysine 27 of histone H3 to promote transcriptional silencing and is an important drug target for controlling cancer via epigenetic processes. In the present study, we have developed various predictive models for modeling inhibitory activity against EZH2. Binary and multiclass models were built using SVM, random forest and XGBoost methods. Rigorous validation approaches including predictiveness curve, Y-randomization and applicability domain (AD) were employed for evaluation of the developed models. Eighteen descriptors selected by the Boruta method were used for modeling. For binary classification, random forest and XGBoost achieved an accuracy of 0.80 and 0.82, respectively, on the external test set. In contrast, for multiclass models, random forest and XGBoost achieved an accuracy of 0.73 and 0.75, respectively. Five hundred Y-randomization runs demonstrate that the models were robust and the correlations were not by chance. Evaluation metrics from the predictiveness curve show that the selected eighteen descriptors predict active compounds with total gain (TG) of 0.79 and 0.59 for XGBoost and random forest, respectively. Validated models were further used for virtual screening and molecular docking in search of potential hits. A total of 221 compounds were commonly predicted as active above the set probability threshold and within the AD of the training set. Molecular docking revealed that three compounds have reasonable binding energy and favorable interactions with critical residues in the active site of EZH2. In conclusion, we highlighted the potential of rigorously validated models for accurately predicting and ranking the activities of lead molecules against cancer epigenetic targets. The models presented in this study represent a platform for development of EZH2 inhibitors.
|
23
|
Jiménez-Luna J, Grisoni F, Weskamp N, Schneider G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin Drug Discov 2021; 16:949-959. [PMID: 33779453 DOI: 10.1080/17460441.2021.1909567] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Introduction: Artificial intelligence (AI) has inspired computer-aided drug discovery. The widespread adoption of machine learning, in particular deep learning, in multiple scientific disciplines, and the advances in computing hardware and software, among other factors, continue to fuel this development. Much of the initial skepticism regarding applications of AI in pharmaceutical discovery has started to vanish, consequently benefitting medicinal chemistry. Areas covered: The current status of AI in chemoinformatics is reviewed. The topics discussed herein include quantitative structure-activity/property relationship and structure-based modeling, de novo molecular design, and chemical synthesis prediction. Advantages and limitations of current deep learning applications are highlighted, together with a perspective on next-generation AI for drug discovery. Expert opinion: Deep learning-based approaches have only begun to address some fundamental problems in drug discovery. Certain methodological advances, such as message-passing models, spatial-symmetry-preserving networks, hybrid de novo design, and other innovative machine learning paradigms, will likely become commonplace and help address some of the most challenging questions. Open data sharing and model development will play a central role in the advancement of drug discovery with AI.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| | - Francesca Grisoni
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| | - Nils Weskamp
- Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riss, Germany
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
|
24
|
Brown J. Practical Chemogenomic Modeling and Molecule Discovery Strategies Unveiled by Active Learning. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11533-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
|
25
|
Przybyłek M. Application 2D Descriptors and Artificial Neural Networks for Beta-Glucosidase Inhibitors Screening. Molecules 2020; 25:E5942. [PMID: 33333961 PMCID: PMC7765417 DOI: 10.3390/molecules25245942] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2020] [Revised: 12/12/2020] [Accepted: 12/14/2020] [Indexed: 12/14/2022] Open
Abstract
Beta-glucosidase inhibitors play important medical and biological roles. In this study, simple two-variable artificial neural network (ANN) classification models were developed for beta-glucosidase inhibitors screening. All bioassay data were obtained from the ChEMBL database. The classifiers were generated using 2D molecular descriptors and the data miner tool available in the STATISTICA package (STATISTICA Automated Neural Networks, SANN). In order to evaluate the models' accuracy and select the best classifiers among automatically generated SANNs, the Matthews correlation coefficient (MCC) was used. The application of the combination of maxHBint3 and SpMax8_Bhs descriptors leads to the highest predicting abilities of SANNs, as evidenced by the averaged test set prediction results (MCC = 0.748) calculated for ten different dataset splits. Additionally, the models were analyzed employing receiver operating characteristics (ROC) and cumulative gain charts. The thirteen final classifiers obtained as a result of the model development procedure were applied for a natural compounds collection available in the BIOFACQUIM database. As a result of this beta-glucosidase inhibitors screening, eight compounds were univocally classified as active by all SANNs.
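The abstract above ranks classifiers by the Matthews correlation coefficient (MCC). As a minimal sketch of that metric only, assuming binary confusion-matrix counts as input (the function name `mcc_from_counts` and the convention of returning 0.0 for an undefined denominator are illustrative assumptions, not taken from the cited paper or from STATISTICA):

```python
import math

def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Hypothetical helper for illustration; returns 0.0 when the denominator
    is zero (a common convention, assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator

if __name__ == "__main__":
    # A perfect classifier scores 1, a fully inverted one scores -1.
    print(mcc_from_counts(50, 50, 0, 0))
    print(mcc_from_counts(0, 0, 50, 50))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which is one reason it is often preferred for imbalanced active/inactive screening sets.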
Affiliation(s)
- Maciej Przybyłek
- Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-950 Bydgoszcz, Poland
|
26
|
Nakano T, Takeda S, Brown JB. Active learning effectively identifies a minimal set of maximally informative and asymptotically performant cytotoxic structure-activity patterns in NCI-60 cell lines. RSC Med Chem 2020; 11:1075-1087. [PMID: 33479700 PMCID: PMC7513593 DOI: 10.1039/d0md00110d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2020] [Accepted: 06/30/2020] [Indexed: 11/21/2022] Open
Abstract
The NCI-60 cancer cell line screening panel has provided insights for development of subtype-specific chemical therapies and repurposing. By extracting chemical structure and cytotoxicity patterns, virtual screening potentially complements the availability of high-throughput assay platforms and improves bioactive compound discovery rates by computational prefiltering of candidate compound libraries. Many groups report high prediction performances in computational models of NCI-60 data when using cross-validation or similar techniques, yet prospective therapy development in novel cancers may have little to no such data and further may not have the resources to perform hit identification using large compound libraries. In contrast to bulk screening and analysis, the active learning methodology has demonstrated how to identify compounds for screening in small batches and update computational models iteratively, leading to predictive models with a minimum number of compounds, and importantly clarifying data volumes at which limits in predictive ability are achieved. Here, in replicate per-cell line experiments using 50% of data (∼20 000 compounds) as the external prediction target, predictive limits are reproducibly demonstrated at the stage of systematic selection of 10-30% of the incorporable half. The pattern was consistent across all 60 cell lines. Limits of predictability are found to be correlated to the doubling times of cell lines and the number of cellular response discontinuities (activity cliffs) present per cell line. Organization into chemical scaffolds delineated degrees of predictive challenge. These results provide key insights for strategies in developing new inhibitors in existing cell lines or for future automated therapy selection in personalized oncotherapy.
Affiliation(s)
- Takumi Nakano
- Kyoto University Graduate School of Medicine, Department of Molecular Biosciences, Life Science Informatics Research Unit, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
| | - Shunichi Takeda
- Kyoto University Graduate School of Medicine, Department of Radiation Genetics, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
| | - J B Brown
- Kyoto University Graduate School of Medicine, Department of Molecular Biosciences, Life Science Informatics Research Unit, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
|
27
|
Liu C, Zhao R, Xie W, Pang M. Pathological lung segmentation based on random forest combined with deep model and multi-scale superpixels. Neural Process Lett 2020; 52:1631-1649. [PMID: 32837245 PMCID: PMC7413019 DOI: 10.1007/s11063-020-10330-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Accurate segmentation of lungs in pathological thoracic computed tomography (CT) scans plays an important role in pulmonary disease diagnosis. However, it is still a challenging task due to the variability of pathological lung appearances and shapes. In this paper, we proposed a novel segmentation algorithm based on random forest (RF), deep convolutional network, and multi-scale superpixels for segmenting pathological lungs from thoracic CT images accurately. A pathological thoracic CT image is first segmented based on multi-scale superpixels, and deep features, texture, and intensity features extracted from superpixels are taken as inputs of a group of RF classifiers. With the fusion of classification results of RFs by a fractional-order gray correlation approach, we capture an initial segmentation of pathological lungs. We finally utilize a divide-and-conquer strategy to deal with segmentation refinement combining contour correction of left lungs and region repairing of right lungs. Our algorithm is tested on a group of thoracic CT images affected with interstitial lung diseases. Experiments show that our algorithm can achieve a high segmentation accuracy with an average DSC of 96.45% and PPV of 95.07%. Compared with several existing lung segmentation methods, our algorithm exhibits a robust performance on pathological lung segmentation. Our algorithm can be employed reliably for lung field segmentation of pathologic thoracic CT images with a high accuracy, which is helpful to assist radiologists to detect the presence of pulmonary diseases and quantify its shape and size in regular clinical practices.
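The abstract above reports segmentation quality as a Dice similarity coefficient (DSC) and positive predictive value (PPV). The following is a minimal sketch of how these two overlap metrics are conventionally computed from flattened binary masks; the function name `dice_and_ppv` and the handling of empty masks are illustrative assumptions and do not reproduce the authors' evaluation code.

```python
from typing import Sequence, Tuple

def dice_and_ppv(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (DSC, PPV) for two flattened binary masks of equal length.

    Hypothetical helper for illustration. Empty-mask conventions (assumed):
    DSC is 1.0 if both masks are empty, PPV is 0.0 if nothing is predicted.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # overlap
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # over-segmentation
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # missed lung pixels
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return dice, ppv

if __name__ == "__main__":
    predicted = [1, 1, 1, 0, 0, 0]
    reference = [1, 1, 0, 1, 0, 0]
    print(dice_and_ppv(predicted, reference))
```

DSC penalizes both missed and spurious pixels, while PPV only reflects how much of the predicted region is correct, so the two are usually reported together.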
Affiliation(s)
- Caixia Liu
  - Institute of EduInfo Science and Engineering, Nanjing Normal University, Nanjing, China
- Ruibin Zhao
  - Institute of EduInfo Science and Engineering, Nanjing Normal University, Nanjing, China
- Wangli Xie
  - Institute of EduInfo Science and Engineering, Nanjing Normal University, Nanjing, China
- Mingyong Pang
  - Institute of EduInfo Science and Engineering, Nanjing Normal University, Nanjing, China
|
28
|
Active learning efficiently converges on rational limits of toxicity prediction and identifies patterns for molecule design. Computational Toxicology 2020. [DOI: 10.1016/j.comtox.2020.100129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/29/2023]
|
29
|
Sosnina EA, Sosnin S, Nikitina AA, Nazarov I, Osolodkin DI, Fedorov MV. Recommender Systems in Antiviral Drug Discovery. ACS Omega 2020; 5:15039-15051. [PMID: 32632398 PMCID: PMC7315437 DOI: 10.1021/acsomega.0c00857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/26/2020] [Accepted: 06/03/2020] [Indexed: 06/11/2023]
Abstract
Recommender systems (RSs), which underwent rapid development and had an enormous impact on e-commerce, have the potential to become useful tools for drug discovery. In this paper, we applied RS methods for the prediction of the antiviral activity class (active/inactive) for compounds extracted from ChEMBL. Two main RS approaches were applied: collaborative filtering (Surprise implementation) and content-based filtering (sparse-group inductive matrix completion (SGIMC) method). The effectiveness of RS approaches was investigated for prediction of antiviral activity classes ("interactions") for compounds and viruses, for which some of their interactions with other viruses or compounds are known, and for prediction of interaction profiles for new compounds. Both approaches achieved relatively good prediction quality for binary classification of individual interactions and compound profiles, as quantified by cross-validation and external validation receiver operating characteristic (ROC) score >0.9. Thus, even simple recommender systems may serve as an effective tool in antiviral drug discovery.
Affiliation(s)
- Ekaterina A. Sosnina
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
  - Institute of Physiologically Active Compounds, RAS, Severniy pr. 1, Chernogolovka 142432, Russia
- Sergey Sosnin
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
  - Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
- Anastasia A. Nikitina
  - Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1 bd. 3, Moscow 119991, Russia
  - FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Ivan Nazarov
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Dmitry I. Osolodkin
  - FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
  - Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya Ulitsa 8, Moscow 119991, Russia
- Maxim V. Fedorov
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
  - Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
  - Physics John Anderson Building, University of Strathclyde, 107 Rottenrow East, Glasgow G4 0NG, U.K.
|
30
|
Chishti S, Jaggi KR, Saini A, Agarwal G, Ranjan A. Artificial Intelligence-Based Differential Diagnosis: Development and Validation of a Probabilistic Model to Address Lack of Large-Scale Clinical Datasets. J Med Internet Res 2020; 22:e17550. [PMID: 32343256 PMCID: PMC7218591 DOI: 10.2196/17550] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/19/2019] [Revised: 01/30/2020] [Accepted: 02/01/2020] [Indexed: 12/19/2022] Open
Abstract
Background: Machine-learning or deep-learning algorithms for clinical diagnosis are inherently dependent on the availability of large-scale clinical datasets. Lack of such datasets and inherent problems such as overfitting often necessitate the development of innovative solutions. Probabilistic modeling closely mimics the rationale behind clinical diagnosis and represents a unique solution. Objective: The aim of this study was to develop and validate a probabilistic model for differential diagnosis in different medical domains. Methods: Numerical values of symptom-disease associations were utilized to mathematically represent medical domain knowledge. These values served as the core engine for the probabilistic model. For the given set of symptoms, the model was utilized to produce a ranked list of differential diagnoses, which was compared to the differential diagnosis constructed by a physician in a consult. Practicing medical specialists were integral in the development and validation of this model. Clinical vignettes (patient case studies) were utilized to compare the accuracy of doctors and the model against the assumed gold standard. The accuracy analysis was carried out over the following metrics: top 3 accuracy, precision, and recall. Results: The model demonstrated a statistically significant improvement (P=.002) in diagnostic accuracy (85%) as compared to the doctors’ performance (67%). This advantage was retained across all three categories of clinical vignettes: 100% vs 82% (P<.001) for highly specific disease presentation, 83% vs 65% for moderately specific disease presentation (P=.005), and 72% vs 49% (P<.001) for nonspecific disease presentation. The model performed slightly better than the doctors’ average in precision (62% vs 60%, P=.43) but there was no improvement with respect to recall (53% vs 56%, P=.27). However, neither difference was statistically significant.
Conclusions: The present study demonstrates a drastic improvement over previously reported results that can be attributed to the development of a stable probabilistic framework utilizing symptom-disease associations to mathematically represent medical domain knowledge. The current iteration relies on static, manually curated values for calculating the degree of association. Shifting to real-world data–derived values represents the next step in model development.
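The "top 3 accuracy" metric named in this abstract can be sketched as follows, assuming the usual definition: a case counts as correct when the gold-standard diagnosis appears among the first three entries of the ranked differential. The function name `top_k_accuracy` and the example diagnoses are hypothetical and are not taken from the study.

```python
def top_k_accuracy(ranked_lists, gold, k=3):
    """Fraction of cases whose gold diagnosis is among the first k ranked candidates.

    Hypothetical helper for illustration; assumes one gold label per case.
    """
    if len(ranked_lists) != len(gold):
        raise ValueError("need one ranked list per gold label")
    if not gold:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


# Three made-up vignettes: ranked differentials and the assumed gold standard.
ranked = [
    ["migraine", "tension headache", "sinusitis", "cluster headache"],
    ["gastritis", "peptic ulcer", "pancreatitis", "cholecystitis"],
    ["asthma", "bronchitis", "pneumonia", "COPD"],
]
gold = ["sinusitis", "cholecystitis", "asthma"]
print(f"top-3 accuracy = {top_k_accuracy(ranked, gold):.3f}")
```

Here the second vignette misses because its gold diagnosis is ranked fourth, so two of three cases count as hits.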
Affiliation(s)
- Anuj Saini
  - 1mg Technologies Pvt Ltd, Gurgaon, India
|
31
|
Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 2020; 21:6. [PMID: 31898477 PMCID: PMC6941312 DOI: 10.1186/s12864-019-6413-7] [Citation(s) in RCA: 1205] [Impact Index Per Article: 301.3] [Received: 05/24/2019] [Accepted: 12/18/2019] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular adopted metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic inflated results, especially on imbalanced datasets. RESULTS: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset. CONCLUSIONS: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
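The contrast drawn in this abstract can be illustrated with a small sketch that computes accuracy, F1 score, and MCC from the four confusion-matrix counts using their standard definitions. The counts below are invented for illustration and are not one of the paper's use cases; returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp, fn, tn, fp):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


def f1_score(tp, fn, tn, fp):
    """Harmonic mean of precision and recall; ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when any marginal is empty (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Imbalanced toy dataset: 91 positives, 9 negatives.
# The classifier finds almost all positives but only 1 of the 9 negatives.
counts = dict(tp=90, fn=1, tn=1, fp=8)
print(f"accuracy = {accuracy(**counts):.3f}")
print(f"F1       = {f1_score(**counts):.3f}")
print(f"MCC      = {mcc(**counts):.3f}")
```

On these counts accuracy and F1 both exceed 0.9 while MCC stays near 0.2, because MCC also penalizes the poor performance on the minority (negative) class.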
Affiliation(s)
- Davide Chicco
  - Krembil Research Institute, Toronto, Ontario, Canada
  - Peter Munk Cardiac Centre, Toronto, Ontario, Canada
|
32
|
Prediction of Compound Cytotoxicity Based on Compound Structures and Cell Line Molecular Characteristics. Journal of Computer Aided Chemistry 2020. [DOI: 10.2751/jcac.21.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
|
33
|
Applicability Domain of Active Learning in Chemical Probe Identification: Convergence in Learning from Non-Specific Compounds and Decision Rule Clarification. Molecules 2019; 24:2716. [PMID: 31357419 PMCID: PMC6696588 DOI: 10.3390/molecules24152716] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/30/2019] [Revised: 07/19/2019] [Accepted: 07/24/2019] [Indexed: 12/27/2022] Open
Abstract
Efficient identification of chemical probes for the manipulation and understanding of biological systems demands specificity for target proteins. Computational means to optimize candidate compound selection for experimental selectivity evaluation are being sought. The active learning virtual screening method has demonstrated the ability to efficiently converge on predictive models with reduced datasets, though its applicability domain to probe identification has yet to be determined. In this article, we challenge active learning’s ability to predict inhibitory bioactivity profiles of selective compounds when learning from chemogenomic features found in non-selective ligand-target pairs. Comparison of controls versus multiple molecule representations de-convolutes factors contributing to predictive capability. Experiments using the matrix metalloproteinase family demonstrate maximum probe bioactivity prediction achieved from only approximately 20% of non-probe bioactivity; this data volume is consistent with prior chemogenomic active learning studies despite the increased difficulty from chemical biology experimental settings used here. Feature weight analyses are combined with a custom visualization to unambiguously detail how active learning arrives at classification decisions, yielding clarified expectations for chemogenomic modeling. The results influence tactical decisions for computational probe design and discovery.
|
34
|
Grisoni F, Neuhaus CS, Hishinuma M, Gabernet G, Hiss JA, Kotera M, Schneider G. De novo design of anticancer peptides by ensemble artificial neural networks. J Mol Model 2019; 25:112. [PMID: 30953170 DOI: 10.1007/s00894-019-4007-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/28/2019] [Accepted: 03/21/2019] [Indexed: 12/17/2022]
Abstract
Membranolytic anticancer peptides (ACPs) are drawing increasing attention as potential future therapeutics against cancer, due to their ability to hinder the development of cellular resistance and their potential to overcome common hurdles of chemotherapy, e.g., side effects and cytotoxicity. In this work, we present an ensemble machine learning model to design potent ACPs. Four counter-propagation artificial neural-networks were trained to identify peptides that kill breast and/or lung cancer cells. For prospective application of the ensemble model, we selected 14 peptides from a total of 1000 de novo designs, for synthesis and testing in vitro on breast cancer (MCF7) and lung cancer (A549) cell lines. Six de novo designs showed anticancer activity in vitro, five of which against both MCF7 and A549 cell lines. The novel active peptides populate uncharted regions of ACP sequence space.
Affiliation(s)
- Francesca Grisoni
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
  - Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, 20126, Milan, Italy
- Claudia S Neuhaus
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Miyabi Hishinuma
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
  - Department of Chemical System Engineering, School of Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
  - School of Life Science and Technology, Tokyo Institute of Technology, 1-11-5, Midorigaoka, Meguro-ku, Tokyo, 152-0034, Japan
- Gisela Gabernet
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Jan A Hiss
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Masaaki Kotera
  - Department of Chemical System Engineering, School of Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
|
35
|
Cayley A, Fowkes A, Williams RV. Important considerations for the validation of QSAR models for in vitro mutagenicity. Mutagenesis 2019; 34:25-32. [PMID: 30346596 DOI: 10.1093/mutage/gey034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/29/2018] [Revised: 09/04/2018] [Accepted: 10/04/2018] [Indexed: 11/12/2022] Open
Abstract
While high-level performance metrics generated from the validation of quantitative structure-activity relationship (QSAR) systems can provide valuable information on how well these models perform and where they need to be improved, they require appropriate interpretation. There is no universal performance metric which will answer all of the questions a user might ask relating to a model, and therefore, a combination of metrics should usually be considered. Furthermore, results may vary according to the chemical space being used to validate a model, and, in some cases, it may be the validation data which is lacking or ambiguous rather than the prediction being made. Finally, users also need to consider the interpretability of the predictions being made, alongside the accuracy of the predictions. In this paper, we will discuss these important considerations in more detail within the context of the results obtained at Lhasa Limited as part of the National Institute of Health Sciences (NIHS) QSAR challenge project.
Affiliation(s)
- Alex Cayley
  - Lhasa Limited, Granary Wharf House, Leeds, UK
|
36
|
Adaptive mining and model building of medicinal chemistry data with a multi-metric perspective. Future Med Chem 2018; 10:1885-1887. [PMID: 29966447 DOI: 10.4155/fmc-2018-0188] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022] Open
|
37
|
Berishvili VP, Voronkov AE, Radchenko EV, Palyulin VA. Machine Learning Classification Models to Improve the Docking-based Screening: A Case of PI3K-Tankyrase Inhibitors. Mol Inform 2018; 37:e1800030. [DOI: 10.1002/minf.201800030] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 03/13/2018] [Accepted: 05/28/2018] [Indexed: 01/20/2023]
Affiliation(s)
- Vladimir P. Berishvili
  - Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
- Andrew E. Voronkov
  - Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
  - Digital BioPharm Ltd.; Hovseterveien 42 A, H0301 Oslo 0768 Norway
- Eugene V. Radchenko
  - Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
- Vladimir A. Palyulin
  - Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
|
38
|
Rakers C, Najnin RA, Polash AH, Takeda S, Brown J. Chemogenomic Active Learning's Domain of Applicability on Small, Sparse qHTS Matrices: A Study Using Cytochrome P450 and Nuclear Hormone Receptor Families. ChemMedChem 2018; 13:511-521. [DOI: 10.1002/cmdc.201700677] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/31/2017] [Revised: 12/04/2017] [Indexed: 01/21/2023]
Affiliation(s)
- Christin Rakers
  - Institute of Transformative bio-Molecules, WPI-ITbM; Nagoya University; Furo-cho Chikusa-ku Nagoya 464-8602 Japan
- Rifat Ara Najnin
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Ahsan Habib Polash
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Shunichi Takeda
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- J.B. Brown
  - Laboratory for Molecular Biosciences; Kyoto University Graduate School of Medicine; Yoshida-konoemachi Building E 606-8501 Kyoto Sakyo Japan
|