51. Zimmer L, Lindauer M, Hutter F. Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:3079-3090. [PMID: 33750687 DOI: 10.1109/tpami.2021.3067763] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 05/17/2023]
Abstract
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this paper, we introduce Auto-PyTorch, which brings together the best of these two worlds by jointly and robustly optimizing the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL). Auto-PyTorch achieves state-of-the-art performance on several tabular benchmarks by combining multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) and common baselines for tabular data. To thoroughly study our assumptions on how to design such an AutoDL system, we additionally introduce a new benchmark on learning curves for DNNs, dubbed LCBench, and run extensive ablation studies of the full Auto-PyTorch on typical AutoML benchmarks, eventually showing that Auto-PyTorch performs better than several state-of-the-art competitors.
52. Hatmal MM, Abuyaman O, Taha M. Docking-generated multiple ligand poses for bootstrapping bioactivity classifying Machine Learning: Repurposing covalent inhibitors for COVID-19-related TMPRSS2 as case study. Comput Struct Biotechnol J 2021; 19:4790-4824. [PMID: 34426763 PMCID: PMC8373588 DOI: 10.1016/j.csbj.2021.08.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/22/2021] [Revised: 08/03/2021] [Accepted: 08/16/2021] [Indexed: 01/10/2023] [Open Access]
Abstract
In the present work we introduce the use of multiple docked poses for bootstrapping machine learning-based QSAR modelling. Ligand-receptor contact fingerprints are implemented as descriptor variables. We implemented this method for the discovery of potential inhibitors of the serine protease enzyme TMPRSS2, which is involved in the infectivity of coronaviruses. Several machine learners were scanned; XGBoost, support vector machines (SVM) and random forests (RF) performed best, with testing set accuracies reaching 90%. Three potential hits were identified upon using the method to scan known, untested FDA-approved drugs against TMPRSS2. Subsequent molecular dynamics simulation and covalent docking supported the results of the new computational approach.
Affiliation(s)
- Ma'mon M. Hatmal
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
- Omar Abuyaman
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
- Mutasem Taha
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman 11942, Jordan
53. Artificial intelligence for the next generation of precision oncology. NPJ Precis Oncol 2021; 5:79. [PMID: 34408248 PMCID: PMC8373978 DOI: 10.1038/s41698-021-00216-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/23/2021] [Accepted: 07/21/2021] [Indexed: 12/14/2022] [Open Access]
54. Papoutsoglou G, Karaglani M, Lagani V, Thomson N, Røe OD, Tsamardinos I, Chatzaki E. Automated machine learning optimizes and accelerates predictive modeling from COVID-19 high throughput datasets. Sci Rep 2021; 11:15107. [PMID: 34302024 PMCID: PMC8302755 DOI: 10.1038/s41598-021-94501-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/22/2021] [Accepted: 07/08/2021] [Indexed: 12/24/2022] [Open Access]
Abstract
The COVID-19 outbreak has brought intense pressure on healthcare systems, with an urgent demand for effective diagnostic, prognostic and therapeutic procedures. Here, we employed Automated Machine Learning (AutoML) to analyze three publicly available high throughput COVID-19 datasets, including proteomic, metabolomic and transcriptomic measurements. Pathway analysis of the selected features was also performed. Analysis of a combined proteomic and metabolomic dataset led to 10 equivalent signatures of two features each, with AUC 0.840 (CI 0.723-0.941) in discriminating severe from non-severe COVID-19 patients. A transcriptomic dataset led to two equivalent signatures of eight features each, with AUC 0.914 (CI 0.865-0.955) in identifying COVID-19 patients from those with a different acute respiratory illness. Another transcriptomic dataset led to two equivalent signatures of nine features each, with AUC 0.967 (CI 0.899-0.996) in identifying COVID-19 patients from virus-free individuals. Signature predictive performance remained high upon validation. Multiple new features emerged, and pathway analysis revealed biological relevance by implication in Viral mRNA Translation, Interferon gamma signaling and Innate Immune System pathways. In conclusion, AutoML analysis led to multiple biosignatures of high predictive performance, with few features and a large choice of alternative predictors. These favorable characteristics are well suited to the development of cost-effective assays that contribute to better disease management.
Affiliation(s)
- Georgios Papoutsoglou
  - JADBio, Gnosis Data Analysis PC, Science and Technology Park of Crete, N. Plastira 100, Vassilika Vouton, 70013, Heraklion, Crete, Greece
  - Computer Science Department, University of Crete, Voutes Campus, 70013, Heraklion, Crete, Greece
- Makrina Karaglani
  - JADBio, Gnosis Data Analysis PC, Science and Technology Park of Crete, N. Plastira 100, Vassilika Vouton, 70013, Heraklion, Crete, Greece
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, 68100, Alexandroupolis, Greece
- Vincenzo Lagani
  - JADBio, Gnosis Data Analysis PC, Science and Technology Park of Crete, N. Plastira 100, Vassilika Vouton, 70013, Heraklion, Crete, Greece
  - Institute of Chemical Biology, Ilia State University, Kakutsa Cholokashvili Ave 3/5, 0162, Tbilisi, Georgia
- Naomi Thomson
  - JADBio, Gnosis Data Analysis PC, Science and Technology Park of Crete, N. Plastira 100, Vassilika Vouton, 70013, Heraklion, Crete, Greece
- Oluf Dimitri Røe
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Prinsesse Kristinsgt. 1, 7491, Trondheim, Norway
  - Clinical Cancer Research Center, Department of Clinical Medicine, Aalborg University Hospital, Hobrovej 18-22, 9100, Aalborg, Denmark
- Ioannis Tsamardinos
  - JADBio, Gnosis Data Analysis PC, Science and Technology Park of Crete, N. Plastira 100, Vassilika Vouton, 70013, Heraklion, Crete, Greece
  - Computer Science Department, University of Crete, Voutes Campus, 70013, Heraklion, Crete, Greece
- Ekaterini Chatzaki
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, 68100, Alexandroupolis, Greece
  - Institute of Agri-Food and Life Sciences, Mediterranean University Research Centre, 71410, Heraklion, Crete, Greece
55. Benjumeda M, Tan YL, González Otárula KA, Chandramohan D, Chang EF, Hall JA, Bielza C, Larrañaga P, Kobayashi E, Knowlton RC. Patient specific prediction of temporal lobe epilepsy surgical outcomes. Epilepsia 2021; 62:2113-2122. [PMID: 34275140 DOI: 10.1111/epi.17002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/03/2020] [Revised: 06/26/2021] [Accepted: 06/28/2021] [Indexed: 11/28/2022]
Abstract
OBJECTIVE Drug-resistant temporal lobe epilepsy (TLE) is the most common type of epilepsy for which patients undergo surgery. Despite the best clinical judgment and currently available prediction algorithms, surgical outcomes remain variable. We aimed to build and to evaluate the performance of multidimensional Bayesian network classifiers (MBCs), a type of probabilistic graphical model, at predicting probability of seizure freedom after TLE surgery.
METHODS Clinical, neurophysiological, and imaging variables were collected from 231 TLE patients who underwent surgery at the University of California, San Francisco (UCSF) or the Montreal Neurological Institute (MNI) over a 15-year period. Postsurgical Engel outcomes at year 1 (Y1), Y2, and Y5 were analyzed as primary end points. We trained an MBC model on combined data sets from both institutions. Bootstrap bias corrected cross-validation (BBC-CV) was used to evaluate the performance of the models.
RESULTS The MBC was compared with logistic regression and Cox proportional hazards according to the area under the receiver-operating characteristic curve (AUC). The MBC achieved an AUC of 0.67 at Y1, 0.72 at Y2, and 0.67 at Y5, which indicates modest performance yet superior to what has been reported in the state-of-the-art studies to date.
SIGNIFICANCE The MBC can more precisely encode probabilistic relationships between predictors and class variables (Engel outcomes), achieving promising experimental results compared to other well-known statistical methods. Multisite application of the MBC could further optimize its classification accuracy with prospective data sets. Online access to the MBC is provided, paving the way for its use as an adjunct clinical tool in aiding pre-operative TLE surgical counseling.
Affiliation(s)
- Marco Benjumeda
  - Computational Intelligence Group, Department of Artificial Intelligence, Universidad Politécnica de Madrid, Madrid, Spain
- Yee-Leng Tan
  - Department of Neurology, University of California San Francisco Medical Center, San Francisco, CA, USA
  - Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McGill University, Montreal, QC, Canada
  - Department of Neurology, National Neuroscience Institute, Singapore, Singapore
- Karina A González Otárula
  - Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McGill University, Montreal, QC, Canada
- Dharshan Chandramohan
  - Department of Neurology, University of California San Francisco Medical Center, San Francisco, CA, USA
- Edward F Chang
  - Department of Neurosurgery, University of California San Francisco Medical Center, San Francisco, CA, USA
- Jeffery A Hall
  - Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McGill University, Montreal, QC, Canada
- Concha Bielza
  - Computational Intelligence Group, Department of Artificial Intelligence, Universidad Politécnica de Madrid, Madrid, Spain
- Pedro Larrañaga
  - Computational Intelligence Group, Department of Artificial Intelligence, Universidad Politécnica de Madrid, Madrid, Spain
- Eliane Kobayashi
  - Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McGill University, Montreal, QC, Canada
- Robert C Knowlton
  - Department of Neurology, University of California San Francisco Medical Center, San Francisco, CA, USA
56. Panagopoulou M, Cheretaki A, Karaglani M, Balgkouranidou I, Biziota E, Amarantidis K, Xenidis N, Kakolyris S, Baritaki S, Chatzaki E. Methylation Status of Corticotropin-Releasing Factor (CRF) Receptor Genes in Colorectal Cancer. J Clin Med 2021; 10:2680. [PMID: 34207031 PMCID: PMC8234503 DOI: 10.3390/jcm10122680] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/20/2021] [Revised: 06/13/2021] [Accepted: 06/15/2021] [Indexed: 02/07/2023] [Open Access]
Abstract
The corticotropin-releasing factor (CRF) system has been strongly associated with gastrointestinal pathophysiology, including colorectal cancer (CRC). We previously showed that altered expression of CRF receptors (CRFRs) in the colon critically affects CRC progression and aggressiveness through regulation of colonic inflammation. Here, we aimed to assess the potential of CRFR methylation levels as putative biomarkers in CRC. In silico methylation analysis of CRF receptor 1 (CRFR1) and CRF receptor 2 (CRFR2) was performed using methylome data derived from CRC and Crohn's disease (CD) tissues and CRC-derived circulating cell-free DNAs (ccfDNAs). In total, 32 and 33 differentially methylated sites of CpGs (DMCs) emerged in CRFR1 and CRFR2, respectively, between healthy and diseased tissues. The methylation patterns were verified in patient-derived ccfDNA samples by qMSP and associated with clinicopathological characteristics. An automated machine learning (AutoML) technology was applied to ccfDNA samples for classification analysis. In silico analysis revealed increased methylation of both CRFRs in CRC tissue and ccfDNA-derived datasets. CRFR1 hypermethylation was also noticed in gene body DMCs of CD patients. CRFR1 hypermethylation was further validated in CRC adjuvant-derived ccfDNA samples, whereas CRFR1 hypomethylation, observed in metastasis-derived ccfDNAs, was correlated with disease aggressiveness and adverse prognostic characteristics. AutoML analysis based on CRFR methylation status revealed a three-feature high-performing biosignature for CRC diagnosis with an estimated AUC of 0.929. Monitoring of a CRFR methylation-based signature in CRC tissues and ccfDNAs may be of high diagnostic and prognostic significance in CRC.
Affiliation(s)
- Maria Panagopoulou
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Antonia Cheretaki
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Makrina Karaglani
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Ioanna Balgkouranidou
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
  - Department of Medical Oncology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Eirini Biziota
  - Department of Medical Oncology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Kyriakos Amarantidis
  - Department of Medical Oncology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Nikolaos Xenidis
  - Department of Medical Oncology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Stylianos Kakolyris
  - Department of Medical Oncology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Stavroula Baritaki
  - Laboratory of Experimental Oncology, Division of Surgery, School of Medicine, University of Crete, GR-71003 Heraklion, Greece
- Ekaterini Chatzaki
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
  - Hellenic Mediterranean University Research Centre, Institute of Agri-Food and Life Sciences, GR-71410 Heraklion, Greece
57. Rounis K, Makrakis D, Papadaki C, Monastirioti A, Vamvakas L, Kalbakis K, Gourlia K, Xanthopoulos I, Tsamardinos I, Mavroudis D, Agelaki S. Prediction of outcome in patients with non-small cell lung cancer treated with second line PD-1/PDL-1 inhibitors based on clinical parameters: Results from a prospective, single institution study. PLoS One 2021; 16:e0252537. [PMID: 34061904 PMCID: PMC8168865 DOI: 10.1371/journal.pone.0252537] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/02/2020] [Accepted: 05/17/2021] [Indexed: 01/06/2023] [Open Access]
Abstract
OBJECTIVE We prospectively recorded clinical and laboratory parameters from patients with metastatic non-small cell lung cancer (NSCLC) treated with 2nd line PD-1/PD-L1 inhibitors in order to address their effect on treatment outcomes.
MATERIALS AND METHODS Clinicopathological information (age, performance status, smoking, body mass index, histology, organs with metastases), use and duration of proton pump inhibitors, steroids and antibiotics (ATB) and laboratory values [neutrophil/lymphocyte ratio, LDH, albumin] were prospectively collected. Steroid administration was defined as the use of > 10 mg prednisone equivalent for ≥ 10 days. Prolonged ATB administration was defined as ≥ 14 days of ATB in the 30 days before or within the first 3 months of treatment. JADBio, a machine learning pipeline, was applied for further multivariate analysis.
RESULTS Data from 66 patients with non-oncogenic driven metastatic NSCLC were analyzed; 15.2% experienced partial response (PR), 34.8% stable disease (SD) and 50% progressive disease (PD). Median overall survival (OS) was 6.77 months. ATB administration did not affect patient OS [HR = 1.35 (CI: 0.761-2.406, p = 0.304)]; however, prolonged ATBs [HR = 2.95 (CI: 1.62-5.36, p = 0.0001)] and the presence of bone metastases [HR = 1.89 (CI: 1.02-3.51, p = 0.049)] independently predicted shorter survival. Prolonged ATB administration, bone metastases, liver metastases and BMI < 25 kg/m² were selected by JADBio as the important features associated with an increased probability of disease progression as the response to treatment. The resulting algorithm was able to predict the probability of disease stabilization (PR or SD) in a single individual with an AUC = 0.806 [95% CI: 0.714-0.889].
CONCLUSIONS Our results demonstrate an adverse effect of prolonged ATBs on response and survival and underscore their importance along with the presence of bone metastases, liver metastases and low BMI in the individual prediction of outcomes in patients treated with immunotherapy.
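The two exposure definitions in the methods can be written as simple predicates. The sketch below is only an illustration: the function names are hypothetical, "the first 3 months" is approximated as 90 days, and antibiotic days are assumed to count toward the threshold whether or not they are consecutive, none of which is stated in the abstract.

```python
# Assumption: the window runs from 30 days before the first immunotherapy
# dose (day 0) to roughly 3 months (taken here as 90 days) after it.
WINDOW_START_DAY = -30
WINDOW_END_DAY = 90


def steroid_exposed(daily_dose_mg: float, duration_days: int) -> bool:
    """True for > 10 mg prednisone equivalent per day given for >= 10 days."""
    return daily_dose_mg > 10 and duration_days >= 10


def prolonged_atb(atb_day_offsets) -> bool:
    """True when at least 14 distinct antibiotic days fall inside the window.

    atb_day_offsets: days on which antibiotics were given, counted relative
    to the first immunotherapy dose (day 0); negative values are before it.
    """
    days_in_window = {
        day for day in atb_day_offsets
        if WINDOW_START_DAY <= day <= WINDOW_END_DAY
    }
    return len(days_in_window) >= 14
```

Under these assumptions, a course running from day -5 through day 8 counts as prolonged, while a 20-day course that ended 40 days before treatment does not.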
Affiliation(s)
- Konstantinos Rounis
  - Department of Medical Oncology, University General Hospital, Heraklion, Crete, Greece
- Dimitrios Makrakis
  - Department of Medical Oncology, University General Hospital, Heraklion, Crete, Greece
  - Division of Oncology, University of Washington Medical School, Seattle, Washington, United States of America
- Chara Papadaki
  - Laboratory of Translational Oncology, School of Medicine, University of Crete, Heraklion, Crete, Greece
- Alexia Monastirioti
  - Laboratory of Translational Oncology, School of Medicine, University of Crete, Heraklion, Crete, Greece
- Lambros Vamvakas
  - Department of Medical Oncology, University General Hospital, Heraklion, Crete, Greece
- Konstantinos Kalbakis
  - Department of Medical Oncology, University General Hospital, Heraklion, Crete, Greece
- Krystallia Gourlia
  - Department of Computer Science, University of Crete, Heraklion, Crete, Greece
- Ioannis Tsamardinos
  - Department of Computer Science, University of Crete, Heraklion, Crete, Greece
- Dimitrios Mavroudis
  - Department of Medical Oncology, University General Hospital, Heraklion, Crete, Greece
  - Laboratory of Translational Oncology, School of Medicine, University of Crete, Heraklion, Crete, Greece
- Sofia Agelaki
  - Department of Medical Oncology, University General Hospital, Heraklion, Crete, Greece
  - Laboratory of Translational Oncology, School of Medicine, University of Crete, Heraklion, Crete, Greece
58. Nissen LR, Tsamardinos I, Eskelund K, Gradus JL, Andersen SB, Karstoft KI. Forecasting military mental health in a complete sample of Danish military personnel deployed between 1992-2013. J Affect Disord 2021; 288:167-174. [PMID: 33901697 DOI: 10.1016/j.jad.2021.04.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/18/2021] [Revised: 04/04/2021] [Accepted: 04/07/2021] [Indexed: 11/18/2022]
Abstract
OBJECTIVE Mental health problems (MHP) are a relatively common consequence of deployment to war zones. Early identification of those at risk of post-deployment MHP would improve prevention efforts. However, screening instruments based on linear models have not been successful. Machine learning (ML) has shown promise for providing the methodological frame for better prognostic models.
METHODS The study population was all Danish military personnel deployed for the first time between January 1, 1992 and December 31, 2013. From extensive registry data, 21 pre- or at-deployment predictors comprising early adversity, social, clinical and demographic variables were used to predict psychiatric contacts (psychiatric diagnosis and/or use of psychotropic medicine) occurring within 6.5 years after homecoming. Four supervised ML methods (penalized logistic regression, random forests, support vector machines and gradient boosting machines) were compared in their ability to distinguish those at high risk of post-deployment MHP from those without.
RESULTS Of 27,594 subjects, 2175 (8%) had a psychiatric contact. All four ML methods applied had performances well above chance (area under the receiver operating characteristic curve 0.62-0.68). Positive predictive value for the best model was 0.16. A range of pre-deployment factors were found to be predictive of post-deployment psychiatric contacts.
CONCLUSIONS ML methods can be useful in early identification of soldiers at high risk of MHP in the years following their first deployment. However, performances were modest and positive predictive values were low, limiting the applicability of the models for pre-deployment screening. Future studies should include neurobiological data and deployment experiences to increase accuracy of the models.
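The reported positive predictive value of 0.16 is easier to read next to the base rate of 2175 psychiatric contacts among 27594 subjects. The arithmetic below is a small sketch of how the two figures relate; the function names are hypothetical and are not taken from the study.

```python
def base_rate(cases: int, total: int) -> float:
    """Fraction of the cohort with the outcome (here, a psychiatric contact)."""
    return cases / total


def lift_over_base_rate(ppv: float, rate: float) -> float:
    """How many times more likely a flagged subject is to have the outcome
    than a subject drawn at random from the cohort."""
    return ppv / rate


def false_alarms_per_true_case(ppv: float) -> float:
    """Expected number of flagged subjects without the outcome for each
    flagged subject with it."""
    return (1.0 - ppv) / ppv


# Figures quoted in the abstract.
rate = base_rate(2175, 27594)
lift = lift_over_base_rate(0.16, rate)
alarms = false_alarms_per_true_case(0.16)

print(f"base rate: {rate:.3f}")
print(f"lift over base rate: {lift:.2f}")
print(f"false alarms per true case: {alarms:.2f}")
```

With these numbers a flagged subject is about twice as likely as an average subject to have a psychiatric contact, yet roughly five flagged subjects without a contact accompany each one with a contact, which is consistent with the authors' caution about screening use.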
Affiliation(s)
- Lars R Nissen
  - Research and Knowledge Centre, The Danish Veterans Centre, Ringsted, Denmark
- Ioannis Tsamardinos
  - Department of Computer Science, University of Crete, Heraklion, Crete, Greece
  - Gnosis Data Analysis, Heraklion, Greece
- Kasper Eskelund
  - Research and Knowledge Centre, The Danish Veterans Centre, Ringsted, Denmark
- Jaimie L Gradus
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Søren B Andersen
  - Research and Knowledge Centre, The Danish Veterans Centre, Ringsted, Denmark
- Karen-Inge Karstoft
  - Research and Knowledge Centre, The Danish Veterans Centre, Ringsted, Denmark
  - Department of Psychology, University of Copenhagen, Copenhagen, Denmark
59. Panagopoulou M, Karaglani M, Manolopoulos VG, Iliopoulos I, Tsamardinos I, Chatzaki E. Deciphering the Methylation Landscape in Breast Cancer: Diagnostic and Prognostic Biosignatures through Automated Machine Learning. Cancers (Basel) 2021; 13:1677. [PMID: 33918195 PMCID: PMC8037759 DOI: 10.3390/cancers13071677] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/12/2021] [Revised: 03/23/2021] [Accepted: 03/31/2021] [Indexed: 12/24/2022] [Open Access]
Abstract
DNA methylation plays an important role in breast cancer (BrCa) pathogenesis and could contribute to driving its personalized management. We performed a complete bioinformatic analysis in BrCa whole methylome datasets, analyzed using the Illumina methylation 450 bead-chip array. Differential methylation analysis vs. clinical end-points resulted in 11,176 to 27,786 differentially methylated genes (DMGs). Innovative automated machine learning (AutoML) was employed to construct signatures with translational value. Three highly performing and low-feature-number signatures were built: (1) A 5-gene signature discriminating BrCa patients from healthy individuals (area under the curve (AUC): 0.994 (0.982-1.000)). (2) A 3-gene signature identifying BrCa metastatic disease (AUC: 0.986 (0.921-1.000)). (3) Six equivalent 5-gene signatures diagnosing early disease (AUC: 0.973 (0.920-1.000)). Validation in independent patient groups verified performance. Bioinformatic tools for functional analysis and protein interaction prediction were also employed. All protein encoding features included in the signatures were associated with BrCa-related pathways. Functional analysis of DMGs highlighted the regulation of transcription as the main biological process, the nucleus as the main cellular component and transcription factor activity and sequence-specific DNA binding as the main molecular functions. Overall, three high-performance diagnostic/prognostic signatures were built and are readily available for improving BrCa precision management upon prospective clinical validation. Revisiting archived methylomes through novel bioinformatic approaches revealed significant clarifying knowledge for the contribution of gene methylation events in breast carcinogenesis.
Affiliation(s)
- Maria Panagopoulou
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Makrina Karaglani
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Vangelis G. Manolopoulos
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, GR-71003 Heraklion, Greece
- Ioannis Tsamardinos
  - JADBio, Gnosis Data Analysis PC, Science and Technology Park of Crete, GR-70013 Heraklion, Greece
  - Department of Computer Science, University of Crete, GR-70013 Heraklion, Greece
  - Institute of Applied and Computational Mathematics, Foundation for Research and Technology–Hellas, GR-70013 Heraklion, Greece
- Ekaterini Chatzaki
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, GR-68100 Alexandroupolis, Greece
  - Institute of Agri-Food and Life Sciences, Hellenic Mediterranean University Research Centre, GR-71410 Heraklion, Greece
60. Polewko-Klim A, Mnich K, Rudnicki WR. Robust Data Integration Method for Classification of Biomedical Data. J Med Syst 2021; 45:45. [PMID: 33624190 PMCID: PMC7902598 DOI: 10.1007/s10916-021-01718-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2020] [Accepted: 01/26/2021] [Indexed: 10/26/2022]
Abstract
We present a protocol for integrating two types of biological data - clinical and molecular - for more effective classification of patients with cancer. The proposed approach is a hybrid between early and late data integration strategy. In this hybrid protocol, the set of informative clinical features is extended by the classification results based on molecular data sets. The results are then treated as new synthetic variables. The hybrid protocol was applied to METABRIC breast cancer samples and TCGA urothelial bladder carcinoma samples. Various data types were used for clinical endpoint prediction: clinical data, gene expression, somatic copy number aberrations, RNA-Seq, methylation, and reverse phase protein array. The performance of the hybrid data integration was evaluated with a repeated cross validation procedure and compared with other methods of data integration: early integration and late integration via super learning. The hybrid method gave similar results to those obtained by the best of the tested variants of super learning. What is more, the hybrid method allowed for further sensitivity analysis and recursive feature elimination, which led to compact predictive models for cancer clinical endpoints. For breast cancer, the final model consists of eight clinical variables and two synthetic features obtained from molecular data. For urothelial bladder carcinoma, only two clinical features and one synthetic variable were necessary to build the best predictive model. We have shown that the inclusion of the synthetic variables based on the RNA expression levels and copy number alterations can lead to improved quality of prognostic tests. Thus, it should be considered for inclusion in wider medical practice.
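The hybrid idea described above, in which classifier outputs from each molecular data set become extra "synthetic" columns beside the clinical features, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the nearest-centroid classifier is a stand-in, scoring each sample out-of-fold is an assumption made here to avoid label leakage, labels are assumed to be 0 or 1, and all function names are hypothetical.

```python
import random


def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_score(model, row):
    """Score in [0, 1]; values above 0.5 mean the row is closer to class 1."""
    d0 = sum((a - b) ** 2 for a, b in zip(row, model[0])) ** 0.5
    d1 = sum((a - b) ** 2 for a, b in zip(row, model[1])) ** 0.5
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


def out_of_fold_scores(X, y, n_folds=5, seed=0):
    """Score every sample with a model that did not see it during fitting.

    Each training fold must contain both classes.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for k in range(n_folds):
        test = idx[k::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = centroid_score(model, X[i])
    return scores


def hybrid_features(clinical, molecular_blocks, y):
    """Append one synthetic variable per molecular block to the clinical rows."""
    synthetic = [out_of_fold_scores(block, y) for block in molecular_blocks]
    return [
        list(row) + [scores[i] for scores in synthetic]
        for i, row in enumerate(clinical)
    ]
```

The extended rows can then be passed to any final classifier, and because each synthetic variable is an ordinary column it can take part in feature elimination alongside the clinical variables, which is what makes the compact final models in the abstract possible.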
Affiliation(s)
- Aneta Polewko-Klim
  - Institute of Computer Science, University of Bialystok, Bialystok, Poland
- Krzysztof Mnich
  - Computational Center, University of Bialystok, Bialystok, Poland
- Witold R. Rudnicki
  - Institute of Computer Science, University of Bialystok, Bialystok, Poland
  - Computational Center, University of Bialystok, Bialystok, Poland
61. Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, Berland M, Gruca A, Hasic J, Hron K, Klammsteiner T, Kolev M, Lahti L, Lopes MB, Moreno V, Naskinova I, Org E, Paciência I, Papoutsoglou G, Shigdel R, Stres B, Vilne B, Yousef M, Zdravevski E, Tsamardinos I, Carrillo de Santa Pau E, Claesson MJ, Moreno-Indias I, Truu J. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front Microbiol 2021; 12:634511. [PMID: 33737920 PMCID: PMC7962872 DOI: 10.3389/fmicb.2021.634511] [Citation(s) in RCA: 126] [Impact Index Per Article: 42.0] [Received: 11/27/2020] [Accepted: 02/01/2021] [Indexed: 12/19/2022] [Open Access]
Abstract
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) automated publication search through digital libraries of the three major publishers using natural language processing (NLP) Toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on learning to rank approach.
Affiliation(s)
- Laura Judith Marcos-Zambrano
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Piotr Przymus
  - Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
- Vladimir Trajkovik
  - Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
- Oliver Aasmets
  - Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
  - Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Magali Berland
  - Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
- Aleksandra Gruca
  - Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Jasminka Hasic
  - University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
- Karel Hron
  - Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
- Mikhail Kolev
  - South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Marta B. Lopes
  - NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
  - Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
- Victor Moreno
  - Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), Barcelona, Spain
  - Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge (IDIBELL), Barcelona, Spain
  - Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Barcelona, Spain
  - Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- Irina Naskinova
  - South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
- Elin Org
  - Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
- Inês Paciência
  - EPIUnit – Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Blaz Stres
  - Group for Microbiology and Microbial Biotechnology, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia
- Baiba Vilne
  - Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
- Eftim Zdravevski
  - Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
- Marcus J. Claesson
  - School of Microbiology & APC Microbiome Ireland, University College Cork, Cork, Ireland
- Isabel Moreno-Indias
  - Unidad de Gestión Clínica de Endocrinología y Nutrición, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
  - Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Jaak Truu
  - Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
62
Circulating Cell-Free DNA in Breast Cancer: Searching for Hidden Information towards Precision Medicine. Cancers (Basel) 2021; 13:cancers13040728. [PMID: 33578793 PMCID: PMC7916622 DOI: 10.3390/cancers13040728] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/13/2021] [Revised: 02/05/2021] [Accepted: 02/08/2021] [Indexed: 12/24/2022]
Abstract
Simple Summary: Our research focuses on the elucidation of the nature of circulating cell-free DNA (ccfDNA) as a biological entity and its exploitation as a liquid biopsy biomaterial. Working on breast cancer, it became clear that although a promising biosource, its clinical exploitation is burdened mainly by gaps in knowledge about its biology and specific characteristics. The current review covers multiple aspects of ccfDNA in breast cancer. We cover key issues such as quantity, integrity, releasing structures, methylation-specific changes, release mechanisms, and biological role. Machine learning approaches for analyzing ccfDNA-generated data to produce classifiers for clinical use are also discussed.
Abstract: Breast cancer (BC) is a leading cause of death among women. Mortality is significantly raised due to drug resistance and metastasis, while personalized treatment options are obstructed by the limitations of conventional biopsy follow-up. Lately, research is focusing on circulating biomarkers as minimally invasive choices for diagnosis, prognosis and treatment monitoring. Circulating cell-free DNA (ccfDNA) is a promising liquid biopsy biomaterial of great potential as it is thought to mirror the tumor’s lifespan; however, its clinical exploitation is burdened mainly by gaps in knowledge of its biology and specific characteristics. The current review aims to gather the latest findings about the nature of ccfDNA and its multiple molecular and biological characteristics in breast cancer, covering basic and translational research and giving insights about its validity in a clinical setting.
63
Jorner K, Brinck T, Norrby PO, Buttar D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem Sci 2021; 12:1163-1175. [PMID: 36299676 PMCID: PMC9528810 DOI: 10.1039/d0sc04896h] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 09/04/2020] [Accepted: 11/02/2020] [Indexed: 12/19/2022]
Abstract
Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol⁻¹ for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100-150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints.
Affiliation(s)
- Kjell Jorner
  - Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
- Tore Brinck
  - Applied Physical Chemistry, Department of Chemistry, CBH, KTH Royal Institute of Technology, Stockholm, Sweden
- Per-Ola Norrby
  - Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- David Buttar
  - Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
64
Mehdipour Ghazi M, Nielsen M, Pai A, Modat M, Jorge Cardoso M, Ourselin S, Sørensen L. Robust parametric modeling of Alzheimer's disease progression. Neuroimage 2021; 225:117460. [PMID: 33075562 PMCID: PMC9068750 DOI: 10.1016/j.neuroimage.2020.117460] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/20/2020] [Revised: 10/11/2020] [Accepted: 10/12/2020] [Indexed: 11/30/2022]
Abstract
Quantitative characterization of disease progression using longitudinal data can provide long-term predictions for the pathological stages of individuals. This work studies the robust modeling of Alzheimer's disease progression using parametric methods. The proposed method linearly maps the individual's age to a disease progression score (DPS) and jointly fits constrained generalized logistic functions to the longitudinal dynamics of biomarkers as functions of the DPS using M-estimation. Robustness of the estimates is quantified using bootstrapping via Monte Carlo resampling, and the estimated inflection points of the fitted functions are used to temporally order the modeled biomarkers in the disease course. Kernel density estimation is applied to the obtained DPSs for clinical status classification using a Bayesian classifier. Different M-estimators and logistic functions, including a novel type proposed in this study, called modified Stannard, are evaluated on the data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) for robust modeling of volumetric magnetic resonance imaging (MRI) and positron emission tomography (PET) biomarkers, cerebrospinal fluid (CSF) measurements, as well as cognitive tests. The results show that the modified Stannard function fitted using the logistic loss achieves the best modeling performance with an average normalized mean absolute error (NMAE) of 0.991 across all biomarkers and bootstraps. Applied to the ADNI test set, this model achieves a multiclass area under the ROC curve (AUC) of 0.934 in clinical status classification. The obtained results for the proposed model outperform almost all state-of-the-art results in predicting biomarker values and classifying clinical status. Finally, the experiments show that the proposed model, trained using abundant ADNI data, generalizes well to data from the National Alzheimer's Coordinating Center (NACC) with an average NMAE of 1.182 and a multiclass AUC of 0.929.
Affiliation(s)
- Mostafa Mehdipour Ghazi
  - Biomediq A/S, Copenhagen, DK; Cerebriu A/S, Copenhagen, DK; Department of Computer Science, University of Copenhagen, Copenhagen, DK; Department of Medical Physics and Biomedical Engineering, University College London, London, UK
- Mads Nielsen
  - Biomediq A/S, Copenhagen, DK; Cerebriu A/S, Copenhagen, DK; Department of Computer Science, University of Copenhagen, Copenhagen, DK
- Akshay Pai
  - Biomediq A/S, Copenhagen, DK; Cerebriu A/S, Copenhagen, DK; Department of Computer Science, University of Copenhagen, Copenhagen, DK
- Marc Modat
  - Department of Medical Physics and Biomedical Engineering, University College London, London, UK; School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- M Jorge Cardoso
  - Department of Medical Physics and Biomedical Engineering, University College London, London, UK; School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- Sébastien Ourselin
  - Department of Medical Physics and Biomedical Engineering, University College London, London, UK; School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- Lauge Sørensen
  - Biomediq A/S, Copenhagen, DK; Cerebriu A/S, Copenhagen, DK; Department of Computer Science, University of Copenhagen, Copenhagen, DK
65
Zăvoianu AC, Lughofer E, Pollak R, Eitzinger C, Radauer T. A soft-computing framework for automated optimization of multiple product quality criteria with application to micro-fluidic chip production. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
66
Kulessa M, Wittelsbach B, Loza Mencía E, Fürnkranz J. Sum-Product Networks for Early Outbreak Detection of Emerging Diseases. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-77211-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
67
The compatibility of theoretical frameworks with machine learning analyses in psychological research. Curr Opin Psychol 2020; 36:83-88. [DOI: 10.1016/j.copsyc.2020.05.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/09/2020] [Revised: 05/12/2020] [Accepted: 05/13/2020] [Indexed: 12/29/2022]
68
Ho SY, Phua K, Wong L, Bin Goh WW. Extensions of the External Validation for Checking Learned Model Interpretability and Generalizability. PATTERNS (NEW YORK, N.Y.) 2020; 1:100129. [PMID: 33294870 PMCID: PMC7691387 DOI: 10.1016/j.patter.2020.100129] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Indexed: 12/23/2022]
Abstract
We discuss the validation of machine learning models, which is standard practice in determining model efficacy and generalizability. We argue that internal validation approaches, such as cross-validation and bootstrap, cannot guarantee the quality of a machine learning model due to potentially biased training data and the complexity of the validation procedure itself. For better evaluating the generalization ability of a learned model, we suggest leveraging on external data sources from elsewhere as validation datasets, namely external validation. Due to the lack of research attractions on external validation, especially a well-structured and comprehensive study, we discuss the necessity for external validation and propose two extensions of the external validation approach that may help reveal the true domain-relevant model from a candidate set. Moreover, we also suggest a procedure to check whether a set of validation datasets is valid and introduce statistical reference points for detecting external data problems.
Affiliation(s)
- Sung Yang Ho
  - School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Kimberly Phua
  - School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
- Wilson Wen Bin Goh
  - School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
69
Karaglani M, Gourlia K, Tsamardinos I, Chatzaki E. Accurate Blood-Based Diagnostic Biosignatures for Alzheimer's Disease via Automated Machine Learning. J Clin Med 2020; 9:E3016. [PMID: 32962113 PMCID: PMC7563988 DOI: 10.3390/jcm9093016] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 08/03/2020] [Revised: 09/04/2020] [Accepted: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Alzheimer's disease (AD) is the most common form of neurodegenerative dementia and its timely diagnosis remains a major challenge in biomarker discovery. In the present study, we analyzed publicly available high-throughput low-sample -omics datasets from studies in AD blood, by the AutoML technology Just Add Data Bio (JADBIO), to construct accurate predictive models for use as diagnostic biosignatures. Considering data from AD patients and age-sex matched cognitively healthy individuals, we produced three best performing diagnostic biosignatures specific for the presence of AD: A. A 506-feature transcriptomic dataset from 48 AD and 22 controls led to a miRNA-based biosignature via Support Vector Machines with three miRNA predictors (AUC 0.975 (0.906, 1.000)), B. A 38,327-feature transcriptomic dataset from 134 AD and 100 controls led to six mRNA-based statistically equivalent signatures via Classification Random Forests with 25 mRNA predictors (AUC 0.846 (0.778, 0.905)) and C. A 9483-feature proteomic dataset from 25 AD and 37 controls led to a protein-based biosignature via Ridge Logistic Regression with seven protein predictors (AUC 0.921 (0.849, 0.972)). These performance metrics were also validated through the JADBIO pipeline confirming stability. In conclusion, using the automated machine learning tool JADBIO, we produced accurate predictive biosignatures extrapolating available low sample -omics data. These results offer options for minimally invasive blood-based diagnostic tests for AD, awaiting clinical validation based on respective laboratory assays. They also highlight the value of AutoML in biomarker discovery.
Affiliation(s)
- Makrina Karaglani
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, 68100 Alexandroupolis, Greece
  - Gnosis Data Analysis PC, Science and Technology Park of Crete, N. Plastira 100, GR-700 13 Vassilika Vouton, Greece
- Krystallia Gourlia
  - Department of Computer Science, University of Crete, GR-700 13 Vassilika Vouton, Greece
- Ioannis Tsamardinos
  - Gnosis Data Analysis PC, Science and Technology Park of Crete, N. Plastira 100, GR-700 13 Vassilika Vouton, Greece
  - Department of Computer Science, University of Crete, GR-700 13 Vassilika Vouton, Greece
  - Institute of Applied and Computational Mathematics, Foundation for Research and Technology Hellas, GR-700 13 Vassilika Vouton, Greece
- Ekaterini Chatzaki
  - Laboratory of Pharmacology, Medical School, Democritus University of Thrace, 68100 Alexandroupolis, Greece
  - Institute of Agri-Food and Life Sciences, University Research Centre, Hellenic Mediterranean University, GR-71410 Heraklion, Greece
70
Karstoft KI, Tsamardinos I, Eskelund K, Andersen SB, Nissen LR. Applicability of an Automated Model and Parameter Selection in the Prediction of Screening-Level PTSD in Danish Soldiers Following Deployment: Development Study of Transferable Predictive Models Using Automated Machine Learning. JMIR Med Inform 2020; 8:e17119. [PMID: 32706722 PMCID: PMC7407253 DOI: 10.2196/17119] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/20/2019] [Revised: 03/30/2020] [Accepted: 04/16/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Posttraumatic stress disorder (PTSD) is a relatively common consequence of deployment to war zones. Early postdeployment screening with the aim of identifying those at risk for PTSD in the years following deployment will help deliver interventions to those in need but have so far proved unsuccessful. OBJECTIVE: This study aimed to test the applicability of automated model selection and the ability of automated machine learning prediction models to transfer across cohorts and predict screening-level PTSD 2.5 years and 6.5 years after deployment. METHODS: Automated machine learning was applied to data routinely collected 6-8 months after return from deployment from 3 different cohorts of Danish soldiers deployed to Afghanistan in 2009 (cohort 1, N=287 or N=261 depending on the timing of the outcome assessment), 2010 (cohort 2, N=352), and 2013 (cohort 3, N=232). RESULTS: Models transferred well between cohorts. For screening-level PTSD 2.5 and 6.5 years after deployment, random forest models provided the highest accuracy as measured by area under the receiver operating characteristic curve (AUC): 2.5 years, AUC=0.77, 95% CI 0.71-0.83; 6.5 years, AUC=0.78, 95% CI 0.73-0.83. Linear models performed equally well. Military rank, hyperarousal symptoms, and total level of PTSD symptoms were highly predictive. CONCLUSIONS: Automated machine learning provided validated models that can be readily implemented in future deployment cohorts in the Danish Defense with the aim of targeting postdeployment support interventions to those at highest risk for developing PTSD, provided the cohorts are deployed on similar missions.
Affiliation(s)
- Karen-Inge Karstoft
  - Research and Knowledge Centre, The Danish Veterans Centre, Ringsted, Denmark; Department of Psychology, University of Copenhagen, Copenhagen, Denmark
- Ioannis Tsamardinos
  - Department of Computer Science, University of Crete, Heraklion, Crete, Greece; Gnosis Data Analysis PC, Heraklion, Greece
- Kasper Eskelund
  - Research and Knowledge Centre, The Danish Veterans Centre, Ringsted, Denmark; Department of Military Psychology, The Danish Veterans Centre, Copenhagen, Denmark
- Søren Bo Andersen
  - Research and Knowledge Centre, The Danish Veterans Centre, Ringsted, Denmark
71
Multiparametric MRI for Prostate Cancer Characterization: Combined Use of Radiomics Model with PI-RADS and Clinical Parameters. Cancers (Basel) 2020; 12:cancers12071767. [PMID: 32630787 PMCID: PMC7407326 DOI: 10.3390/cancers12071767] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 05/14/2020] [Revised: 06/24/2020] [Accepted: 06/30/2020] [Indexed: 12/25/2022]
Abstract
Radiomics is an emerging field of image analysis with potential applications in patient risk stratification. This study developed and evaluated machine learning models using quantitative radiomic features extracted from multiparametric magnetic resonance imaging (mpMRI) to detect and classify prostate cancer (PCa). In total, 191 patients that underwent prostatic mpMRI and combined targeted and systematic fusion biopsy were retrospectively included. Segmentations of the whole prostate glands and index lesions were performed manually in apparent diffusion coefficient (ADC) maps and T2-weighted MRI. Radiomic features were extracted from regions corresponding to the whole prostate gland and index lesion. The best performing combination of feature setup and classifier was selected to compare its predictive ability of the radiologist’s evaluation (PI-RADS), mean ADC, prostate specific antigen density (PSAD) and digital rectal examination (DRE) using receiver operating characteristic (ROC) analysis. Models were evaluated using repeated 5-fold cross-validation and a separate independent test cohort. In the test cohort, an ensemble model combining a radiomics model, with models for PI-RADS, PSAD and DRE achieved high predictive AUCs for the differentiation of (i) malignant from benign prostatic lesions (AUC = 0.889) and of (ii) clinically significant (csPCa) from clinically insignificant PCa (cisPCa) (AUC = 0.844). Our combined model was numerically superior to PI-RADS for cancer detection (AUC = 0.779; p = 0.054) as well as for clinical significance prediction (AUC = 0.688; p = 0.209) and showed a significantly better performance compared to mADC for csPCa prediction (AUC = 0.571; p = 0.022). In our study, radiomics accurately characterizes prostatic index lesions and shows performance comparable to radiologists for PCa characterization. 
Quantitative image data represent a potential biomarker, which, when combined with PI-RADS, PSAD and DRE, predicts csPCa more accurately than mADC. Prognostic machine learning models could assist in csPCa detection and patient selection for MRI-guided biopsy.
72
NTAL is associated with treatment outcome, cell proliferation and differentiation in acute promyelocytic leukemia. Sci Rep 2020; 10:10315. [PMID: 32587277 PMCID: PMC7316767 DOI: 10.1038/s41598-020-66223-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/07/2020] [Accepted: 05/15/2020] [Indexed: 01/04/2023]
Abstract
Non-T cell activation linker (NTAL) is a lipid raft-membrane protein expressed by normal and leukemic cells and involved in cell signaling. In acute promyelocytic leukemia (APL), NTAL depletion from lipid rafts decreases cell viability through regulation of the Akt/PI3K pathway. The role of NTAL in APL cell processes, and its association with clinical outcome, has not, however, been established. Here, we show that reduced levels of NTAL were associated with increased all-trans retinoic acid (ATRA)-induced differentiation, generation of reactive oxygen species, and mitochondrial dysfunction. Additionally, NTAL-knockdown (NTAL-KD) in APL cell lines led to activation of Ras, inhibition of Akt/mTOR pathways, and increased expression of autophagy markers, leading to an increased apoptosis rate following arsenic trioxide treatment. Furthermore, NTAL-KD in NB4 cells decreased the tumor burden in (NOD scid gamma) NSG mice, suggesting its implication in tumor growth. A retrospective analysis of NTAL expression in a cohort of patients treated with ATRA and anthracyclines, revealed that NTAL overexpression was associated with a high leukocyte count (P = 0.007) and was independently associated with shorter overall survival (Hazard Ratio: 3.6; 95% Confidence Interval: 1.17-11.28; P = 0.026). Taken together, our data highlights the importance of NTAL in APL cell survival and response to treatment.
73
Montesanto A, D'Aquila P, Lagani V, Paparazzo E, Geracitano S, Formentini L, Giacconi R, Cardelli M, Provinciali M, Bellizzi D, Passarino G. A New Robust Epigenetic Model for Forensic Age Prediction. J Forensic Sci 2020; 65:1424-1431. [PMID: 32453457 DOI: 10.1111/1556-4029.14460] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 03/04/2020] [Revised: 04/22/2020] [Accepted: 05/04/2020] [Indexed: 12/12/2022]
Abstract
Forensic DNA phenotyping refers to an emerging field of forensic sciences aimed at the prediction of externally visible characteristics of unknown sample donors directly from biological materials. The aging process significantly affects most of the above characteristics making the development of a reliable method of age prediction very important. Today, the so-called "epigenetic clocks" represent the most accurate models for age prediction. Since they are technically not achievable in a typical forensic laboratory, forensic DNA technology has triggered efforts toward the simplification of these models. The present study aimed to build an epigenetic clock using a set of methylation markers of five different genes in a sample of the Italian population of different ages covering the whole span of adult life. In a sample of 330 subjects, 42 selected markers were analyzed with a machine learning approach for building a prediction model for age prediction. A ridge linear regression model including eight of the proposed markers was identified as the best performing model across a plethora of candidates. This model was tested on an independent sample of 83 subjects providing a median error of 4.5 years. In the present study, an epigenetic model for age prediction was validated in a sample of the Italian population. However, its applicability to advanced ages still represents the main limitation in forensic caseworks.
Affiliation(s)
- Alberto Montesanto
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Patrizia D'Aquila
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Vincenzo Lagani
  - Gnosis Data Analysis PC, Heraklion, GR700-13, Greece; Institute of Chemical Biology, Ilia State University, Tbilisi, 0162, Georgia
- Ersilia Paparazzo
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Silvana Geracitano
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Laura Formentini
  - Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Robertina Giacconi
  - Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Maurizio Cardelli
  - Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Mauro Provinciali
  - Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Dina Bellizzi
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Giuseppe Passarino
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
74
Multi-classifier prediction of knee osteoarthritis progression from incomplete imbalanced longitudinal data. Sci Rep 2020; 10:8427. [PMID: 32439879 PMCID: PMC7242357 DOI: 10.1038/s41598-020-64643-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 10/25/2019] [Accepted: 04/20/2020] [Indexed: 12/22/2022]
Abstract
Conventional inclusion criteria used in osteoarthritis clinical trials are not very effective in selecting patients who would benefit from a therapy being tested. Typically, the majority of selected patients show no or limited disease progression during a trial period. As a consequence, the effect of the tested treatment cannot be observed, and the efforts and resources invested in running the trial are not rewarded. This could be avoided if selection criteria were more predictive of future disease progression. In this article, we formulated the patient selection problem as a multi-class classification task, with classes based on clinically relevant measures of progression (over a time scale typical for clinical trials). Using data from two long-term knee osteoarthritis studies, OAI and CHECK, we tested multiple algorithms and learning process configurations (including multi-classifier approaches, cost-sensitive learning, and feature selection) to identify the best performing machine learning models. We examined the behaviour of the best models, with respect to prediction errors and the impact of the features used, to confirm their clinical relevance. We found that model-based selection outperforms the conventional inclusion criteria, reducing by 20-25% the number of patients who show no progression. This result might lead to more efficient clinical trials.
75
Aragon S, Khojasteh M, Boykin M, Crumpton B, McGuinn L, Gesell S. Challenging a Fundamental Proposition of Patient-Centeredness. JOURNAL OF BEST PRACTICES IN HEALTH PROFESSIONS DIVERSITY: RESEARCH, EDUCATION AND POLICY 2020; 13:94-119. [PMID: 35310827 PMCID: PMC8929671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
This investigation challenged the proposition that physician patient-centeredness influences patients' experience-of-care (PEC). A theory-driven, three-factor, multigroup structural equation modeling design, using asymptotic-distribution-free and bootstrap estimation, with two national random samples and 5,000 bootstrap samples, challenged the proposition's plausibility, measurement invariance, replicability, robustness against a competing model, and coherence with theory. The model fit [χ²(39) = 28, p = .900, RMSEA = .001, p = 1.00, CFI = 1.00], explaining 81 percent of PEC's variance; the proposition was invariant across samples, held against the competing model [Δχ²(7) = 7.82, p = .97], cross-validated against estimates from the 5,000 bootstrap samples, and agreed with theory. One standardized increase in patient-centeredness increased PEC, likelihood of recommending, and care ratings by .807, .765, and .771, respectively. Results converged in sustaining the plausibility of the proposition.
Affiliation(s)
- Stephen Aragon: Department of Healthcare Management, Winston-Salem State University, Winston-Salem, North Carolina
- Mak Khojasteh: Department of Marketing and Management, Winston-Salem State University, Winston-Salem, North Carolina
- Montrale Boykin: Department of Healthcare Management, Winston-Salem State University, Winston-Salem, North Carolina
- Breanne Crumpton: C.G. O’Kelly Library, Winston-Salem State University, Winston-Salem, North Carolina
- Laura McGuinn: Department of Pediatrics, Division of Developmental and Behavioral Pediatrics, University of Alabama at Birmingham, Birmingham, Alabama
- Sabina Gesell: Departments of Public Health Sciences and the Maya Angelou Center for Health Equity, Wake Forest University School of Medicine, Winston-Salem, North Carolina
|
76
|
Appice A, Tsoumakas G, Manolopoulos Y, Matwin S. Pathway Activity Score Learning for Dimensionality Reduction of Gene Expression Data. DISCOVERY SCIENCE 2020. [PMCID: PMC7556388 DOI: 10.1007/978-3-030-61527-7_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]
Abstract
Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (i.e., high-dimensional data). However, there exist lower-dimensional representations that retain the useful information. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL’s latent space has a relatively straightforward biological interpretation. As a use case, PASL is applied to two collections of breast cancer and leukemia gene expression datasets. We show that PASL retains the predictive information for disease classification on new, unseen datasets, and that it outperforms PLIER, a recently proposed competing method. We also show that differential activation pathway analysis provides complementary information to standard gene set enrichment analysis. The code is available at https://github.com/mensxmachina/PASL.
|
77
|
Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J. Bootstrap Bias Corrected Cross Validation Applied to Super Learning. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7304018 DOI: 10.1007/978-3-030-50420-5_41] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
The super learner algorithm can be applied to combine the results of multiple base learners to improve the quality of predictions. The default method for verifying super learner results is nested cross validation; however, this technique is computationally very expensive. Tsamardinos et al. proposed that nested cross validation can be replaced by resampling when tuning the hyper-parameters of learning algorithms. The main contribution of this study is to apply this idea to the verification of the super learner. We compare the new method with other verification methods, including nested cross validation. Tests were performed on artificial data sets of diverse size and on seven real biomedical data sets. The resampling method, called Bootstrap Bias Correction, proved to be a reasonably precise and very cost-efficient alternative to nested cross validation.
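The resampling idea summarized in this abstract can be illustrated with a minimal sketch. It follows the general bootstrap bias correction scheme as commonly described (bootstrap the pooled out-of-fold predictions, pick the winning configuration on the in-bag rows, score it on the out-of-bag rows) and is not the authors' super learner implementation. The function name `bbc_cv`, the use of accuracy as the metric, and the layout of the prediction matrix are assumptions made for illustration.

```python
import numpy as np


def bbc_cv(oos_pred, y, n_boot=200, seed=0):
    """Hypothetical helper: bootstrap bias corrected estimate of accuracy.

    oos_pred : (n_samples, n_configs) array of pooled out-of-fold class
               predictions, one column per candidate configuration.
    y        : (n_samples,) array of true class labels.
    Returns (index of the configuration that wins on all samples,
             bias-corrected accuracy estimate for the selection procedure).
    """
    rng = np.random.default_rng(seed)
    # 1.0 where a configuration predicted a sample correctly, else 0.0
    correct = (oos_pred == y[:, None]).astype(float)
    n = correct.shape[0]

    # Configuration that would be selected using every sample
    best_overall = int(np.argmax(correct.mean(axis=0)))

    scores = []
    for _ in range(n_boot):
        in_bag = rng.integers(0, n, size=n)   # rows drawn with replacement
        out_mask = np.ones(n, dtype=bool)
        out_mask[in_bag] = False              # rows never drawn are out-of-bag
        if not out_mask.any():
            continue                          # no out-of-bag rows to score on
        # Select the winner on in-bag rows only ...
        winner = int(np.argmax(correct[in_bag].mean(axis=0)))
        # ... and score that winner on rows it was not selected on
        scores.append(correct[out_mask, winner].mean())

    return best_overall, float(np.mean(scores))
```

Because selection and scoring use disjoint rows in every bootstrap round, the averaged score is not inflated by having picked the best of many configurations, and no model is retrained, which is where the cost saving over nested cross validation comes from.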
|
78
|
Abstract
Currency crises are major events in the international monetary system. They affect the monetary policy of countries and are associated with risks of vulnerability for open economies. Much research has been carried out on the behavior of these events, and models have been developed to predict falls in the value of currencies. However, the limitations of existing models mean further research is required in this area, since the models are still of limited accuracy and have only been developed for emerging countries. This article presents an innovative global model for predicting currency crises. The analysis is geographically differentiated by region, considers both emerging and developed countries, and can accurately estimate future scenarios for currency crises at the global level. It uses a sample of 162 countries, making it possible to account for the regional heterogeneity of the warning indicators. The method used was deep neural decision trees (DNDTs), a technique based on decision trees implemented by deep learning neural networks, which was compared with other methodologies widely applied in prediction. Our model has significant potential for the adaptation of macroeconomic policy to the risks derived from falls in the value of currencies, providing tools that help ensure financial stability at the global level.
|
79
|
Circulating cell-free DNA in breast cancer: size profiling, levels, and methylation patterns lead to prognostic and predictive classifiers. Oncogene 2019; 38:3387-3401. [PMID: 30643192 DOI: 10.1038/s41388-018-0660-y] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Received: 08/22/2018] [Revised: 11/11/2018] [Accepted: 12/07/2018] [Indexed: 12/24/2022]
Abstract
Blood circulating cell-free DNA (ccfDNA) is a suggested biosource of valuable clinical information for cancer, meeting the need for a minimally invasive advancement in the route of precision medicine. In this paper, we evaluated the prognostic and predictive potential of ccfDNA parameters in early and advanced breast cancer. Groups consisted of 150 and 16 breast cancer patients under adjuvant and neoadjuvant therapy, respectively, 34 patients with metastatic disease, and 35 healthy volunteers. Direct quantification of ccfDNA in plasma revealed elevated concentrations correlated to the incidence of death, shorter PFS, and non-response to pharmacotherapy in the metastatic group but not in the other groups. The methylation status of a panel of cancer-related genes chosen based on previous expression and epigenetic data (KLK10, SOX17, WNT5A, MSH2, GATA3) was assessed by quantitative methylation-specific PCR. All genes except GATA3 were more frequently methylated in all the patient groups than in healthy individuals (all p < 0.05). The methylation of WNT5A was statistically significantly correlated to greater tumor size and poor prognosis characteristics and, in advanced stage disease, with shorter OS. In the metastatic group, SOX17 methylation was also significantly correlated to the incidence of death, shorter PFS, and OS. KLK10 methylation was significantly correlated to unfavorable clinicopathological characteristics and relapse, whereas in the adjuvant group to shorter DFI. Methylation of at least 3 or 4 genes was significantly correlated to shorter OS and no pharmacotherapy response, respectively. Classification analysis by fully automated machine learning software produced a single-parametric linear model using ccfDNA plasma concentration values, with great discriminating power to predict response to chemotherapy (AUC 0.803, 95% CI [0.606, 1.000]) in the metastatic group. Two more multi-parametric signatures were produced for the metastatic group, predicting survival and disease outcome. Finally, a multiple logistic regression model was constructed, discriminating between patient groups and healthy individuals. Overall, ccfDNA emerged as a highly potent predictive classifier in metastatic breast cancer. Upon prospective clinical evaluation, all the signatures produced could aid accurate prognosis.
|
80
|
Adamou M, Antoniou G, Greasidou E, Lagani V, Charonyktakis P, Tsamardinos I, Doyle M. Toward Automatic Risk Assessment to Support Suicide Prevention. CRISIS 2018; 40:249-256. [PMID: 30474411 DOI: 10.1027/0227-5910/a000561] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 01/06/2023]
Abstract
Background: Suicide has been considered an important public health issue for years and is one of the main causes of death worldwide. Despite prevention strategies being applied, the rate of suicide has not changed substantially over the past decades. Suicide risk has proven extremely difficult to assess for medical specialists, and traditional methodologies deployed have been ineffective. Advances in machine learning make it possible to attempt to predict suicide with the analysis of relevant data aiming to inform clinical practice. Aims: We aimed to (a) test our artificial intelligence based, referral-centric methodology in the context of the National Health Service (NHS), (b) determine whether statistically relevant results can be derived from data related to previous suicides, and (c) develop ideas for various exploitation strategies. Method: The analysis used data of patients who died by suicide in the period 2013-2016 including both structured data and free-text medical notes, necessitating the deployment of state-of-the-art machine learning and text mining methods. Limitations: Sample size is a limiting factor for this study, along with the absence of non-suicide cases. Specific analytical solutions were adopted for addressing both issues. Results and Conclusion: The results of this pilot study indicate that machine learning shows promise for predicting within a specified period which people are most at risk of taking their own life at the time of referral to a mental health service.
Affiliation(s)
- Marios Adamou: (1) South West Yorkshire Partnership NHS Foundation Trust, Wakefield, UK; (2) Department of Computer Science, University of Huddersfield, UK
- Vincenzo Lagani: (3) Gnosis Data Analysis PC, Heraklion, Greece; (5) Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
- Ioannis Tsamardinos: (2) Department of Computer Science, University of Huddersfield, UK; (3) Gnosis Data Analysis PC, Heraklion, Greece; (4) Computer Science Department, University of Crete, Heraklion, Greece
- Michael Doyle: (1) South West Yorkshire Partnership NHS Foundation Trust, Wakefield, UK
|