1
Karaglani M, Agorastos A, Panagopoulou M, Parlapani E, Athanasis P, Bitsios P, Tzitzikou K, Theodosiou T, Iliopoulos I, Bozikas VP, Chatzaki E. A novel blood-based epigenetic biosignature in first-episode schizophrenia patients through automated machine learning. Transl Psychiatry 2024; 14:257. [PMID: 38886359 PMCID: PMC11183091 DOI: 10.1038/s41398-024-02946-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/20/2024]
Abstract
Schizophrenia (SCZ) is a chronic, severe, and complex psychiatric disorder that affects all aspects of personal functioning. Although SCZ has a very strong biological component, there are still no objective diagnostic tests. Lately, special attention has been given to epigenetic biomarkers in SCZ. In this study, we introduce a three-step, data-driven biomarker discovery pipeline based on automated machine learning (AutoML), using genome-wide DNA methylation datasets and laboratory validation, to deliver a high-performing, blood-based epigenetic biosignature of diagnostic clinical value in SCZ. Publicly available blood methylomes from SCZ patients and healthy individuals were analyzed via AutoML to identify SCZ-specific biomarkers. The methylation of the identified genes was then analyzed by targeted qMSP assays in blood gDNA of 30 first-episode, drug-naïve SCZ patients and 30 healthy controls (CTRL). Finally, AutoML was used to produce an optimized disease-specific biosignature based on patient methylation data combined with demographics. AutoML identified an SCZ-specific set of novel gene methylation biomarkers, including IGF2BP1, CENPI, and PSME4. Functional analysis investigated correlations with SCZ pathology. Methylation levels of IGF2BP1 and PSME4, but not CENPI, differed between groups: IGF2BP1 methylation was higher and PSME4 methylation lower in the SCZ group than in the CTRL group. Additional AutoML classification analysis of our experimental patient data led to a five-feature biosignature, comprising all three genes plus age and sex, that discriminated SCZ patients from healthy individuals [AUC 0.755 (0.636, 0.862) and average precision 0.758 (0.690, 0.825)]. In conclusion, this three-step pipeline enabled the discovery of three novel genes and an epigenetic biosignature with potential value for blood-based SCZ diagnostics.
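The AUC and average precision above are reported with intervals. One common way to obtain such an interval is a percentile bootstrap; the stdlib-only sketch below computes the AUC as a Mann-Whitney statistic and bootstraps it. It is an illustration, not the authors' AutoML code: the abstract does not say how its intervals were derived, and the function names and the default of 1,000 resamples are assumptions.

```python
import random

def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a randomly chosen
    positive (label 1) scores higher than a randomly chosen negative
    (label 0); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, metric=auc, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval: resample subjects
    with replacement, recompute the metric, and read off the alpha/2 and
    1 - alpha/2 quantiles of the resampled values."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return metric(labels, scores), lower, upper
```

Any metric with the same `(labels, scores)` signature can be passed as `metric`, so the same resampling loop covers average precision as well.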
Affiliation(s)
- Makrina Karaglani
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, GR-68132, Alexandroupolis, Greece
  - Institute of Agri-food and Life Sciences, University Research & Innovation Center, H.M.U.R.I.C., Hellenic Mediterranean University, GR-71003, Crete, Greece
- Agorastos Agorastos
  - Institute of Agri-food and Life Sciences, University Research & Innovation Center, H.M.U.R.I.C., Hellenic Mediterranean University, GR-71003, Crete, Greece
  - II. Department of Psychiatry, Faculty of Health Sciences, School of Medicine, Aristotle University of Thessaloniki, GR-56430, Thessaloniki, Greece
- Maria Panagopoulou
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, GR-68132, Alexandroupolis, Greece
  - Institute of Agri-food and Life Sciences, University Research & Innovation Center, H.M.U.R.I.C., Hellenic Mediterranean University, GR-71003, Crete, Greece
- Eleni Parlapani
  - I. Department of Psychiatry, Faculty of Health Sciences, School of Medicine, Aristotle University of Thessaloniki, GR-56429, Thessaloniki, Greece
- Panagiotis Athanasis
  - II. Department of Psychiatry, Faculty of Health Sciences, School of Medicine, Aristotle University of Thessaloniki, GR-56430, Thessaloniki, Greece
- Panagiotis Bitsios
  - Department of Psychiatry and Behavioral Sciences, Faculty of Medicine, University of Crete, GR-71500, Heraklion, Greece
- Konstantina Tzitzikou
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, GR-68132, Alexandroupolis, Greece
- Theodosis Theodosiou
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, GR-68132, Alexandroupolis, Greece
  - ABCureD P.C, GR-68131, Alexandroupolis, Greece
- Ioannis Iliopoulos
  - Division of Basic Sciences, School of Medicine, University of Crete, GR-71003, Heraklion, Greece
- Vasilios-Panteleimon Bozikas
  - II. Department of Psychiatry, Faculty of Health Sciences, School of Medicine, Aristotle University of Thessaloniki, GR-56430, Thessaloniki, Greece
- Ekaterini Chatzaki
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, GR-68132, Alexandroupolis, Greece
  - Institute of Agri-food and Life Sciences, University Research & Innovation Center, H.M.U.R.I.C., Hellenic Mediterranean University, GR-71003, Crete, Greece
  - ABCureD P.C, GR-68131, Alexandroupolis, Greece
  - Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, 70013, Heraklion, Greece
2
Han L, Xu Q, Meng P, Xu R, Nan J. Brain identification of IBS patients based on GBDT and multiple imaging techniques. Phys Eng Sci Med 2024; 47:651-662. [PMID: 38416373 DOI: 10.1007/s13246-024-01394-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 01/16/2024] [Indexed: 02/29/2024]
Abstract
Brain biomarkers for irritable bowel syndrome (IBS) are still lacking. This study aims to explore a new technique for studying brain alterations in IBS patients based on multi-source brain data. A decision-level fusion method based on gradient boosting decision trees (GBDT) was proposed, and 100 healthy subjects were used to validate its effectiveness. The fusion method was then applied to resting-state fMRI and DWI data from 46 IBS patients and 46 controls randomly selected from the 100 healthy subjects, to identify brain alterations and to evaluate pain in the IBS patients. The method achieved good classification between IBS patients and controls (accuracy = 95%) and good pain evaluation in IBS patients (mean absolute error = 0.1977). Moreover, both gain-based and permutation-based importance evaluation, used instead of statistical analysis, showed that the left cingulum bundle contributed most to the classification, and the right precuneus contributed most to the evaluation of abdominal pain intensity in the IBS patients. These differences suggest a possible, as yet unexplored, separation between the central regions involved in the identification of IBS and those involved in its progression. This finding may offer a new perspective and technique for studying brain alterations related to IBS.
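The abstract does not detail the fusion rule. As a hedged illustration of decision-level fusion with gradient-boosted trees, the sketch below trains one small GBDT (depth-1 trees fitted to the log-loss gradient) per imaging modality and averages the modalities' predicted probabilities. The per-modality split, the averaging rule, and all function names are assumptions for illustration, not the method of Han et al.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, r):
    """Least-squares regression stump on residuals r.
    Returns (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant
        m = sum(r) / len(r)
        return (0, math.inf, m, m)
    return best[1:]

def stump_value(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_gbdt(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting for binary log-loss. Each stump is fitted to the
    negative gradient y - p; this plain gradient step omits the Newton-step
    leaf update that many GBDT implementations use."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))  # start from the log-odds of the base rate
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + lr * stump_value(stump, row) for fi, row in zip(F, X)]
    return f0, lr, stumps

def predict_proba(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(stump_value(s, row) for s in stumps))

def fuse(model_row_pairs):
    """Decision-level fusion: average the per-modality probabilities."""
    probs = [predict_proba(model, row) for model, row in model_row_pairs]
    return sum(probs) / len(probs)
```

Decision-level fusion combines model outputs rather than raw features, so each modality keeps its own feature space; a plain average is the simplest combiner, and weighted or learned combiners are common alternatives.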
Affiliation(s)
- Li Han
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, 136 Science Avenue, Zhengzhou, 450000, Henan, China
- Qian Xu
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, 136 Science Avenue, Zhengzhou, 450000, Henan, China
- Panting Meng
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, 136 Science Avenue, Zhengzhou, 450000, Henan, China
- Ruyun Xu
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, 136 Science Avenue, Zhengzhou, 450000, Henan, China
- Jiaofen Nan
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, 136 Science Avenue, Zhengzhou, 450000, Henan, China
3
Montesanto A, Lagani V, Spazzafumo L, Tortato E, Rosati S, Corsonello A, Soraci L, Sabbatinelli J, Cherubini A, Conte M, Capri M, Capalbo M, Lattanzio F, Olivieri F, Bonfigli AR. Physical performance strongly predicts all-cause mortality risk in a real-world population of older diabetic patients: machine learning approach for mortality risk stratification. Front Endocrinol (Lausanne) 2024; 15:1359482. [PMID: 38745954 PMCID: PMC11091327 DOI: 10.3389/fendo.2024.1359482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 04/12/2024] [Indexed: 05/16/2024]
Abstract
Background Prognostic risk stratification in older adults with type 2 diabetes (T2D) is important for guiding decisions concerning advance care planning. Materials and methods A retrospective longitudinal study was conducted in a real-world sample of older diabetic patients attending the outpatient facilities of the Diabetology Unit of the IRCCS INRCA Hospital of Ancona (Italy). A total of 1,001 T2D patients aged over 70 years were consecutively evaluated by a multidimensional geriatric assessment, including physical performance measured with the Short Physical Performance Battery (SPPB). Mortality was assessed over a 5-year follow-up. We used the automated machine learning (AutoML) platform JADBio to identify parsimonious mathematical models for risk stratification. Results Of the 977 subjects included in the T2D cohort, the mean age was 76.5 (SD: 4.5) years and 454 (46.5%) were men. The mean follow-up time was 53.3 (SD: 15.8) months, and 209 (21.4%) patients died by the end of follow-up. The final JADBio AutoML model included age, sex, SPPB, chronic kidney disease, myocardial ischemia, peripheral artery disease, neuropathy, and myocardial infarction. The bootstrap-corrected concordance index (c-index) for the final model was 0.726 (95% CI: 0.687-0.763), with SPPB ranked as the most important predictor. Based on the penalized Cox regression model, the risk of death per unit of time for a subject with an SPPB score lower than five points was 3.35 times that for a subject with a score higher than eight points (P-value <0.001). Conclusion Assessment of physical performance needs to be implemented in clinical practice for risk stratification of older patients with T2D.
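The concordance index quoted above measures how well predicted risk orders survival times under censoring. A minimal sketch of Harrell's c-index follows; it is the generic textbook definition, not JADBio's implementation, it does not reproduce the bootstrap correction, and pairs with tied follow-up times are simply skipped as a simplification.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event (events[i] is
    truthy) and a strictly shorter follow-up time than subject j. The pair is
    concordant when subject i also has the higher predicted risk; tied
    predicted risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.726 should be read.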
Affiliation(s)
- Alberto Montesanto
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, Italy
- Vincenzo Lagani
  - Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, Thuwal, Saudi Arabia
  - Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
- Andrea Corsonello
  - Unit of Geriatric Medicine, IRCCS INRCA, Cosenza, Italy
  - Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Rende, Italy
- Luca Soraci
  - Unit of Geriatric Medicine, IRCCS INRCA, Cosenza, Italy
- Jacopo Sabbatinelli
  - Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy
  - Laboratory Medicine Unit, Azienda Ospedaliero Universitaria delle Marche, Ancona, Italy
- Antonio Cherubini
  - Geriatria, Accettazione geriatrica e Centro di ricerca per l’invecchiamento, IRCCS INRCA, Ancona, Italy
- Maria Conte
  - Department of Medical and Surgical Science, University of Bologna, Bologna, Italy
- Miriam Capri
  - Department of Medical and Surgical Science, University of Bologna, Bologna, Italy
- Fabiola Olivieri
  - Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy
  - Clinic of Laboratory and Precision Medicine, IRCCS INRCA, Ancona, Italy
4
Panagopoulou M, Karaglani M, Tzitzikou K, Kessari N, Arvanitidis K, Amarantidis K, Drosos GI, Gerou S, Papanas N, Papazoglou D, Baritaki S, Constantinidis TC, Chatzaki E. Mitochondrial Fraction of Circulating Cell-Free DNA as an Indicator of Human Pathology. Int J Mol Sci 2024; 25:4199. [PMID: 38673785 PMCID: PMC11050675 DOI: 10.3390/ijms25084199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 04/01/2024] [Accepted: 04/08/2024] [Indexed: 04/28/2024]
Abstract
Circulating cell-free DNA (ccfDNA) of mitochondrial origin (ccf-mtDNA) constitutes a minor fraction of total ccfDNA in blood and other biological fluids. Aberrant levels of ccf-mtDNA have been observed in many pathologies. Here, we introduce a simple and effective standardized TaqMan probe-based dual-qPCR assay for the simultaneous detection and relative quantification of nuclear and mitochondrial fragments of ccfDNA. Three pathologies of major burden were studied: one malignancy (breast cancer, BrCa), one inflammatory (osteoarthritis, OA), and one metabolic (type 2 diabetes, T2D). Higher levels of ccf-mtDNA were detected in both BrCa and T2D compared with healthy individuals, but not in OA. In BrCa, hormonal receptor status was associated with ccf-mtDNA levels. Machine learning analysis of ccf-mtDNA datasets was used to build biosignatures of clinical relevance: (A) for BrCa, a three-feature biosignature discriminating between health and BrCa (AUC: 0.887) and a five-feature biosignature predicting the overall survival of BrCa patients (concordance index: 0.756); (B) for diabetes, a five-feature biosignature stratifying among T2D, prediabetes, and health (AUC: 0.772), a five-feature biosignature discriminating between T2D and health (AUC: 0.797), and a four-feature biosignature distinguishing prediabetes from health (AUC: 0.795); (C) for OA, a biosignature including total plasma ccfDNA with very high performance in discriminating OA from health (AUC: 0.934). Aberrant ccf-mtDNA levels could have diagnostic/prognostic potential in BrCa and diabetes, while the developed multiparameter biosignatures could add value to their clinical management.
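The abstract does not give the quantification formula. Relative quantification in a dual-target qPCR is often expressed with the ΔCq approach; the sketch below assumes both targets amplify with the same efficiency E, so a difference of ΔCq cycles corresponds to a (1 + E)^ΔCq fold difference in starting copies. The function names and the Cq values in the checks are hypothetical.

```python
def mt_to_nuclear_ratio(cq_nuclear, cq_mito, efficiency=1.0):
    """Relative mitochondrial-to-nuclear copy ratio from one dual-qPCR well.

    Assumes equal amplification efficiency for both targets (1.0 means
    perfect doubling per cycle). The earlier-crossing target started with
    more copies, so ratio = (1 + E) ** (Cq_nuclear - Cq_mito).
    """
    return (1.0 + efficiency) ** (cq_nuclear - cq_mito)

def relative_to_controls(sample_ratio, control_ratios):
    """Fold change of a sample's ratio versus the mean ratio of controls."""
    mean_ctrl = sum(control_ratios) / len(control_ratios)
    return sample_ratio / mean_ctrl
```

For example, a nuclear Cq of 28 and a mitochondrial Cq of 24 give a ratio of 2^4 = 16 at perfect efficiency; if the two assays differ in efficiency, a per-target efficiency correction would be needed instead.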
Affiliation(s)
- Maria Panagopoulou
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece
  - Institute of Agri-Food and Life Sciences, University Research and Innovation Centre, Hellenic Mediterranean University, 71003 Heraklion, Greece
- Makrina Karaglani
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece
  - Institute of Agri-Food and Life Sciences, University Research and Innovation Centre, Hellenic Mediterranean University, 71003 Heraklion, Greece
- Konstantina Tzitzikou
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece
- Nikoleta Kessari
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece
- Konstantinos Arvanitidis
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece
  - Institute of Agri-Food and Life Sciences, University Research and Innovation Centre, Hellenic Mediterranean University, 71003 Heraklion, Greece
- Kyriakos Amarantidis
  - Clinic of Medical Oncology, Department of Medicine, Democritus University of Thrace, University General Hospital of Alexandroupolis, 68100 Alexandroupolis, Greece
- George I. Drosos
  - Clinic of Orthopaedic Surgery, Department of Medicine, Democritus University of Thrace, University General Hospital of Alexandroupolis, 68100 Alexandroupolis, Greece
- Spyros Gerou
  - Analysis Biopathological Diagnostic Research Laboratories, 54623 Thessaloniki, Greece
- Nikolaos Papanas
  - Diabetes Centre, 2nd Department of Internal Medicine, University Hospital of Alexandroupolis, 68100 Alexandroupolis, Greece
- Dimitrios Papazoglou
  - Diabetes Centre, 2nd Department of Internal Medicine, University Hospital of Alexandroupolis, 68100 Alexandroupolis, Greece
- Stavroula Baritaki
  - Laboratory of Experimental Oncology, Division of Surgery, School of Medicine, University of Crete, 71500 Heraklion, Greece
- Theodoros C. Constantinidis
  - Laboratory of Hygiene and Environmental Protection, Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece
- Ekaterini Chatzaki
  - Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece
  - Institute of Agri-Food and Life Sciences, University Research and Innovation Centre, Hellenic Mediterranean University, 71003 Heraklion, Greece
5
Biza K, Tsamardinos I, Triantafillou S. Out-of-Sample Tuning for Causal Discovery. IEEE Trans Neural Netw Learn Syst 2024; 35:4963-4973. [PMID: 35830399 DOI: 10.1109/tnnls.2022.3185842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Causal discovery is continually being enriched with new algorithms for learning causal graphical probabilistic models. Each of them requires a set of hyperparameters, creating a great number of possible combinations. Given that the true graph is unknown and the learning task is unsupervised, the challenge for a practitioner is how to tune these choices. We propose out-of-sample causal tuning (OCT), which aims to select an optimal combination. The method treats a causal model as a set of predictive models and applies the out-of-sample protocols used for supervised methods. This approach can handle general settings such as latent confounders and nonlinear relationships. The method uses an information-theoretic approach to generalize to mixed data types, and a penalty on dense graphs to control complexity. To evaluate OCT, we introduce a causal-based simulation method to create datasets that mimic the properties of real-world problems. We evaluate OCT against two other tuning approaches, based on stability and on in-sample fitting. We show that OCT performs well in many experimental settings and is an effective tuning method for causal discovery.
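The core idea, scoring a candidate causal graph by how well its implied predictive models generalize, can be sketched as follows. This is a heavily simplified illustration under stated assumptions: a DAG over continuous variables, each node predicted from its Markov blanket by ordinary least squares, held-out squared error in place of the information-theoretic score the authors describe, and a fixed per-edge penalty. It does not handle latent confounders or mixed data, and all names (`oct_score`, `simulate_chain`, the penalty of 0.01) are hypothetical.

```python
import random

def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`.
    `parents` maps every node of a DAG to the set of its parents."""
    children = {c for c, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | spouses) - {node}

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_predict(train, test, target, inputs):
    """Fit target ~ intercept + inputs by least squares on train; predict test."""
    cols = sorted(inputs)
    def design(rows):
        return [[1.0] + [row[c] for c in cols] for row in rows]
    X, y = design(train), [row[target] for row in train]
    k = len(cols) + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in design(test)]

def oct_score(parents, train, test, penalty=0.01):
    """Lower is better: summed held-out MSE of predicting each node from its
    Markov blanket, plus a per-edge penalty against dense graphs."""
    loss = 0.0
    for node in parents:
        preds = ols_predict(train, test, node, markov_blanket(parents, node))
        loss += sum((row[node] - p) ** 2 for row, p in zip(test, preds)) / len(test)
    n_edges = sum(len(ps) for ps in parents.values())
    return loss + penalty * n_edges

def simulate_chain(n, seed=0):
    """Hypothetical toy data from the linear chain A -> B -> C."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 2.0 * a + rng.gauss(0.0, 0.1)
        c = -b + rng.gauss(0.0, 0.1)
        rows.append({"A": a, "B": b, "C": c})
    return rows
```

Tuning then amounts to running each algorithm/hyperparameter combination, scoring the graph it returns with `oct_score` on held-out data, and keeping the lowest-scoring configuration.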
6
Li L, Yang J, Por LY, Khan MS, Hamdaoui R, Hussain L, Iqbal Z, Rotaru IM, Dobrotă D, Aldrdery M, Omar A. Enhancing lung cancer detection through hybrid features and machine learning hyperparameters optimization techniques. Heliyon 2024; 10:e26192. [PMID: 38404820 PMCID: PMC10884486 DOI: 10.1016/j.heliyon.2024.e26192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/30/2024] [Accepted: 02/08/2024] [Indexed: 02/27/2024]
Abstract
Machine learning offers significant potential for lung cancer detection, enabling early diagnosis and potentially improving patient outcomes. Feature extraction remains a crucial challenge in this domain, and combining the most relevant features can further enhance detection accuracy. This study employed a hybrid feature extraction approach that integrates gray-level co-occurrence matrix (GLCM), Haralick, and autoencoder features. These features were subsequently fed into supervised machine learning methods. Support Vector Machine (SVM) with a radial basis function (RBF) kernel and SVM with a Gaussian kernel achieved perfect performance measures, while SVM with a polynomial kernel achieved an accuracy of 99.89% when using GLCM with autoencoder, Haralick, and autoencoder features. SVM Gaussian achieved an accuracy of 99.56%, and SVM RBF an accuracy of 99.35%, when using GLCM with Haralick features. These results demonstrate the potential of the proposed approach for developing improved diagnostic and prognostic decision-making systems for lung cancer treatment planning.
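To make the GLCM and Haralick features concrete, the sketch below builds a normalized, symmetric gray-level co-occurrence matrix for one pixel offset and derives three classic Haralick statistics (contrast, angular second moment, inverse difference moment) over four neighbour directions. The offsets, the choice of statistics, and the assumption that images are already quantized to integer gray levels are illustrative; the abstract does not specify the authors' settings, and the autoencoder features are not shown.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset.
    image: 2-D list of integer gray levels in range(levels)."""
    P = [[0.0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = image[y][x], image[y2][x2]
                P[i][j] += 1
                if symmetric:  # count the pair in both directions
                    P[j][i] += 1
    total = sum(sum(row) for row in P)
    return [[v / total for v in row] for row in P] if total else P

def contrast(P):
    n = len(P)
    return sum(P[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def angular_second_moment(P):
    return sum(v * v for row in P for v in row)

def inverse_difference_moment(P):
    n = len(P)
    return sum(P[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))

def haralick_vector(image, levels):
    """Three Haralick statistics for horizontal, vertical and both diagonal
    neighbours (symmetric GLCMs, so opposite directions coincide)."""
    feats = []
    for dx, dy in [(1, 0), (1, -1), (0, 1), (1, 1)]:
        P = glcm(image, levels, dx, dy)
        feats += [contrast(P), angular_second_moment(P), inverse_difference_moment(P)]
    return feats
```

Contrast grows with abrupt gray-level changes between neighbours, while angular second moment and inverse difference moment are highest for uniform texture; a hybrid pipeline would concatenate vectors like this with learned features before the classifier.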
Affiliation(s)
- Liangyu Li
  - Center for Software Technology and Management, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
  - Health Informatics Laboratory, Cancer Research Institute, Chifeng Cancer Hospital (Second Affiliated Hospital of Chifeng University), Medical Department, Chifeng University, Chifeng City, Inner Mongolia Autonomous Region, 024000, China
- Jing Yang
  - Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603, Kuala Lumpur, Malaysia
- Lip Yee Por
  - Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603, Kuala Lumpur, Malaysia
- Mohammad Shahbaz Khan
  - Children's National Hospital, 111 Michigan Ave NW, Washington, DC, 20010, United States
- Rim Hamdaoui
  - Department of Computer Science, College of Science and Human Studies Dawadmi, Shaqra University, Shaqra, Riyadh, Saudi Arabia
- Lal Hussain
  - Department of Computer Science and Information Technology, King Abdullah Campus Chatter Kalas, University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Azad Kashmir, Pakistan
  - Department of Computer Science and Information Technology, Neelum Campus, University of Azad Jammu and Kashmir, Athmuqam, 13230, Azad Kashmir, Pakistan
- Zahoor Iqbal
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
- Ionela Magdalena Rotaru
  - Department of Industrial Engineering and Management, Lucian Blaga University of Sibiu, Bulevardul Victoriei 10, Sibiu, 550024, Romania
- Dan Dobrotă
  - Faculty of Engineering, Lucian Blaga University of Sibiu, Bulevardul Victoriei 10, Sibiu, 550024, Romania
- Moutaz Aldrdery
  - Department of Chemical Engineering, College of Engineering, King Khalid University, Abha, 61411, Saudi Arabia
- Abdulfattah Omar
  - Department of English, College of Science & Humanities, Prince Sattam Bin Abdulaziz University, Saudi Arabia
7
Litwińczuk MC, Muhlert N, Trujillo‐Barreto N, Woollams A. Impact of brain parcellation on prediction performance in models of cognition and demographics. Hum Brain Mapp 2024; 45:e26592. [PMID: 38339892 PMCID: PMC10831203 DOI: 10.1002/hbm.26592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 12/18/2023] [Accepted: 12/31/2023] [Indexed: 02/12/2024]
Abstract
Brain connectivity analysis begins with the selection of a parcellation scheme that defines brain regions as the nodes of a network whose connections will be studied. Brain connectivity has already been used in predictive modelling of cognition, but it remains unclear whether the resolution of the parcellation can systematically affect predictive model performance. In this work, structural, functional and combined connectivity were each defined with five different parcellation schemes, which varied in resolution and modality. Each type of connectivity, defined with each parcellation, was used to predict individual differences in age, education, sex, executive function, self-regulation, language, encoding and sequence processing. Low-resolution functional parcellation consistently performed above chance at producing generalisable models of both demographics and cognition. However, no single parcellation scheme showed superior predictive performance across all cognitive domains and demographics. In addition, although parcellation schemes affected the graph theory measures of each connectivity type (structural, functional and combined), these differences did not account for the out-of-sample predictive performance of the models. Taken together, these findings demonstrate that while high-resolution parcellations may be beneficial for modelling specific individual differences, partial voluming of signals produced by higher-resolution parcellations likely disrupts model generalisability.
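A parcellation is what turns voxel-level signals into network nodes. A minimal sketch of that step for functional connectivity follows: average the voxel time series inside each parcel, then correlate the regional series pairwise. The toy data, the label convention (0 = unassigned), and the use of Pearson correlation are assumptions; the structural and combined connectivity used in the study are not shown.

```python
import math

def region_time_series(voxel_series, labels):
    """Average voxel time series within each parcel (label 0 = unassigned)."""
    regions = sorted({lab for lab in labels if lab != 0})
    n_t = len(voxel_series[0])
    out = {}
    for r in regions:
        members = [ts for ts, lab in zip(voxel_series, labels) if lab == r]
        out[r] = [sum(ts[t] for ts in members) / len(members) for t in range(n_t)]
    return out

def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def functional_connectivity(voxel_series, labels):
    """Region-by-region correlation matrix for one parcellation."""
    ts = region_time_series(voxel_series, labels)
    regions = sorted(ts)
    return regions, [[pearson(ts[a], ts[b]) for b in regions] for a in regions]
```

Running the same voxels through a coarse and a fine label map yields networks of different size from identical data, which is the resolution effect the study examines; coarse parcels also average over more voxels, smoothing noise but mixing signals.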
Affiliation(s)
- Nils Muhlert
  - School of Health Sciences, University of Manchester, Manchester, UK
- Anna Woollams
  - School of Health Sciences, University of Manchester, Manchester, UK
8
Wilimitis D, Walsh CG. Practical Considerations and Applied Examples of Cross-Validation for Model Development and Evaluation in Health Care: Tutorial. JMIR AI 2023; 2:e49023. [PMID: 38875530 PMCID: PMC11041453 DOI: 10.2196/49023] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/15/2023] [Revised: 09/19/2023] [Accepted: 09/28/2023] [Indexed: 06/16/2024]
Abstract
Cross-validation remains a popular means of developing and validating artificial intelligence for health care. Numerous subtypes of cross-validation exist. Although tutorials on this validation strategy have been published, some with applied examples, we present here a practical tutorial comparing multiple forms of cross-validation using a widely accessible, real-world electronic health care data set: Medical Information Mart for Intensive Care-III (MIMIC-III). This tutorial explored methods such as K-fold cross-validation and nested cross-validation, highlighting their advantages and disadvantages across 2 common predictive modeling use cases: classification (mortality) and regression (length of stay). We aimed to provide readers with reproducible notebooks and best practices for modeling with electronic health care data. We also described sets of useful recommendations and demonstrated that nested cross-validation reduces optimistic bias but comes with additional computational challenges. This tutorial might improve the community's understanding of these important methods while encouraging the modeling community to apply these guides directly in their work using the published code.
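The difference between plain K-fold and nested cross-validation can be shown with a small stdlib-only sketch. The model (a 1-D k-nearest-neighbour regressor whose k is the tuned hyperparameter), the grid, and the fold counts are arbitrary illustrative choices, not the tutorial's MIMIC-III notebooks.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def knn_predict(train_x, train_y, x, k):
    """Mean target of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def training_part(xs, ys, test):
    """Everything not in the held-out fold `test`."""
    held = set(test)
    keep = [i for i in range(len(xs)) if i not in held]
    return [xs[i] for i in keep], [ys[i] for i in keep]

def cv_mse(xs, ys, k_neighbors, n_folds, seed=0):
    """Plain K-fold cross-validated MSE for one hyperparameter value."""
    errs = []
    for test in kfold_indices(len(xs), n_folds, seed):
        tx, ty = training_part(xs, ys, test)
        errs += [(ys[i] - knn_predict(tx, ty, xs[i], k_neighbors)) ** 2 for i in test]
    return sum(errs) / len(errs)

def nested_cv(xs, ys, grid, outer=5, inner=3, seed=0):
    """Outer folds estimate generalization error; inner folds, run on each
    outer training set only, choose the hyperparameter."""
    outer_errs, chosen = [], []
    for test in kfold_indices(len(xs), outer, seed):
        tx, ty = training_part(xs, ys, test)
        best_k = min(grid, key=lambda k: cv_mse(tx, ty, k, inner, seed + 1))
        chosen.append(best_k)
        outer_errs += [(ys[i] - knn_predict(tx, ty, xs[i], best_k)) ** 2 for i in test]
    return sum(outer_errs) / len(outer_errs), chosen
```

Picking k with `cv_mse` and reporting that same score would reuse the evaluation folds for tuning; the nested loop keeps each outer test fold unseen during selection, which is where the reduced optimistic bias comes from, at the cost of one inner search per outer fold.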
Affiliation(s)
- Drew Wilimitis
  - Vanderbilt University Medical Center, Vanderbilt University, Nashville, TN, United States
- Colin G Walsh
  - Vanderbilt University Medical Center, Vanderbilt University, Nashville, TN, United States
9
Thomaidis GV, Papadimitriou K, Michos S, Chartampilas E, Tsamardinos I. A characteristic cerebellar biosignature for bipolar disorder, identified with fully automatic machine learning. IBRO Neurosci Rep 2023; 15:77-89. [PMID: 38025660 PMCID: PMC10668096 DOI: 10.1016/j.ibneur.2023.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 05/19/2023] [Accepted: 06/29/2023] [Indexed: 12/01/2023]
Abstract
Background Transcriptomic profile differences between patients with bipolar disorder and healthy controls can be identified using machine learning and can provide information about the potential role of the cerebellum in the pathogenesis of bipolar disorder. With this aim, user-friendly, fully automated machine learning algorithms can achieve extremely high classification scores and identify disease-related predictive biosignatures, in short time frames and even on small datasets. Method A fully automated machine learning platform, based on selection of the most suitable algorithm and relevant set of hyper-parameter values, was applied to a preprocessed transcriptomics dataset in order to produce a model for biosignature selection and to classify subjects into groups of patients and controls. The parent GEO datasets were originally produced from the cerebellar and parietal lobe tissue of deceased bipolar patients and healthy controls, using the Affymetrix Human Gene 1.0 ST Array. Results Patients and controls were classified into two separate groups, with no close-to-the-boundary cases, based on a cerebellar transcriptomic biosignature of 25 features (genes), with an Area Under the Curve of 0.929 and an Average Precision of 0.955. The biosignature includes both genes previously connected to bipolar disorder, depression, psychosis or epilepsy and genes not previously linked with any psychiatric disease. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed the participation of 4 identified features in 6 pathways that have also been associated with bipolar disorder. Conclusion Automated machine learning (AutoML) accurately identified 25 genes that can jointly, in a multivariate fashion, separate bipolar patients from healthy controls with high predictive power. The discovered features lead to new biological insights.
Machine Learning (ML) analysis considers the features in combination (in contrast to standard differential expression analysis), removing both irrelevant and redundant markers and thus focusing biological interpretation.
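The Average Precision reported above summarizes the precision-recall curve in one number. A minimal sketch of one standard definition, the mean of the precision values at the rank of each positive case, follows. Tied scores are broken by input order here (Python's sort is stable), and library implementations may treat ties differently.

```python
def average_precision(labels, scores):
    """Average precision: sort cases by descending score and average the
    precision observed at the rank of each positive (label 1)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos
```

Unlike the AUC, this metric depends on the class balance, so an AP of 0.955 should be compared with the positive-class prevalence, which is the AP of a random ranking.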
Affiliation(s)
- Georgios V. Thomaidis
  - Greek National Health System, Psychiatric Department, Katerini General Hospital, Katerini, Greece
- Konstantinos Papadimitriou
  - Greek National Health System, G. Papanikolaou General Hospital, Organizational Unit - Psychiatric Hospital of Thessaloniki, Thessaloniki, Greece
- Evangelos Chartampilas
  - Laboratory of Radiology, AHEPA General Hospital, University of Thessaloniki, Thessaloniki, Greece
10
Erion Barner LA, Gao G, Reddi DM, Lan L, Burke W, Mahmood F, Grady WM, Liu JTC. Artificial Intelligence-Triaged 3-Dimensional Pathology to Improve Detection of Esophageal Neoplasia While Reducing Pathologist Workloads. Mod Pathol 2023; 36:100322. [PMID: 37657711 DOI: 10.1016/j.modpat.2023.100322] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/25/2023] [Revised: 07/25/2023] [Accepted: 08/25/2023] [Indexed: 09/03/2023]
Abstract
Early detection of esophageal neoplasia via evaluation of endoscopic surveillance biopsies is key to maximizing survival for patients with Barrett's esophagus, but it is hampered by the sampling limitations of conventional slide-based histopathology. Comprehensive evaluation of whole biopsies with 3-dimensional (3D) pathology may improve early detection of malignancies, but large 3D pathology data sets are tedious for pathologists to analyze. Here, we present a deep learning-based method to automatically identify the most critical 2-dimensional (2D) image sections within 3D pathology data sets for pathologists to review. Our method first generates a 3D heatmap of neoplastic risk for each biopsy, then ranks all 2D image sections within the 3D data set in order of neoplastic risk. In a clinical validation study, we diagnose esophageal biopsies with artificial intelligence-triaged 3D pathology (3 images per biopsy) versus standard slide-based histopathology (16 images per biopsy) and show that our method improves detection sensitivity while reducing pathologist workloads.
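Once a 3D risk heatmap exists, the triage step of ranking a biopsy's 2D sections by neoplastic risk can be sketched in a few lines. The heatmap layout `[z][y][x]`, the default per-section aggregate (the maximum), and `top_k=3` are assumptions for illustration; the deep learning model that produces the heatmap is not shown, and the authors' aggregation rule is not stated in the abstract.

```python
def rank_sections(heatmap, top_k=3, aggregate=max):
    """Score each 2-D section of a 3-D risk heatmap (indexed [z][y][x]) by
    applying `aggregate` to its voxel values, and return the top_k section
    indices (highest risk first) together with all section scores."""
    scores = [aggregate([v for row in section for v in row]) for section in heatmap]
    # sorted() stays stable with reverse=True, so tied sections keep z order
    order = sorted(range(len(heatmap)), key=lambda z: scores[z], reverse=True)
    return order[:top_k], scores
```

Taking the maximum favours sections that contain any single high-risk focus, while passing `aggregate=lambda v: sum(v) / len(v)` favours sections with diffuse risk; which behaves better for triage is an empirical question.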
Affiliation(s)
- Gan Gao
  - Department of Mechanical Engineering, University of Washington, Seattle, Washington
- Deepti M Reddi
  - Department of Laboratory Medicine & Pathology, University of Washington School of Medicine, Seattle, Washington
- Lydia Lan
  - Department of Mechanical Engineering, University of Washington, Seattle, Washington
  - Department of Biology, University of Washington, Seattle, Washington
- Wynn Burke
  - Department of Laboratory Medicine & Pathology, University of Washington School of Medicine, Seattle, Washington
  - Department of Medicine (Gastroenterology Division), University of Washington School of Medicine, Seattle, Washington
- Faisal Mahmood
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, Massachusetts
  - Harvard Data Science Initiative, Harvard University, Cambridge, Massachusetts
- William M Grady
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Jonathan T C Liu
  - Department of Mechanical Engineering, University of Washington, Seattle, Washington
  - Department of Laboratory Medicine & Pathology, University of Washington School of Medicine, Seattle, Washington
  - Department of Bioengineering, University of Washington, Seattle, Washington
| |
Collapse
|
11
|
Shahini E, Chaulagain N, Shankar K, Tang T. Predicting Free Energies of Exfoliation and Solvation for Graphitic Carbon Nitrides Using Machine Learning. ACS Appl Mater Interfaces 2023; 15:53786-53801. [PMID: 37938813 DOI: 10.1021/acsami.3c09347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2023]
Abstract
As a metal-free and visible-light-responsive photocatalyst, graphitic carbon nitride (g-C3N4) has emerged as a new research hotspot and has attracted broad attention in the fields of solar energy conversion and thin-film transistors. Liquid-phase exfoliation (LPE) is the best-known method for the synthesis of 2D g-C3N4 nanosheets. In LPE, bulk g-C3N4 is exfoliated in a solvent via high-shear mixing or sonication to produce a stable suspension of individual nanosheets. Two parameters are important in gauging the performance of a solvent in LPE: the free energy required to exfoliate a unit area of layered material into individual sheets in the solvent (ΔGexf) and the solvation free energy per unit area of a nanosheet (ΔGsol). While approximations for these free energies exist, our previous work showed them to be inaccurate and incapable of capturing the experimentally observed efficacy of LPE. Molecular dynamics (MD) simulations can provide accurate free-energy calculations, but running them for every single solvent is time- and resource-consuming. Herein, machine learning (ML) algorithms are used to predict ΔGexf and ΔGsol for g-C3N4. First, a database for ΔGexf and ΔGsol is created based on a series of MD simulations involving 49 different solvents with distinct chemical structures and properties. The data set also includes values of critical descriptors for the solvents, including density, surface tension, dielectric constant, etc. Different ML methods are compared, accompanied by descriptor selection, to develop the most accurate model for predicting ΔGexf and ΔGsol. The extra-trees regressor is shown to be the best performer among the six ML methods studied. Experimental validation of the model is conducted by performing dispersibility tests in several solvents for which the free energies are predicted. Finally, the influence of the selected descriptors on the free energies is analyzed, and strategies for solvent selection in LPE are proposed.
Affiliation(s)
- Ehsan Shahini
  - Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Narendra Chaulagain
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Karthik Shankar
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Tian Tang
  - Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
12
Su J, Zhang F, Yu C, Zhang Y, Wang J, Wang C, Wang H, Jiang H. Machine learning: Next promising trend for microplastics study. J Environ Manage 2023; 344:118756. [PMID: 37573697 DOI: 10.1016/j.jenvman.2023.118756] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/16/2023] [Revised: 07/24/2023] [Accepted: 08/09/2023] [Indexed: 08/15/2023]
Abstract
Microplastics (MPs), as an emerging pollutant, pose a significant threat to humans and ecosystems. However, traditional MPs characterization methods are limited by sample requirements and characterization time. Machine Learning (ML) has emerged as a vital technology for analyzing MPs pollution due to its accuracy, broad applicability, and powerful feature extraction. Nevertheless, environmental scientists need prerequisite knowledge before using ML, which restricts ML application in MPs research. Furthermore, the imbalanced development of ML in MPs research is a pressing concern. To achieve wider ML application in MPs research, this review comprehensively discusses the size and sources of MPs datasets in the relevant literature to help environmental scientists deepen their understanding of how MPs datasets are constructed. Commonly used ML algorithms are analyzed from the perspective of interpretability and computing requirements. Additionally, methods for improving and evaluating ML model performance, such as dataset pre-processing, model optimization, and model assessment metrics, are discussed. Based on datasets and characterization techniques, MPs identification using ML is divided into three categories in this work: spectral identification, image identification, and spectral imaging identification. Finally, other applications of ML in MPs studies, including toxicity analysis, pollutant adsorption, and microbial colonization, are comprehensively discussed, revealing the great application potential of ML. Based on this discussion, the review suggests an algorithm selection strategy to help researchers select the most suitable ML algorithm in different situations, improving efficiency and decreasing the costs of trial and error. We believe that this work sheds light on the application of ML in MPs study.
Affiliation(s)
- Jiming Su
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
- Fupeng Zhang
  - Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, Tsinghua University, 518055, Shenzhen, PR China
- Chuanxiu Yu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
- Yingshuang Zhang
  - School of Chemical Engineering and Technology, Xinjiang University, 830017, Urumqi, Xinjiang, PR China
- Jianchao Wang
  - School of Chemical and Environmental Engineering, China University of Mining and Technology (Beijing), Beijing, 100083, PR China
- Chongqing Wang
  - School of Chemical Engineering, Zhengzhou University, Zhengzhou, 450001, PR China
- Hui Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
- Hongru Jiang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
13
Peisen F, Gerken A, Hering A, Dahm I, Nikolaou K, Gatidis S, Eigentler TK, Amaral T, Moltz JH, Othman AE. Can Whole-Body Baseline CT Radiomics Add Information to the Prediction of Best Response, Progression-Free Survival, and Overall Survival of Stage IV Melanoma Patients Receiving First-Line Targeted Therapy: A Retrospective Register Study. Diagnostics (Basel) 2023; 13:3210. [PMID: 37892030 PMCID: PMC10605712 DOI: 10.3390/diagnostics13203210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 10/06/2023] [Accepted: 10/12/2023] [Indexed: 10/29/2023]
Abstract
BACKGROUND The aim of this study was to investigate whether the combination of radiomics and clinical parameters in a machine-learning model offers additive information compared with the use of only clinical parameters in predicting the best response, progression-free survival after six months, as well as overall survival after six and twelve months in patients with stage IV malignant melanoma undergoing first-line targeted therapy. METHODS A baseline machine-learning model using clinical variables (demographic parameters and tumor markers) was compared with an extended model using clinical variables and radiomic features of the whole tumor burden, utilizing repeated five-fold cross-validation. Baseline CTs of 91 stage IV malignant melanoma patients, all treated in the same university hospital, were identified in the Central Malignant Melanoma Registry and all metastases were volumetrically segmented (n = 4727). RESULTS Compared with the baseline model, the extended radiomics model did not add significantly more information to the best-response prediction (AUC [95% CI] 0.548 (0.188, 0.808) vs. 0.487 (0.139, 0.743)), the prediction of PFS after six months (AUC [95% CI] 0.699 (0.436, 0.958) vs. 0.604 (0.373, 0.867)), or the overall survival prediction after six and twelve months (AUC [95% CI] 0.685 (0.188, 0.967) vs. 0.766 (0.433, 1.000) and AUC [95% CI] 0.554 (0.163, 0.781) vs. 0.616 (0.271, 1.000), respectively). CONCLUSIONS The results showed no additional value of baseline whole-body CT radiomics for best-response prediction, progression-free survival prediction for six months, or six-month and twelve-month overall survival prediction for stage IV melanoma patients receiving first-line targeted therapy. These results need to be validated in a larger cohort.
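The evaluation protocol above (repeated five-fold cross-validation, where the clinical-only and the clinical-plus-radiomics models can be scored on identical folds so the comparison is paired) can be sketched with the standard library alone. The function name `repeated_kfold` and its seeding scheme are illustrative assumptions; the study does not publish its splitting code, and stratified variants are also common when outcomes are imbalanced.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, n_splits: int = 5, n_repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Each repeat reshuffles the sample indices and partitions them into
    n_splits folds of near-equal size, so every sample is tested once per repeat.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # Spread the remainder over the first folds so sizes differ by at most one.
        fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                      for i in range(n_splits)]
        start = 0
        for size in fold_sizes:
            test_idx = sorted(indices[start:start + size])
            test_set = set(test_idx)
            train_idx = [i for i in range(n_samples) if i not in test_set]
            yield train_idx, test_idx
            start += size


if __name__ == "__main__":
    # 91 patients, as in the study; reuse the same splits for both models.
    splits = list(repeated_kfold(91, n_splits=5, n_repeats=10, seed=42))
    print(len(splits))                            # 50
    print(sorted(len(t) for _, t in splits[:5]))  # [18, 18, 18, 18, 19]
```

Fixing the seed and materializing the splits once is what makes the baseline-versus-extended comparison paired: both models see exactly the same train/test partitions.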
Affiliation(s)
- Felix Peisen
  - Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Eberhard Karls University, Hoppe-Seyler-Straße 3, 72076 Tuebingen, Germany
- Annika Gerken
  - Fraunhofer MEVIS, Max-von-Laue-Straße 2, 28359 Bremen, Germany
- Alessa Hering
  - Fraunhofer MEVIS, Max-von-Laue-Straße 2, 28359 Bremen, Germany
  - Diagnostic Image Analysis Group, Radboud University Medical Center (Radboudumc), Geert Grooteplein Zuid 10, 6525 GA Nijmegen, The Netherlands
- Isabel Dahm
  - Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Eberhard Karls University, Hoppe-Seyler-Straße 3, 72076 Tuebingen, Germany
- Konstantin Nikolaou
  - Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Eberhard Karls University, Hoppe-Seyler-Straße 3, 72076 Tuebingen, Germany
  - Image-Guided and Functionally Instructed Tumor Therapies (iFIT), The Cluster of Excellence (EXC 2180), 72076 Tuebingen, Germany
- Sergios Gatidis
  - Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Eberhard Karls University, Hoppe-Seyler-Straße 3, 72076 Tuebingen, Germany
  - Max Planck Institute for Intelligent Systems, Max-Planck-Ring 4, 72076 Tuebingen, Germany
- Thomas K. Eigentler
  - Center of Dermato-Oncology, Department of Dermatology, Tuebingen University Hospital, Eberhard Karls University, Liebermeisterstraße 25, 72076 Tuebingen, Germany
  - Department of Dermatology, Venereology and Allergology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Luisenstraße 2, 10117 Berlin, Germany
- Teresa Amaral
  - Center of Dermato-Oncology, Department of Dermatology, Tuebingen University Hospital, Eberhard Karls University, Liebermeisterstraße 25, 72076 Tuebingen, Germany
- Jan H. Moltz
  - Fraunhofer MEVIS, Max-von-Laue-Straße 2, 28359 Bremen, Germany
- Ahmed E. Othman
  - Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Eberhard Karls University, Hoppe-Seyler-Straße 3, 72076 Tuebingen, Germany
  - Institute of Neuroradiology, Johannes Gutenberg University Hospital Mainz, Langenbeckstraße 1, 55131 Mainz, Germany
14
Papoutsoglou G, Tarazona S, Lopes MB, Klammsteiner T, Ibrahimi E, Eckenberger J, Novielli P, Tonda A, Simeon A, Shigdel R, Béreux S, Vitali G, Tangaro S, Lahti L, Temko A, Claesson MJ, Berland M. Machine learning approaches in microbiome research: challenges and best practices. Front Microbiol 2023; 14:1261889. [PMID: 37808286 PMCID: PMC10556866 DOI: 10.3389/fmicb.2023.1261889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 09/04/2023] [Indexed: 10/10/2023]
Abstract
Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that using compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm in combination with random forest modeling provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
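The abstract mentions compositional transformations without naming one; the centered log-ratio (CLR) is a widely used example for relative-abundance data and is sketched below. The pseudocount of 0.5 added before taking logarithms, which keeps zero counts from producing log(0), is our assumption and not a setting reported by the study.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    clr(x)_i = log(x_i) - (1/D) * sum_j log(x_j), computed after adding a
    pseudocount so that zero counts do not produce log(0).
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


if __name__ == "__main__":
    sample = [120, 0, 35, 8]  # hypothetical read counts for four taxa
    transformed = clr(sample)
    print([round(v, 3) for v in transformed])
    # CLR components sum to zero (up to floating-point error).
    print(abs(sum(transformed)) < 1e-9)  # True
```

Because CLR depends only on ratios between taxa, rescaling a sample (for example, a deeper sequencing run) leaves the transformed values unchanged when no pseudocount is needed; the pseudocount slightly breaks that invariance, which is one reason such transformations do not always help downstream models.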
Affiliation(s)
- Georgios Papoutsoglou
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece
- Sonia Tarazona
  - Department of Applied Statistics and Operations Research and Quality, Polytechnic University of Valencia, Valencia, Spain
- Marta B. Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Research and Development Unit for Mechanical and Industrial Engineering (UNIDEMI), Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Thomas Klammsteiner
  - Department of Ecology, Universität Innsbruck, Innsbruck, Austria
  - Department of Microbiology, Universität Innsbruck, Innsbruck, Austria
- Eliana Ibrahimi
  - Department of Biology, University of Tirana, Tirana, Albania
- Julia Eckenberger
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Pierfrancesco Novielli
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Alberto Tonda
  - UMR 518 MIA-PS, INRAE, Paris-Saclay University, Palaiseau, France
  - Complex Systems Institute of Paris Ile-de-France (ISC-PIF) - UAR 3611 CNRS, Paris, France
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Stéphane Béreux
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
  - MaIAGE, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Giacomo Vitali
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Sabina Tangaro
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Andriy Temko
  - Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Marcus J. Claesson
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Magali Berland
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
15
Lakiotaki K, Papadovasilakis Z, Lagani V, Fafalios S, Charonyktakis P, Tsagris M, Tsamardinos I. Automated machine learning for genome wide association studies. Bioinformatics 2023; 39:btad545. [PMID: 37672022 PMCID: PMC10562960 DOI: 10.1093/bioinformatics/btad545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 06/29/2023] [Accepted: 09/05/2023] [Indexed: 09/07/2023]
Abstract
MOTIVATION Genome-wide association studies (GWAS) present several computational and statistical challenges for their data analysis, including knowledge discovery, interpretability, and translation to clinical practice. RESULTS We develop, apply, and comparatively evaluate an automated machine learning (AutoML) approach, customized for genomic data that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance compared to standard GWAS methods, computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures. AVAILABILITY AND IMPLEMENTATION Code for this study is available at: https://github.com/mensxmachina/autoML-GWAS. JADBio offers a free version at: https://jadbio.com/sign-up/. SNP data can be downloaded from the EGA repository (https://ega-archive.org/). PRS data are found at: https://www.aicrowd.com/challenges/opensnp-height-prediction. Simulation data to study population structure can be found at: https://easygwas.ethz.ch/data/public/dataset/view/1/.
Affiliation(s)
- Zaharias Papadovasilakis
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
  - Laboratory of Immune Regulation and Tolerance, School of Medicine, University of Crete, Heraklion, Greece
- Vincenzo Lagani
  - Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology KAUST, Thuwal 23952, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, Thuwal 23952, Saudi Arabia
  - Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
- Stefanos Fafalios
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
- Paulos Charonyktakis
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
- Michail Tsagris
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - Department of Economics, University of Crete, Heraklion, Greece
- Ioannis Tsamardinos
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
16
Bhattacharyay S, Caruso PF, Åkerlund C, Wilson L, Stevens RD, Menon DK, Steyerberg EW, Nelson DW, Ercole A. Mining the contribution of intensive care clinical course to outcome after traumatic brain injury. NPJ Digit Med 2023; 6:154. [PMID: 37604980 PMCID: PMC10442346 DOI: 10.1038/s41746-023-00895-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 08/01/2023] [Indexed: 08/23/2023]
Abstract
Existing methods to characterise the evolving condition of traumatic brain injury (TBI) patients in the intensive care unit (ICU) do not capture the context necessary for individualising treatment. Here, we integrate all heterogeneous data stored in medical records (1166 pre-ICU and ICU variables) to model the individualised contribution of clinical course to 6-month functional outcome on the Glasgow Outcome Scale-Extended (GOSE). On a prospective cohort (n = 1550, 65 centres) of TBI patients, we train recurrent neural network models to map a token-embedded time series representation of all variables (including missing values) to an ordinal GOSE prognosis every 2 h. The full range of variables explains up to 52% (95% CI: 50-54%) of the ordinal variance in functional outcome. Up to 91% (95% CI: 90-91%) of this explanation is derived from pre-ICU and admission information (i.e., static variables). Information collected in the ICU (i.e., dynamic variables) increases explanation (by up to 5% [95% CI: 4-6%]), though not enough to counter poorer overall performance in longer-stay (>5.75 days) patients. Highest-contributing variables include physician-based prognoses, CT features, and markers of neurological function. Whilst static information currently accounts for the majority of functional outcome explanation after TBI, data-driven analysis highlights investigative avenues to improve the dynamic characterisation of longer-stay patients. Moreover, our modelling strategy proves useful for converting large patient records into interpretable time series with missing data integration and minimal processing.
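The abstract describes turning every variable, including missing values, into tokens over fixed 2-hour windows before modelling. A minimal sketch of that representation step follows. The token format (`name=binN` or `name=MISSING`), the hand-chosen cut points, and the variable names are hypothetical illustrations, not the study's actual tokenization scheme; the recurrent network that consumes the tokens is not shown.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# Hypothetical event format: (hours since ICU admission, variable name, numeric value)
Event = Tuple[float, str, float]


def tokenize_windows(events: Sequence[Event],
                     cut_points: Dict[str, Sequence[float]],
                     stay_hours: float,
                     window_hours: float = 2.0) -> List[List[str]]:
    """Turn timestamped measurements into one list of tokens per time window.

    Each value is discretized with per-variable cut points (bin0, bin1, ...);
    a variable not measured in a window gets an explicit MISSING token, so
    missingness is part of the input rather than silently imputed.
    """
    n_windows = max(1, int(-(-stay_hours // window_hours)))  # ceiling division
    last_seen: List[Dict[str, float]] = [{} for _ in range(n_windows)]
    for t, name, value in sorted(events):
        # Clamp so out-of-range timestamps land in the first or last window.
        w = min(max(int(t // window_hours), 0), n_windows - 1)
        last_seen[w][name] = value  # keep the last value observed in the window
    windows: List[List[str]] = []
    for observed in last_seen:
        tokens = []
        for name in sorted(cut_points):
            if name in observed:
                tokens.append(f"{name}=bin{bisect_right(cut_points[name], observed[name])}")
            else:
                tokens.append(f"{name}=MISSING")
        windows.append(tokens)
    return windows


if __name__ == "__main__":
    # Hypothetical 4-hour stay with Glasgow Coma Scale and intracranial pressure readings.
    events = [(0.5, "GCS", 9), (1.5, "ICP", 22.0), (3.0, "GCS", 12)]
    cuts = {"GCS": [8, 12], "ICP": [15.0, 20.0]}
    for tokens in tokenize_windows(events, cuts, stay_hours=4.0):
        print(tokens)
    # ['GCS=bin1', 'ICP=bin2']
    # ['GCS=bin2', 'ICP=MISSING']
```

Emitting an explicit MISSING token, rather than imputing a value, lets a downstream sequence model learn from the pattern of what was and was not measured, which is the spirit of the "missing data integration" the abstract describes.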
Affiliation(s)
- Shubhayu Bhattacharyay
  - Division of Anaesthesia, University of Cambridge, Cambridge, UK
  - Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
  - Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, USA
- Pier Francesco Caruso
  - Division of Anaesthesia, University of Cambridge, Cambridge, UK
  - Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan, 20072, Italy
- Cecilia Åkerlund
  - Department of Physiology and Pharmacology, Section for Perioperative Medicine and Intensive Care, Karolinska Institutet, Stockholm, Sweden
- Lindsay Wilson
  - Division of Psychology, University of Stirling, Stirling, UK
- Robert D Stevens
  - Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University, Baltimore, MD, USA
- David K Menon
  - Division of Anaesthesia, University of Cambridge, Cambridge, UK
- Ewout W Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- David W Nelson
  - Department of Physiology and Pharmacology, Section for Perioperative Medicine and Intensive Care, Karolinska Institutet, Stockholm, Sweden
- Ari Ercole
  - Division of Anaesthesia, University of Cambridge, Cambridge, UK
  - Cambridge Centre for Artificial Intelligence in Medicine, Cambridge, UK
17
Kuipers M, Kappen M, Naber M. How nervous am I? How computer vision succeeds and humans fail in interpreting state anxiety from dynamic facial behaviour. Cogn Emot 2023; 37:1105-1115. [PMID: 37395739 DOI: 10.1080/02699931.2023.2229545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 04/17/2023] [Accepted: 06/16/2023] [Indexed: 07/04/2023]
Abstract
For human interaction, it is important to understand what emotional state others are in. Observing faces in particular helps us put behaviours into context and gives insight into the emotions and mental states of others. Detecting whether someone is nervous, a form of state anxiety, is one such example, as it reveals a person's familiarity and contentment with the circumstances. Building on recent developments in computer vision, we developed behavioural nervousness models to show which time-varying facial cues reveal whether someone is nervous in an interview setting. The facial changes reflecting a state of anxiety led to more visual exposure and less chemosensory (taste and olfaction) exposure. However, experienced observers had difficulty picking up these changes and consequently failed to detect nervousness levels accurately. This study highlights humans' limited capacity to determine complex emotional states but at the same time provides an automated model that can help us achieve fair assessments of previously unexplored emotional states.
Affiliation(s)
- Mithras Kuipers
  - Experimental Psychology, Helmholtz Institute, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands
- Mitchel Kappen
  - Experimental Psychology, Helmholtz Institute, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands
  - Department of Head and Skin, Ghent University, University Hospital Ghent (UZ Ghent), Ghent, Belgium
- Marnix Naber
  - Experimental Psychology, Helmholtz Institute, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands
18
Boutin L, Morisson L, Riché F, Barthélémy R, Mebazaa A, Soyer P, Gallix B, Dohan A, Chousterman BG. Radiomic analysis of abdominal organs during sepsis of digestive origin in a French intensive care unit. Acute Crit Care 2023; 38:343-352. [PMID: 37652864 PMCID: PMC10497895 DOI: 10.4266/acc.2023.00136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 06/12/2023] [Accepted: 06/15/2023] [Indexed: 09/02/2023]
Abstract
BACKGROUND Sepsis is a severe and common cause of admission to the intensive care unit (ICU). Radiomic analysis (RA) may predict organ failure and patient outcomes. The objective of this study was to assess a model of RA and to evaluate its performance in predicting in-ICU mortality and acute kidney injury (AKI) during abdominal sepsis. METHODS This single-center, retrospective study included patients admitted to the ICU for abdominal sepsis. To predict in-ICU mortality or AKI, elastic net regularized logistic regression and the random forest algorithm were used with five-fold cross-validation repeated 10 times. RESULTS Fifty-five patients were included. In-ICU mortality was 25.5%, and 76.4% of patients developed AKI. To predict in-ICU mortality, the elastic net and random forest models achieved areas under the curve (AUCs) of 0.48 (95% confidence interval [CI], 0.43-0.54) and 0.51 (95% CI, 0.46-0.57), respectively, and were not improved when combined with the Simplified Acute Physiology Score (SAPS) II. To predict AKI with RA, the AUC was 0.71 (95% CI, 0.66-0.77) for elastic net and 0.69 (95% CI, 0.64-0.74) for random forest; combining RA with SAPS II improved these AUCs to 0.94 (95% CI, 0.91-0.96) and 0.75 (95% CI, 0.70-0.80), respectively. CONCLUSIONS This study suggests that RA has poor predictive performance for in-ICU mortality but good predictive performance for AKI in patients with abdominal sepsis. A secondary validation cohort is needed to confirm these results and the assessed model.
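The results above are reported as AUCs with 95% confidence intervals. One common way to obtain such an interval is sketched below: the AUC is computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), and a percentile bootstrap over cases gives the interval. The study pooled results over five-fold cross-validation repeated 10 times, so its intervals may have been derived differently; this is an illustration only, and the labels and scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic; O(n_pos * n_neg), fine for small n."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Return (AUC, lower, upper) using a percentile bootstrap over cases."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lower, upper


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.6, 0.3]
    print(auc(y, s))  # 0.92
    print(bootstrap_auc_ci(y, s, n_boot=1000, seed=1))
```

An AUC near 0.5, as reported for mortality, means the model ranks a patient who died above a survivor about as often as chance would; with only 55 patients, bootstrap intervals of this kind are expected to be wide.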
Affiliation(s)
- Louis Boutin
  - Department of Anesthesiology and Critical Care, Hôpital Lariboisière, AP-HP, Paris, France
  - INSERM UMR-S 942, MASCOT, Université Paris Cité, Paris, France
- Louis Morisson
  - Department of Anesthesiology and Critical Care, Hôpital Lariboisière, AP-HP, Paris, France
- Florence Riché
  - Department of Anesthesiology and Critical Care, Hôpital Lariboisière, AP-HP, Paris, France
- Romain Barthélémy
  - Department of Anesthesiology and Critical Care, Hôpital Lariboisière, AP-HP, Paris, France
- Alexandre Mebazaa
  - Department of Anesthesiology and Critical Care, Hôpital Lariboisière, AP-HP, Paris, France
  - INSERM UMR-S 942, MASCOT, Université Paris Cité, Paris, France
- Philippe Soyer
  - INSERM UMR-S 942, MASCOT, Université Paris Cité, Paris, France
  - Department of Radiology, Cochin Hospital, AP-HP, Paris, France
- Benoit Gallix
  - IHU Strasbourg, Strasbourg, France
  - Icube Laboratory and Faculty of Medicine, University of Strasbourg, Strasbourg, France
  - Department of Radiology, McGill University, Montreal, QC, Canada
- Anthony Dohan
  - INSERM UMR-S 942, MASCOT, Université Paris Cité, Paris, France
  - Department of Radiology, Cochin Hospital, AP-HP, Paris, France
- Benjamin G Chousterman
  - Department of Anesthesiology and Critical Care, Hôpital Lariboisière, AP-HP, Paris, France
  - INSERM UMR-S 942, MASCOT, Université Paris Cité, Paris, France
19
Laqua FC, Woznicki P, Bley TA, Schöneck M, Rinneburger M, Weisthoff M, Schmidt M, Persigehl T, Iuga AI, Baeßler B. Transfer-Learning Deep Radiomics and Hand-Crafted Radiomics for Classifying Lymph Nodes from Contrast-Enhanced Computed Tomography in Lung Cancer. Cancers (Basel) 2023; 15:2850. [PMID: 37345187 DOI: 10.3390/cancers15102850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/06/2023] [Accepted: 05/19/2023] [Indexed: 06/23/2023]
Abstract
OBJECTIVES Positron emission tomography (PET) is currently considered the non-invasive reference standard for lymph node (N-)staging in lung cancer. However, not all patients can undergo this diagnostic procedure due to high costs, limited availability, and additional radiation exposure. The purpose of this study was to predict the PET result from traditional contrast-enhanced computed tomography (CT) and to test different feature extraction strategies. METHODS In this study, 100 lung cancer patients underwent a contrast-enhanced 18F-fluorodeoxyglucose (FDG) PET/CT scan between August 2012 and December 2019. We trained machine learning models to predict FDG uptake in the subsequent PET scan. Model inputs were composed of (i) traditional "hand-crafted" radiomics features from the segmented lymph nodes, (ii) deep features derived from a pretrained EfficientNet-CNN, and (iii) a hybrid approach combining (i) and (ii). RESULTS In total, 2734 lymph nodes [555 (20.3%) PET-positive] from 100 patients [49% female; mean age 65, SD: 14] with lung cancer (60% adenocarcinoma, 21% squamous cell carcinoma, 8% small-cell lung cancer) were included in this study. The area under the receiver operating characteristic curve (AUC) ranged from 0.79 to 0.87, and the scaled Brier score (SBS) ranged from 16 to 36%. The random forest model (iii) yielded the best results [AUC 0.871 (0.865-0.878), SBS 35.8% (34.2-37.2%)] and had significantly higher model performance than both approaches alone (AUC: p < 0.001, z = 8.8 and z = 22.4; SBS: p < 0.001, z = 11.4 and z = 26.6, against (i) and (ii), respectively). CONCLUSION Both traditional radiomics features and transfer-learning deep radiomics features provide relevant and complementary information for non-invasive N-staging in lung cancer.
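The scaled Brier score (SBS) reported above rescales the Brier score against a non-informative reference model that always predicts the observed event rate p_bar: SBS = 1 - Brier / (p_bar * (1 - p_bar)). An SBS of 0% means no improvement over that reference and 100% means perfect predictions. This is the usual definition; that the study computed it exactly this way is our assumption. A standard-library sketch with hypothetical labels and probabilities:

```python
from typing import Sequence


def brier(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def scaled_brier(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Scaled Brier score: 1 - Brier / Brier of always predicting the event rate."""
    rate = sum(labels) / len(labels)
    reference = rate * (1 - rate)  # Brier score of the constant predictor `rate`
    if reference == 0:
        raise ValueError("scaled Brier score is undefined when only one class occurs")
    return 1 - brier(labels, probs) / reference


if __name__ == "__main__":
    y = [1, 0, 0, 0, 1]            # hypothetical PET-positive (1) / negative (0) nodes
    p = [0.8, 0.1, 0.3, 0.2, 0.6]  # hypothetical predicted probabilities
    print(f"{scaled_brier(y, p):.1%}")  # 71.7%
```

Unlike the AUC, which only measures ranking, the SBS also penalizes miscalibrated probabilities, which is why reporting both, as this study does, gives a fuller picture of model quality.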
Affiliation(s)
- Fabian Christopher Laqua
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, University of Würzburg, 97080 Würzburg, Germany
- Piotr Woznicki
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, University of Würzburg, 97080 Würzburg, Germany
- Thorsten A Bley
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, University of Würzburg, 97080 Würzburg, Germany
- Mirjam Schöneck
- Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Miriam Rinneburger
- Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Mathilda Weisthoff
- Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Matthias Schmidt
- Department of Nuclear Medicine, Medical Faculty and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Thorsten Persigehl
- Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Andra-Iza Iuga
- Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Bettina Baeßler
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, University of Würzburg, 97080 Würzburg, Germany

20
Pellegrini M. Accurate prognosis for localized prostate cancer through coherent voting networks with multi-omic and clinical data. Sci Rep 2023; 13:7875. [PMID: 37188913 DOI: 10.1038/s41598-023-35023-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 05/11/2023] [Indexed: 05/17/2023]
Abstract
Localized prostate cancer is a very heterogeneous disease from both a clinical and a biological/biochemical point of view, which makes stratifying patients into risk classes remarkably challenging. In particular, it is important to detect the indolent forms of the disease early and to discriminate them from the aggressive ones, which require closer post-surgery surveillance and timely treatment decisions. This work extends a recently developed supervised machine learning (ML) technique, called coherent voting networks (CVN), by incorporating a novel model-selection technique to counter the danger of model overfitting. For the challenging problem of discriminating between indolent and aggressive types of localized prostate cancer, accurate prognostic prediction of post-surgery progression-free survival, with a granularity within a year, is attained, improving accuracy relative to the current state of the art. The development of novel ML techniques tailored to combining multi-omics and clinical prognostic biomarkers is a promising new line of attack for sharpening the capability to diversify and personalize cancer patient treatments. The proposed approach allows a finer post-surgery stratification of patients within the clinical high-risk category, with a potential impact on the surveillance regime and the timing of treatment decisions, complementing existing prognostic methods.
Affiliation(s)
- Marco Pellegrini
- Institute of Informatics and Telematics (IIT), CNR, 56124, Pisa, Italy

21
Lewis MJ, Spiliopoulou A, Goldmann K, Pitzalis C, McKeigue P, Barnes MR. nestedcv: an R package for fast implementation of nested cross-validation with embedded feature selection designed for transcriptomics and high-dimensional data. Bioinformatics Advances 2023; 3:vbad048. [PMID: 37113250 PMCID: PMC10125905 DOI: 10.1093/bioadv/vbad048] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/05/2022] [Revised: 02/21/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
Motivation Although machine learning models are commonly used in medical research, many analyses implement a simple partition into training data and hold-out test data, with cross-validation (CV) for tuning of model hyperparameters. Nested CV with embedded feature selection is especially suited to biomedical data, where the sample size is frequently limited but the number of predictors may be much larger (P ≫ n). Results The nestedcv R package implements fully nested k × l-fold CV for lasso and elastic-net regularized linear models via the glmnet package and supports a large array of other machine learning models via the caret framework. Inner CV is used to tune models and outer CV is used to determine model performance without bias. Fast filter functions for feature selection are provided, and the package ensures that filters are nested within the outer CV loop to avoid information leakage from performance test sets. Measurement of performance by outer CV is also used to implement Bayesian linear and logistic regression models using the horseshoe prior over parameters to encourage a sparse model and determine unbiased model accuracy. Availability and implementation The R package nestedcv is available from CRAN: https://CRAN.R-project.org/package=nestedcv.
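The core idea of nested CV with an embedded filter is that feature selection is refitted inside every training fold, so the held-out outer fold never influences which features are kept. nestedcv itself is an R package built on glmnet and caret. The Python sketch below is not its API; it is a standard-library illustration of the same protocol.

All names are hypothetical, including the simple mean-difference filter, the nearest-centroid classifier, and the toy data.

```python
import random


def kfold(indices, k, seed=0):
    """Shuffle `indices` and deal them into k roughly equal folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(folds, i):
    """Return (train, test) row lists with fold i held out."""
    train = [r for j, f in enumerate(folds) if j != i for r in f]
    return train, folds[i]


def mean_diff_filter(X, y, rows, n_keep):
    """Keep the n_keep features with the largest |mean(class 1) - mean(class 0)|, using `rows` only."""
    pos = [r for r in rows if y[r] == 1]
    neg = [r for r in rows if y[r] == 0]
    scores = []
    for j in range(len(X[0])):
        m1 = sum(X[r][j] for r in pos) / len(pos)
        m0 = sum(X[r][j] for r in neg) / len(neg)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def fit(X, y, rows, n_keep):
    """Fit the whole pipeline (filter + nearest-centroid classifier) on `rows` only."""
    feats = mean_diff_filter(X, y, rows, n_keep)
    centroids = {}
    for c in (0, 1):
        members = [r for r in rows if y[r] == c]
        centroids[c] = [sum(X[r][j] for r in members) / len(members) for j in feats]
    return feats, centroids


def accuracy(model, X, y, rows):
    """Fraction of `rows` whose nearest class centroid matches their label."""
    feats, centroids = model
    correct = 0
    for r in rows:
        x = [X[r][j] for j in feats]
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in centroids.items()}
        correct += int(min(dist, key=dist.get) == y[r])
    return correct / len(rows)


def inner_cv_score(X, y, rows, n_keep, k):
    """Mean accuracy over k inner folds drawn from `rows` (outer training rows) only."""
    folds = kfold(rows, k, seed=1)
    return sum(accuracy(fit(X, y, tr, n_keep), X, y, te)
               for tr, te in (split(folds, i) for i in range(k))) / k


def nested_cv(X, y, grid, k_outer=5, k_inner=3):
    """Outer folds estimate performance; inner folds tune n_keep without seeing the outer test fold."""
    outer = kfold(range(len(y)), k_outer, seed=0)
    results = []
    for i in range(k_outer):
        train, test = split(outer, i)
        best = max(grid, key=lambda n: inner_cv_score(X, y, train, n, k_inner))
        results.append((best, accuracy(fit(X, y, train, best), X, y, test)))
    return results


# Hypothetical toy data: 40 samples, 20 features, only features 0 and 1 informative.
rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(2.0 * yi if j < 2 else 0.0, 1.0) for j in range(20)] for yi in y]
results = nested_cv(X, y, grid=[1, 2, 5, 10])
for best_k, acc in results:
    print(best_k, round(acc, 2))
```

Each outer fold may select a different `n_keep`. The mean of the outer-fold accuracies is the performance estimate for the whole tuning-plus-filtering procedure, not for any single fitted model.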
Affiliation(s)
- Myles J Lewis
- Centre for Experimental Medicine & Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Alan Turing Institute, London NW1 2AJ, UK
- Athina Spiliopoulou
- Usher Institute, College of Medicine and Veterinary Medicine, University of Edinburgh, Edinburgh EH16 4UX, UK
- Katriona Goldmann
- Centre for Experimental Medicine & Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Costantino Pitzalis
- Centre for Experimental Medicine & Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Paul McKeigue
- Usher Institute, College of Medicine and Veterinary Medicine, University of Edinburgh, Edinburgh EH16 4UX, UK
- Michael R Barnes
- Alan Turing Institute, London NW1 2AJ, UK
- Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK

22
Robellada‐Zárate CM, Luna‐Palacios JE, Caballero CAZ, Acuña‐González JP, Lara‐Pereyra I, González‐Azpeitia DI, Acuña‐González RJ, Moreno‐Verduzco ER, Flores‐Herrera H, Osorio‐Caballero M. First‐trimester plasma extracellular heat shock proteins levels and risk of preeclampsia. J Cell Mol Med 2023; 27:1206-1213. [PMID: 37002651 PMCID: PMC10148059 DOI: 10.1111/jcmm.17674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 12/19/2022] [Accepted: 12/28/2022] [Indexed: 04/03/2023]
Abstract
Preeclampsia (PE) occurs in 8% of pregnancies each year, and 10% of these cases occur in patients without risk factors. There are currently no first-trimester biochemical markers that accurately predict PE. Increased serum levels of 60- and 70-kDa extracellular heat shock proteins (eHsp) have been shown in patients who developed PE at 34 weeks. We sought to determine whether there is a relationship between first-trimester eHsp levels and the development of PE. This was a prospective cohort study performed at a tertiary-care hospital in Mexico City from 2019 to 2020. eHsp levels were measured during the first-trimester ultrasound in singleton pregnancies with no comorbidities. First-trimester eHsp levels and biochemical parameters of organ dysfunction were compared between patients who developed preeclampsia and those who did not. All statistical analyses, including models of the correlation (r) between eHsp and clinical parameters, were performed with bootstrapping in R. p-values <0.05 were considered significant. The final analysis included 41 patients, 11 of whom developed PE. eHsp-60 and eHsp-70 were significantly higher at 12 weeks in patients who developed PE (p = 0.001), while eHsp-27 was significantly lower (p = 0.004). These significant differences in first-trimester eHsp concentrations suggest that eHsp are possible early biomarkers for the prediction of PE.
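The abstract states only that the correlations were computed with bootstrapping in R. It does not name the bootstrap variant. As an assumption, the sketch below uses one common choice, a percentile bootstrap confidence interval for Pearson's r obtained by resampling (x, y) pairs with replacement.

This is written in Python for illustration and is not the authors' code. The function names and data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for r, resampling (x, y) pairs with replacement.

    Assumes neither variable is constant in the original data; otherwise
    every resample is degenerate and the loop never finishes.
    """
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip resamples where one variable is constant (r undefined).
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        stats.append(pearson_r(xs, ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return pearson_r(x, y), (lo, hi)


# Hypothetical paired measurements (e.g., an eHsp level vs. a clinical parameter)
x = [1.2, 2.3, 3.1, 3.9, 5.2, 6.0, 7.4, 8.1]
y = [2.0, 2.9, 3.8, 5.1, 5.5, 7.2, 7.9, 9.3]
r, (lo, hi) = bootstrap_r_ci(x, y)
print(round(r, 3), round(lo, 3), round(hi, 3))
```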
Affiliation(s)
- Claudia Melina Robellada‐Zárate
- Department of Gynecology and Obstetrics, Instituto Nacional de Perinatología “Isidro Espinosa de los Reyes”, Mexico City, Mexico
- Carlos Agustín Zapata Caballero
- Department of Gynecology and Obstetrics, Instituto Nacional de Perinatología “Isidro Espinosa de los Reyes”, Mexico City, Mexico
- Juan Pablo Acuña‐González
- Department of Mathematics, Faculty of Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Irlando Lara‐Pereyra
- Department of Gynecology, Hospital General de Zona 252, Instituto Mexicano del Seguro Social, Atlacomulco, Mexico
- Ricardo Josué Acuña‐González
- Department of Immunobiochemistry, Instituto Nacional de Perinatología “Isidro Espinosa de los Reyes”, Mexico City, Mexico
- Elsa Romelia Moreno‐Verduzco
- Subdirectorate of Auxiliary Diagnostic Services, Instituto Nacional de Perinatología “Isidro Espinosa de los Reyes”, Mexico City, Mexico
- Héctor Flores‐Herrera
- Department of Immunobiochemistry, Instituto Nacional de Perinatología “Isidro Espinosa de los Reyes”, Mexico City, Mexico
- Mauricio Osorio‐Caballero
- Department of Sexual and Reproductive Health, Instituto Nacional de Perinatología “Isidro Espinosa de los Reyes”, Mexico City, Mexico

23
Litwińczuk MC, Muhlert N, Trujillo-Barreto N, Woollams A. Using graph theory as a common language to combine neural structure and function in models of healthy cognitive performance. Hum Brain Mapp 2023; 44:3007-3022. [PMID: 36880608 PMCID: PMC10171528 DOI: 10.1002/hbm.26258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/05/2022] [Accepted: 02/18/2023] [Indexed: 03/08/2023]
Abstract
Graph theory has been used in cognitive neuroscience to understand how organisational properties of structural and functional brain networks relate to cognitive function. Graph theory may bridge the gap in integrating structural and functional connectivity by introducing common measures of network characteristics. However, the explanatory and predictive value of combining structural and functional graph theory measures has not been investigated in modelling the cognitive performance of healthy adults. In this work, a Principal Component Regression approach with embedded Step-Wise Regression was used to fit multiple regression models of Executive Function, Self-regulation, Language, Encoding and Sequence Processing, with a collection of 20 different graph theoretic measures of structural and functional network organisation used as regressors. The predictive ability of graph theory-based models was compared to that of connectivity-based models. The present work shows that using combinations of graph theory metrics to predict cognition in healthy populations does not produce a consistent benefit relative to making predictions based on structural and functional connectivity values directly.
Affiliation(s)
- Marta Czime Litwińczuk
- Division of Neuroscience and Experimental Psychology, University of Manchester, Manchester, UK
- Nils Muhlert
- Division of Neuroscience and Experimental Psychology, University of Manchester, Manchester, UK
- Nelson Trujillo-Barreto
- Division of Neuroscience and Experimental Psychology, University of Manchester, Manchester, UK
- Anna Woollams
- Division of Neuroscience and Experimental Psychology, University of Manchester, Manchester, UK

24
Tsamardinos I. Don't lose samples to estimation. Patterns (New York, N.Y.) 2022; 3:100612. [PMID: 36569551 PMCID: PMC9782254 DOI: 10.1016/j.patter.2022.100612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
In a typical predictive modeling task, we are asked to produce a final predictive model to employ operationally for predictions, as well as an estimate of its out-of-sample predictive performance. Typically, analysts hold out a portion of the available data, called a Test set, to estimate the model predictive performance on unseen (out-of-sample) records, thus "losing these samples to estimation." However, this practice is unacceptable when the total sample size is low. To avoid losing data to estimation, we need a shift in our perspective: we do not estimate the performance of a specific model instance; we estimate the performance of the pipeline that produces the model. This pipeline is applied on all available samples to produce the final model; no samples are lost to estimation. An estimate of its performance is provided by training the same pipeline on subsets of the samples. When multiple pipelines are tried, additional considerations that correct for the "winner's curse" need to be in place.
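The shift described above can be sketched in a few lines. The pipeline's performance is estimated by training it on k-1 folds and scoring the held-out fold. The final model is then produced by running the same pipeline on all samples.

The toy threshold pipeline and all names below are hypothetical, and the sketch covers only a single pipeline. When several pipelines are compared, the winning estimate must still be corrected for the "winner's curse", for example with nested CV or a bootstrap-based bias correction.

```python
import random


def midpoint_pipeline(x, y):
    """Hypothetical toy pipeline: threshold one feature halfway between the class means."""
    m1 = sum(v for v, c in zip(x, y) if c == 1) / sum(y)
    m0 = sum(v for v, c in zip(x, y) if c == 0) / (len(y) - sum(y))
    t = (m0 + m1) / 2
    sign = 1 if m1 >= m0 else -1
    return lambda v: int(sign * (v - t) > 0)


def estimate_then_fit(pipeline, x, y, k=5, seed=0):
    """Return (final_model, cv_accuracy) for a pipeline.

    The estimate comes from training the same pipeline on k-1 folds and
    scoring the held-out fold; the final model is the pipeline applied to
    all samples, so no samples are lost to estimation.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        model = pipeline([x[r] for r in train], [y[r] for r in train])
        correct += sum(model(x[r]) == y[r] for r in test)
    final_model = pipeline(x, y)
    return final_model, correct / len(y)


x = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.2, 0.95, 0.3, 0.85]
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
model, cv_acc = estimate_then_fit(midpoint_pipeline, x, y)
print(cv_acc, model(0.9), model(0.1))
```

The returned estimate describes the pipeline, that is, what to expect from a model produced by this procedure on data of this size. It is not a property of the particular `final_model` instance.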
Affiliation(s)
- Ioannis Tsamardinos (corresponding author)
- Computer Science Department, University of Crete, Heraklion, Greece
- JADBio – Gnosis DA S.A, Heraklion, Greece
- Institute of Applied and Computational Mathematics, Foundation for Research and Technology, Hellas, Heraklion, Greece

25
Penaluna BE, Burnett JD, Christiansen K, Arismendi I, Johnson SL, Griswold K, Holycross B, Kolstoe SH. UPRLIMET: UPstream Regional LiDAR Model for Extent of Trout in stream networks. Sci Rep 2022; 12:20266. [PMID: 36456610 PMCID: PMC9715699 DOI: 10.1038/s41598-022-23754-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 11/04/2022] [Indexed: 12/05/2022]
Abstract
Predicting the edges of species distributions is fundamental for species conservation, ecosystem services, and management decisions. In North America, the location of the upstream limit of fish in forested streams receives special attention, because fish-bearing portions of streams have more protections during forest management activities than fishless portions. We present a novel model development and evaluation framework, in which we compare 26 models to predict the upper distribution limits of trout in streams. The models used machine learning, logistic regression, and a sophisticated nested spatial cross-validation routine to evaluate predictive performance while accounting for spatial autocorrelation. The model with the best predictive performance, termed UPstream Regional LiDAR Model for Extent of Trout (UPRLIMET), is a two-stage model that uses a logistic regression algorithm calibrated to observations of Coastal Cutthroat Trout (Oncorhynchus clarkii clarkii) occurrence and variables representing hydro-topographic characteristics of the landscape. We predict trout presence along reaches throughout a stream network, and include a stopping rule to identify a discrete upper limit point above which all stream reaches are classified as fishless. Although there is no simple explanation for the upper distribution limit identified by UPRLIMET, four factors (upstream channel length above the point of uppermost fish, drainage area, slope, and elevation) had the highest importance. Across our study region of western Oregon, we found that more of the fish-bearing network is on private lands than on state, US Bureau of Land Management (BLM), or USDA Forest Service (USFS) lands, highlighting the importance of using spatially consistent maps across a region and working across land ownerships.
Our research underscores the value of using occurrence data to develop simple, but powerful, prediction tools to capture complex ecological processes that contribute to distribution limits of species.
Affiliation(s)
- Brooke E Penaluna
- U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, 3200 SW Jefferson Way, Corvallis, OR, 97331, USA
- Jonathan D Burnett
- U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, 3200 SW Jefferson Way, Corvallis, OR, 97331, USA
- Kelly Christiansen
- U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, 3200 SW Jefferson Way, Corvallis, OR, 97331, USA
- Ivan Arismendi
- Department of Fisheries, Wildlife, and Conservation Sciences, Oregon State University, 104 Nash Hall, Corvallis, OR, 97331, USA
- Sherri L Johnson
- U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, 3200 SW Jefferson Way, Corvallis, OR, 97331, USA
- Kitty Griswold
- Department of Biological Sciences, Idaho State University, 921 S. 8th Ave, Mail Stop 8007, Pocatello, ID, 83209-8007, USA
- Brett Holycross
- Pacific States Marine Fisheries Commission, 205 SE Spokane St., Portland, OR, 97202, USA
- Sonja H Kolstoe
- U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, 1220 SW 3rd Avenue, Suite 1410, Portland, OR, 97204, USA

26
Yaeger JP, Jones J, Ertefaie A, Caserta MT, Fiscella KA. Derivation of a clinical-based model to detect invasive bacterial infections in febrile infants. J Hosp Med 2022; 17:893-900. [PMID: 36036211 PMCID: PMC9633417 DOI: 10.1002/jhm.12956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 07/28/2022] [Accepted: 08/15/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Febrile infants are at risk for invasive bacterial infections (IBIs) (i.e., bacteremia and bacterial meningitis), which may have devastating consequences when undiagnosed. Current IBI predictive models rely on serum biomarkers, which may not provide timely results and may be difficult to obtain in low-resource settings. OBJECTIVE The aim of this study was to derive a clinical-based IBI predictive model for febrile infants. DESIGN, SETTING, AND PARTICIPANTS This is a cross-sectional study of infants brought to two pediatric emergency departments from January 2011 to December 2018. Inclusion criteria were age 0-90 days, temperature ≥38°C, and documented gestational age, fever duration, and illness duration. MAIN OUTCOME AND MEASURES To detect IBIs, we used regression and ensemble machine learning models with evidence-based predictors (i.e., sex, age, chronic medical condition, gestational age, appearance, maximum temperature, fever duration, illness duration, cough status, and urinary tract inflammation). We up-weighted infants with IBIs 8-fold and used 10-fold cross-validation to avoid overfitting. We calculated the area under the receiver operating characteristic curve (AUC) and, prioritizing high sensitivity, identified the optimal cut-point at which to estimate sensitivity and specificity. RESULTS Of 2311 febrile infants, 39 (1.7%) had an IBI; the median age was 54 days (interquartile range: 35-71). The AUC was 0.819 (95% confidence interval: 0.762, 0.868). The predictive model achieved a sensitivity of 0.974 (0.800, 1.00) and a specificity of 0.530 (0.484, 0.575). CONCLUSIONS Findings suggest that a clinical-based model can detect IBIs in febrile infants, performing similarly to serum biomarker-based models. This model may improve health equity by enabling clinicians to estimate IBI risk in any setting. Future studies should prospectively validate findings across multiple sites and investigate performance by age.
Affiliation(s)
- Jeffrey P Yaeger
- Department of Pediatrics, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York, USA
- Jeremiah Jones
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, USA
- Ashkan Ertefaie
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, USA
- Mary T Caserta
- Department of Pediatrics, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- Kevin A Fiscella
- Department of Family Medicine, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA

27
Bowler S, Papoutsoglou G, Karanikas A, Tsamardinos I, Corley MJ, Ndhlovu LC. A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity. Sci Rep 2022; 12:17480. [PMID: 36261477 PMCID: PMC9580434 DOI: 10.1038/s41598-022-22201-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/04/2022] [Accepted: 10/11/2022] [Indexed: 01/12/2023]
Abstract
Since the onset of the COVID-19 pandemic, cases with variable outcomes have continued to increase globally because of new variants, despite vaccines and therapies. There is a need to identify early the at-risk individuals who would benefit from timely medical interventions. DNA methylation provides an opportunity to identify an epigenetic signature of individuals at increased risk. We used machine learning to identify DNA methylation signatures of COVID-19 disease from data available through the NCBI Gene Expression Omnibus. A training cohort of 460 individuals (164 COVID-19-infected and 296 non-infected) and an external validation dataset of 128 individuals (102 COVID-19-infected and 26 with non-COVID-associated pneumonia) were reanalyzed. Data were processed using ChAMP, and beta values were logit transformed. The JADBio AutoML platform was leveraged to identify a methylation signature associated with severe COVID-19 disease. We identified a random forest classification model based on 4 unique methylation sites with the power to discern individuals with severe COVID-19 disease. The average area under the receiver operating characteristic curve (AUC-ROC) of the model was 0.933, and the average area under the precision-recall curve (AUC-PRC) was 0.965. When applied to our external validation dataset, this model produced an AUC-ROC of 0.898 and an AUC-PRC of 0.864. These results further our understanding of the utility of DNA methylation in COVID-19 disease pathology and serve as a platform to inform future COVID-19-related studies.
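Logit-transforming methylation beta values is commonly done with the M-value, M = log2(β / (1 - β)). The abstract does not state the base or any offset used. The base-2 form and the clipping constant `eps` below are therefore assumptions.

ChAMP is an R/Bioconductor package. This Python sketch shows only the arithmetic and is not the ChAMP implementation. The function name is hypothetical.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value (0-1) to an M-value, log2(beta / (1 - beta)).

    Betas are clipped to [eps, 1 - eps] (assumed offset) so that fully
    methylated or unmethylated probes give finite values.
    """
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


for beta in (0.2, 0.5, 0.8):
    print(beta, round(beta_to_m(beta), 3))
```

The transform maps the bounded 0-1 scale onto the whole real line. A beta of 0.5 maps to 0, and values near 0 or 1 are spread out, which is why M-values are often preferred for statistical modeling.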
Affiliation(s)
- Scott Bowler
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, 413 E 69th St, New York, NY, 10021, USA
- Georgios Papoutsoglou
- JADBio - Gnosis DA S.A, Science and Technology Park of Crete, 70013, Heraklion, Greece
- Aristides Karanikas
- JADBio - Gnosis DA S.A, Science and Technology Park of Crete, 70013, Heraklion, Greece
- Ioannis Tsamardinos
- JADBio - Gnosis DA S.A, Science and Technology Park of Crete, 70013, Heraklion, Greece
- Department of Computer Science, University of Crete, 70013, Heraklion, Greece
- Michael J Corley
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, 413 E 69th St, New York, NY, 10021, USA
- Lishomwa C Ndhlovu
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, 413 E 69th St, New York, NY, 10021, USA

28
Marmolejo-Ramos F, Ospina R, García-Ceja E, Correa JC. Ingredients for Responsible Machine Learning: A Commented Review of The Hitchhiker’s Guide to Responsible Machine Learning. Journal of Statistical Theory and Applications 2022; 21:175-185. [PMID: 36160758 PMCID: PMC9483296 DOI: 10.1007/s44199-022-00048-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/02/2022] [Indexed: 11/25/2022]
Abstract
In The Hitchhiker’s Guide to Responsible Machine Learning, Biecek, Kozak, and Zawada (here BKZ) provide an illustrated and engaging step-by-step guide on how to perform a machine learning (ML) analysis such that the algorithms, the software, and the entire process are interpretable and transparent for both the data scientist and the end user. This review summarises BKZ’s book and elaborates on three elements key to ML analyses: inductive inference, causality, and interpretability.
Affiliation(s)
- Fernando Marmolejo-Ramos
- Centre for Change and Complexity in Learning, University of South Australia, Adelaide, SA 5001 Australia
- Raydonal Ospina
- CASTLab, Department of Statistics, Universidade Federal de Pernambuco, Recife, Pernambuco 51280-000 Brazil
- Enrique García-Ceja
- Escuela de Ingeniería y Ciencias, Tecnológico de Monterrey, 64849 Monterrey, Nuevo León Mexico
- Juan C. Correa
- CESA Business School, Bogotá, Bogotá, DC, 110231 Colombia

29
Chen RJ, Lu MY, Williamson DFK, Chen TY, Lipkova J, Noor Z, Shaban M, Shady M, Williams M, Joo B, Mahmood F. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 2022; 40:865-878.e6. [PMID: 35944502 PMCID: PMC10397370 DOI: 10.1016/j.ccell.2022.07.004] [Citation(s) in RCA: 82] [Impact Index Per Article: 41.0] [Received: 06/24/2021] [Revised: 10/08/2021] [Accepted: 07/11/2022] [Indexed: 02/07/2023]
Abstract
The rapidly emerging field of computational pathology has demonstrated promise in developing objective prognostic models from histology images. However, most prognostic models are based on either histology or genomics alone and do not address how these data sources can be integrated to develop joint image-omic prognostic models. Additionally, it is of interest to identify, from these models, the explainable morphological and molecular descriptors that govern such prognosis. We use multimodal deep learning to jointly examine pathology whole-slide images and molecular profile data from 14 cancer types. Our weakly supervised, multimodal deep-learning algorithm is able to fuse these heterogeneous modalities to predict outcomes and discover prognostic features that correlate with poor and favorable outcomes. We present all analyses of morphological and molecular correlates of patient prognosis across the 14 cancer types, at both a disease and a patient level, in an interactive open-access database to allow for further exploration, biomarker discovery, and feature assessment.
Affiliation(s)
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pathology, Mass General Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA
- Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pathology, Mass General Hospital, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA; Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pathology, Mass General Hospital, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA
- Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA
- Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA
- Zahra Noor
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Muhammad Shaban
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pathology, Mass General Hospital, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA
- Maha Shady
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA
- Mane Williams
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pathology, Mass General Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA
- Bumjin Joo
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pathology, Mass General Hospital, Harvard Medical School, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Cancer Data Science Program, Dana-Farber/Harvard Cancer Institute, Boston, MA, USA; Harvard Data Sciences Initiative, Harvard University, Cambridge, MA, USA

30
Litwińczuk MC, Trujillo-Barreto N, Muhlert N, Cloutman L, Woollams A. Combination of structural and functional connectivity explains unique variation in specific domains of cognitive function. Neuroimage 2022; 262:119531. [PMID: 35931312 DOI: 10.1016/j.neuroimage.2022.119531] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/16/2022] [Revised: 07/20/2022] [Accepted: 08/01/2022] [Indexed: 11/29/2022]
Abstract
The relationship between structural and functional brain networks has been characterised as complex: the two networks mirror each other and show mutual influence, but they also diverge in their organisation. This work explored whether a combination of structural and functional connectivity can improve the fit of regression models of cognitive performance. Principal Component Analysis (PCA) was first applied to cognitive data from the Human Connectome Project to identify latent cognitive components: Executive Function, Self-regulation, Language, Encoding and Sequence Processing. A Principal Component Regression approach with embedded Step-Wise Regression (SWR-PCR) was then used to fit regression models of each cognitive domain based on structural (SC), functional (FC) or combined structural-functional (CC) connectivity. Executive Function was best explained by the CC model. Self-regulation was equally well explained by SC and FC. Language was equally well explained by CC and FC models. Encoding and Sequence Processing were best explained by SC. Evaluation of the models' out-of-sample skill via cross-validation showed that SC, FC and CC produced generalisable models of Language performance. SC models performed most effectively at predicting Language performance in unseen samples. Executive Function was most effectively predicted by SC models, followed only by CC models. Self-regulation was only effectively predicted by CC models, and Sequence Processing was only effectively predicted by FC models. The present study demonstrates that integrating structural and functional connectivity can help explain cognitive performance, but that the added explanatory value (in sample) may be domain-specific and can come at the expense of reduced generalisation performance (out of sample).
Affiliation(s)
- Nils Muhlert
- Division of Neuroscience and Experimental Psychology, University of Manchester, UK
- Lauren Cloutman
- Division of Neuroscience and Experimental Psychology, University of Manchester, UK
- Anna Woollams
- Division of Neuroscience and Experimental Psychology, University of Manchester, UK
31
Bhattacharyay S, Milosevic I, Wilson L, Menon DK, Stevens RD, Steyerberg EW, Nelson DW, Ercole A. The leap to ordinal: Detailed functional prognosis after traumatic brain injury with a flexible modelling approach. PLoS One 2022; 17:e0270973. [PMID: 35788768 PMCID: PMC9255749 DOI: 10.1371/journal.pone.0270973] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/20/2022] [Accepted: 06/21/2022] [Indexed: 11/30/2022]
Abstract
When a patient is admitted to the intensive care unit (ICU) after a traumatic brain injury (TBI), an early prognosis is essential for baseline risk adjustment and shared decision making. TBI outcomes are commonly categorised by the Glasgow Outcome Scale–Extended (GOSE) into eight ordered levels of functional recovery at 6 months after injury. Existing ICU prognostic models predict binary outcomes at a certain threshold of GOSE (e.g., prediction of survival [GOSE > 1]). We aimed to develop ordinal prediction models that concurrently predict probabilities of each GOSE score. From a prospective cohort (n = 1,550, 65 centres) in the ICU stratum of the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) patient dataset, we extracted all clinical information within 24 hours of ICU admission (1,151 predictors) and 6-month GOSE scores. We analysed the effect of two design elements on ordinal model performance: (1) the baseline predictor set, ranging from a concise set of ten validated predictors to a token-embedded representation of all possible predictors, and (2) the modelling strategy, from ordinal logistic regression to multinomial deep learning. With repeated k-fold cross-validation, we found that expanding the baseline predictor set significantly improved ordinal prediction performance while increasing analytical complexity did not. Half of these gains could be achieved with the addition of eight high-impact predictors to the concise set. At best, ordinal models achieved 0.76 (95% CI: 0.74–0.77) ordinal discrimination ability (ordinal c-index) and 57% (95% CI: 54%–60%) explanation of ordinal variation in 6-month GOSE (Somers’ Dxy). Model performance and the effect of expanding the predictor set decreased at higher GOSE thresholds, indicating the difficulty of predicting better functional outcomes shortly after ICU admission.
Our results motivate the search for informative predictors that improve confidence in prognosis of higher GOSE and the development of ordinal dynamic prediction models.
Affiliation(s)
- Shubhayu Bhattacharyay
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, United Kingdom
- Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, United States of America
- Ioan Milosevic
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
- Lindsay Wilson
- Division of Psychology, University of Stirling, Stirling, United Kingdom
- David K. Menon
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
- Robert D. Stevens
- Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, United States of America
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University, Baltimore, MD, United States of America
- Ewout W. Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- David W. Nelson
- Department of Physiology and Pharmacology, Section for Perioperative Medicine and Intensive Care, Karolinska Institutet, Stockholm, Sweden
- Ari Ercole
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
- Cambridge Centre for Artificial Intelligence in Medicine, Cambridge, United Kingdom
32
Danilatou V, Nikolakakis S, Antonakaki D, Tzagkarakis C, Mavroidis D, Kostoulas T, Ioannidis S. Outcome Prediction in Critically-Ill Patients with Venous Thromboembolism and/or Cancer Using Machine Learning Algorithms: External Validation and Comparison with Scoring Systems. Int J Mol Sci 2022; 23:ijms23137132. [PMID: 35806137 PMCID: PMC9266386 DOI: 10.3390/ijms23137132] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/27/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022] [Indexed: 12/16/2022]
Abstract
Intensive care unit (ICU) patients with venous thromboembolism (VTE) and/or cancer suffer from high mortality rates. Mortality prediction in the ICU has been a major medical challenge for which several scoring systems exist but lack specificity. This study focuses on two target groups, namely patients with thrombosis or cancer. The main goal is to develop and validate interpretable machine learning (ML) models to predict early and late mortality, while exploiting all available data stored in the medical record. To this end, retrospective data from two freely accessible databases, MIMIC-III and eICU, were used. Well-established ML algorithms were implemented using automated and purpose-built ML frameworks that address class imbalance. Prediction of early mortality showed excellent performance in both disease categories in terms of the area under the receiver operating characteristic curve (AUC–ROC): VTE-MIMIC-III 0.93, eICU 0.87, cancer-MIMIC-III 0.94. On the other hand, late mortality prediction showed lower performance (AUC–ROC: VTE 0.82, cancer 0.74–0.88). The predictive model of early mortality developed from 1651 VTE patients (MIMIC-III) ended up with a signature of 35 features and was externally validated in 2659 patients from the eICU dataset. Our model outperformed traditional scoring systems in predicting early as well as late mortality. Novel biomarkers, such as red cell distribution width, were identified.
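The AUC–ROC values quoted above (and in several other entries on this page) can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, which is the Mann–Whitney formulation of the statistic. A minimal, dependency-free sketch follows; `auc_roc` is a hypothetical helper for illustration, not code from the cited study, and the scores are invented toy values.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the Mann-Whitney probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case.
    Ties count as half a win. labels must be 0/1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # compare every positive/negative pair
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores, invented for illustration: 5 of the 6 positive/negative pairs are ranked correctly.
print(auc_roc([0.9, 0.8, 0.35, 0.6, 0.2], [1, 1, 1, 0, 0]))  # 0.8333333333333334
```

The quadratic pair loop keeps the definition visible; a rank-based computation gives the same value in O(n log n) time for larger cohorts.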
Affiliation(s)
- Vasiliki Danilatou
- Sphynx Technology Solutions, 6300 Zug, Switzerland
- School of Medicine, European University of Cyprus, 2404 Nicosia, Cyprus
- Stylianos Nikolakakis
- School of Electrical and Computer Engineering, Technical University of Crete, 73100 Chania, Greece
- Despoina Antonakaki
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
- Christos Tzagkarakis
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
- Dimitrios Mavroidis
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
- Theodoros Kostoulas
- Department of Information and Communication Systems Engineering, School of Engineering, University of the Aegean, 83200 Samos, Greece
- Sotirios Ioannidis
- School of Electrical and Computer Engineering, Technical University of Crete, 73100 Chania, Greece
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
33
Combination of Whole-Body Baseline CT Radiomics and Clinical Parameters to Predict Response and Survival in a Stage-IV Melanoma Cohort Undergoing Immunotherapy. Cancers (Basel) 2022; 14:cancers14122992. [PMID: 35740659 PMCID: PMC9221470 DOI: 10.3390/cancers14122992] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/28/2022] [Revised: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 11/17/2022]
Abstract
Simple Summary: The use of immunotherapeutic agents has significantly improved the overall progression-free survival of stage-IV melanoma patients. To identify patients who do not benefit from immunotherapy, both clinical parameters and experimental biomarkers such as radiomics are currently being evaluated. However, no radiomic biomarker is widely accepted for routine clinical use. In a large cohort of 262 stage-IV melanoma patients given first-line immunotherapy, we investigated whether radiomics (based on the segmentation of all baseline metastases in the whole body) in combination with clinical parameters offered added value compared with clinical parameters alone in a machine-learning prediction model. The primary endpoints were response at three months and survival rates at six and twelve months. The study indicated a potential, but non-significant, added value of radiomics for six-month and twelve-month survival prediction, thus underlining the relevance of clinical parameters. Background: This study investigated whether a machine-learning-based combination of radiomics and clinical parameters was superior to the use of clinical parameters alone in predicting therapy response after three months, and overall survival after six and twelve months, in stage-IV malignant melanoma patients undergoing immunotherapy with PD-1 checkpoint inhibitors and CTLA-4 checkpoint inhibitors. Methods: A random forest model using clinical parameters (demographic variables and tumor markers; the baseline model) was compared with a random forest model using clinical parameters and radiomics (the extended model) via repeated 5-fold cross-validation. For this purpose, the baseline computed tomography scans of 262 stage-IV malignant melanoma patients treated at a tertiary referral center were identified in the Central Malignant Melanoma Registry, and all visible metastases were segmented in three dimensions (n = 6404).
Results: The extended model was not significantly superior to the baseline model for survival prediction after six and twelve months (AUC (95% CI): 0.664 (0.598, 0.729) vs. 0.620 (0.545, 0.692) and AUC (95% CI): 0.600 (0.526, 0.667) vs. 0.588 (0.481, 0.629), respectively). Nor was the extended model significantly superior to the baseline model for response prediction after three months (AUC (95% CI): 0.641 (0.581, 0.700) vs. 0.656 (0.587, 0.719)). Conclusions: The study indicated a potential, but non-significant, added value of radiomics for six-month and twelve-month survival prediction of stage-IV melanoma patients undergoing immunotherapy.
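The baseline and extended models in this study were compared via repeated 5-fold cross-validation. A minimal sketch of how such splits can be generated is shown below, assuming stratification by a binary outcome (the abstract does not say whether folds were stratified) and an arbitrary seed; `repeated_stratified_kfold` is a hypothetical helper, and model fitting is deliberately left out.

```python
import random


def repeated_stratified_kfold(labels, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV.

    Stratification by outcome is an assumption for illustration. Within each
    repeat, the indices of each class are shuffled and dealt round-robin into
    k folds, so every fold keeps roughly the overall class ratio and every
    sample lands in exactly one test fold per repeat.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible splits
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for r in range(repeats):
        folds = [[] for _ in range(k)]
        offset = 0
        for idx in by_class.values():
            idx = idx[:]  # shuffle a copy, keep the original order intact
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[(j + offset) % k].append(i)
            offset += len(idx)  # next class continues where this one stopped
        for f in range(k):
            test = sorted(folds[f])
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield r, f, train, test


labels = [1] * 10 + [0] * 20  # toy outcome vector, not study data
for r, f, train, test in repeated_stratified_kfold(labels, k=5, repeats=2):
    print(r, f, len(train), len(test))  # 24 training and 6 test indices per fold
```

In a comparison of this kind, both the baseline and the extended model would be fit on each `train` split and scored (for example by AUC) on the matching `test` split, so the two models are evaluated on identical folds.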
34
Just Add Data: automated predictive modeling for knowledge discovery and feature selection. NPJ Precis Oncol 2022; 6:38. [PMID: 35710826 PMCID: PMC9203777 DOI: 10.1038/s41698-022-00274-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/04/2020] [Accepted: 04/13/2022] [Indexed: 01/20/2023]
Abstract
Fully automated machine learning (AutoML) for predictive modeling is becoming a reality, giving rise to a whole new field. We present the basic ideas and principles of Just Add Data Bio (JADBio), an AutoML platform applicable to the low-sample, high-dimensional omics data that arise in translational medicine and bioinformatics applications. In addition to predictive and diagnostic models ready for clinical use, JADBio focuses on knowledge discovery by performing feature selection and identifying the corresponding biosignatures, i.e., minimal-size subsets of biomarkers that are jointly predictive of the outcome or phenotype of interest. It also returns a palette of useful information for interpretation, clinical use of the models, and decision making. JADBio is qualitatively and quantitatively compared against Hyper-Parameter Optimization Machine Learning libraries. Results show that in typical omics dataset analysis, JADBio manages to identify signatures comprising just a handful of features while maintaining competitive predictive performance and accurate out-of-sample performance estimation.
35
Dyba K, Wąsala R, Piekarczyk J, Gabała E, Gawlak M, Jasiewicz J, Ratajkiewicz H. Reflectance spectroscopy and machine learning as a tool for the categorization of twin species based on the example of the Diachrysia genus. Spectrochim Acta A Mol Biomol Spectrosc 2022; 273:121058. [PMID: 35220048 DOI: 10.1016/j.saa.2022.121058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Revised: 02/11/2022] [Accepted: 02/14/2022] [Indexed: 06/14/2023]
Abstract
In our work, we used noninvasive point reflectance spectroscopy in the range from 400 to 2100 nm, coupled with machine learning, to study scales on the brown and golden iridescent areas on the dorsal side of the forewing of Diachrysia chrysitis and D. stenochrysis. We used our approach to distinguish between these species of moths. The basis for the study was a statistically significant collection of 95 specimens identified based on morphological features and gathered over 23 years in Poland. The numerical part of the experiment included two independent discriminant analyses: stochastic and deterministic. The more sensitive stochastic approach achieved an average agreement of 99-100% with the species identification made by entomologists. It demonstrated high stability across different configurations of training and validation sets, indicating strong predictors of the distinctiveness of the Diachrysia sibling species. Both methods resulted in the same small set of relevant features, where the minimal fully discriminating subsets comprised three wavelengths for glass scales on the golden area and four for the brown area. The differences between species in scales primarily concern their major components and ultrastructure. In melanin-absent glass scales, this is mainly chitin configuration, while in melanin-present brown scales, melanin emerges as an additional factor.
Affiliation(s)
- Krzysztof Dyba
- Institute of Geoecology and Geoinformation, Adam Mickiewicz University in Poznań, Poland
- Roman Wąsala
- Department of Entomology and Environment Protection, Poznań University of Life Sciences, Poland
- Jan Piekarczyk
- Institute of Physical Geography and Environmental Planning, Adam Mickiewicz University in Poznań, Poland
- Elżbieta Gabała
- Research Centre of Quarantine, Invasive and Genetically Modified Organisms, Institute of Plant Protection - National Research Institute, Poland
- Magdalena Gawlak
- Research Centre of Quarantine, Invasive and Genetically Modified Organisms, Institute of Plant Protection - National Research Institute, Poland
- Jarosław Jasiewicz
- Institute of Geoecology and Geoinformation, Adam Mickiewicz University in Poznań, Poland
- Henryk Ratajkiewicz
- Department of Entomology and Environment Protection, Poznań University of Life Sciences, Poland
36
Karagiannaki I, Gourlia K, Lagani V, Pantazis Y, Tsamardinos I. Learning biologically-interpretable latent representations for gene expression data: Pathway Activity Score Learning Algorithm. Mach Learn 2022; 112:4257-4287. [PMID: 37900054 PMCID: PMC10600308 DOI: 10.1007/s10994-022-06158-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2021] [Revised: 11/12/2021] [Accepted: 02/19/2022] [Indexed: 11/24/2022]
Abstract
Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (i.e., high-dimensional data). However, lower-dimensional representations that retain the useful biological information do exist. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways (gene sets in general) and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL's latent space has a fairly straightforward biological interpretation. PASL is shown to outperform the state-of-the-art method (PLIER) in predictive performance on two collections of breast cancer and leukemia gene expression datasets. PASL is also trained on a large corpus of 50000 gene expression samples to construct a universal dictionary of features across different tissues and pathologies. The dictionary was validated on 35643 held-out samples for reconstruction error. It is then applied to 165 held-out datasets spanning a diverse range of diseases. The AutoML tool JADBio is employed to show that the predictive information in the PASL-created feature space is retained after the transformation. The code is available at https://github.com/mensxmachina/PASL.
Affiliation(s)
- Ioulia Karagiannaki
- Institute of Electronic Structure and Laser, Foundation for Research and Technology-Hellas (IESL-FORTH), Heraklion, Greece
- Vincenzo Lagani
- Institute of Chemical Biology, Ilia State University, Tbilisi, 0162 Georgia
- JADBio, Gnosis Data Analysis PC, Heraklion, Crete, Greece
- Yannis Pantazis
- Institute of Applied and Computational Mathematics, Foundation for Research and Technology - Hellas, Heraklion, Greece
- Ioannis Tsamardinos
- Department of Computer Science, University of Crete, Heraklion, Greece
- JADBio, Gnosis Data Analysis PC, Heraklion, Crete, Greece
- Institute of Applied and Computational Mathematics, Foundation for Research and Technology - Hellas, Heraklion, Greece
37
Tissue-Specific Methylation Biosignatures for Monitoring Diseases: An In Silico Approach. Int J Mol Sci 2022; 23:ijms23062959. [PMID: 35328380 PMCID: PMC8952417 DOI: 10.3390/ijms23062959] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/09/2022] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 02/06/2023]
Abstract
Tissue-specific gene methylation events are key to the pathogenesis of several diseases and can be utilized for diagnosis and monitoring. Here, we established an in silico pipeline to analyze high-throughput methylome datasets to identify specific methylation fingerprints in three pathological entities of major burden, i.e., breast cancer (BrCa), osteoarthritis (OA) and diabetes mellitus (DM). Differential methylation analysis was conducted to compare tissues/cells related to the pathology and different types of healthy tissues, revealing Differentially Methylated Genes (DMGs). High-performing biosignatures with a low number of features were built with automated machine learning, including: (1) a five-gene biosignature discriminating BrCa tissue from healthy tissues (AUC 0.987 and precision 0.987), (2) three equivalent OA cartilage-specific biosignatures containing four genes each (AUC 0.978 and precision 0.986) and (3) a four-gene pancreatic β-cell-specific biosignature (AUC 0.984 and precision 0.995). Next, the BrCa biosignature was validated using an independent ccfDNA dataset, yielding an AUC and precision of 1.000 and verifying the biosignature’s applicability in liquid biopsy. Functional and protein interaction prediction analysis revealed that most DMGs identified are involved in pathways known to be related to the studied diseases or pointed to new ones. Overall, our data-driven approach contributes to the maximum exploitation of high-throughput methylome readings, helping to establish specific disease profiles to be applied in clinical practice and to understand human pathology.
38
Tsagris M, Papadovasilakis Z, Lakiotaki K, Tsamardinos I. The γ-OMP Algorithm for Feature Selection With Application to Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1214-1224. [PMID: 33035156 DOI: 10.1109/tcbb.2020.3029952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly scalable Orthogonal Matching Pursuit feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, and time-to-event; (b) discrete (categorical) features; (c) different statistical-based stopping criteria; (d) several predictive models (e.g., linear or logistic regression); (e) various types of residuals; and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par with, or outperforms, LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event data (censored survival times). γ-OMP is based on simple statistical ideas; it is easy to implement and to extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.
39
Reported Adverse Effects and Attitudes among Arab Populations Following COVID-19 Vaccination: A Large-Scale Multinational Study Implementing Machine Learning Tools in Predicting Post-Vaccination Adverse Effects Based on Predisposing Factors. Vaccines (Basel) 2022; 10:vaccines10030366. [PMID: 35334998 PMCID: PMC8955470 DOI: 10.3390/vaccines10030366] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/20/2022] [Revised: 02/23/2022] [Accepted: 02/24/2022] [Indexed: 02/04/2023]
Abstract
Background: The unprecedented global spread of coronavirus disease 2019 (COVID-19) has imposed huge challenges on healthcare facilities and impacted every aspect of life. This has led to the development of several vaccines against COVID-19 within one year. This study aimed to assess the attitudes and the side effects among Arab communities after receiving a COVID-19 vaccine and to use machine learning (ML) tools to predict post-vaccination side effects based on predisposing factors. Methods: An online-based multinational survey was carried out via social media platforms from 14 June to 31 August 2021, targeting individuals from 22 Arab countries who received at least one dose of a COVID-19 vaccine. Descriptive statistics, correlation, and chi-square tests were used to analyze the data. Moreover, extensive ML tools were utilized to predict 30 post-vaccination adverse effects and their severity based on 15 predisposing factors. The importance of distinct predisposing factors in predicting particular side effects was determined using global feature importance employing gradient boost as AutoML. Results: A total of 10,064 participants from 19 Arab countries were included in this study. Around 56% were female and 59% were aged from 20 to 39 years old. A high rate of vaccine hesitancy (51%) was reported among participants. Almost 88% of the participants were vaccinated with one of three COVID-19 vaccines, namely Pfizer-BioNTech (52.8%), AstraZeneca (20.7%), and Sinopharm (14.2%). About 72% of participants experienced post-vaccination side effects. This study reports statistically significant associations (p < 0.01) between various predisposing factors and post-vaccination side effects. In terms of predicting post-vaccination side effects, gradient boost, random forest, and XGBoost outperformed other ML methods.
The most important predisposing factors for predicting certain side effects (i.e., tiredness, fever, headache, injection site pain and swelling, myalgia, and sleepiness and laziness) were revealed to be the number of doses, gender, type of vaccine, age, and hesitancy to receive a COVID-19 vaccine. Conclusions: The reported side effects following COVID-19 vaccination among Arab populations are usually non-life-threatening: flu-like symptoms and injection site pain. Certain predisposing factors carry greater weight and importance as input data in predicting post-vaccination side effects. Based on the most significant input data, ML can also be used to predict these side effects; people with certain predicted side effects may require additional medical attention, or possibly hospitalization.
40
Liquid Biopsy in Type 2 Diabetes Mellitus Management: Building Specific Biosignatures via Machine Learning. J Clin Med 2022; 11:jcm11041045. [PMID: 35207316 PMCID: PMC8876363 DOI: 10.3390/jcm11041045] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2022] [Revised: 02/09/2022] [Accepted: 02/15/2022] [Indexed: 02/05/2023]
Abstract
Background: The need for minimally invasive biomarkers for the early diagnosis of type 2 diabetes (T2DM) prior to clinical onset, and for monitoring of β-pancreatic cell loss, is emerging. Here, we focused on studying circulating cell-free DNA (ccfDNA) as a liquid biopsy biomaterial for accurate diagnosis/monitoring of T2DM. Methods: ccfDNA levels were directly quantified in sera from 96 T2DM patients and 71 healthy individuals via fluorometry, and then DNA fragment size profiling was performed by capillary electrophoresis. Following this, ccfDNA methylation levels of five β-cell-related genes were measured via qPCR. Data were analyzed by automated machine learning to build classifying predictive models. Results: ccfDNA levels were found to be similar between groups but indicative of apoptosis in T2DM. INS (Insulin), IAPP (Islet Amyloid Polypeptide-Amylin), GCK (Glucokinase), and KCNJ11 (Potassium Inwardly Rectifying Channel Subfamily J member 11) methylation levels differed significantly between groups. AutoML analysis delivered biosignatures including GCK, IAPP and KCNJ11 methylation, with the highest discriminating performance reported to date between T2DM patients and healthy individuals (AUC 0.927). Conclusions: Our data reveal the value of ccfDNA as a minimally invasive biomaterial carrying important clinical information for T2DM. Upon prospective clinical evaluation, the built biosignature could be disruptive for T2DM clinical management.
41
Fanourgakis GS, Gkagkas K, Froudakis G. Introducing artificial MOFs for improved machine learning predictions: Identification of top-performing materials for methane storage. J Chem Phys 2022; 156:054103. [DOI: 10.1063/5.0075994] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Affiliation(s)
- George S. Fanourgakis
- Department of Chemistry, University of Crete, Voutes Campus, GR-70013 Heraklion, Crete, Greece
- Konstantinos Gkagkas
- Material Engineering Division, Toyota Motor Europe NV/SA, Technical Center, Hoge Wei 33B, 1930 Zaventem, Belgium
- George Froudakis
- Department of Chemistry, University of Crete, Voutes Campus, GR-70013 Heraklion, Crete, Greece
42
Fischer A, Hertwig A, Hahn R, Anwar M, Siebenrock T, Pesta M, Liebau K, Timmermann I, Brugger J, Posch M, Ringl H, Tamandl D, Hiesmayr M, Roth D, Zielinski C, Jäger U, Staudinger T, Schellongowski P, Lang I, Gottsauner-Wolf M, Mascherbauer J, Heinz G, Oberbauer R, Trauner M, Ferlitsch A, Zauner C, Wolf Husslein P, Krepler P, Shariat S, Gnant M, Sahora K, Laufer G, Taghavi S, Huk I, Radtke C, Markstaller K, Rössler B, Schaden E, Bacher A, Faybik P, Ullrich R, Plöchl W, Ihra G, Schäfer B, Mouhieddine M, Neugebauer T, Mares P, Steinlechner B, Schiferer A, Tschernko E. Validation of bedside ultrasound to predict lumbar muscle area in the computed tomography in 200 non-critically ill patients: The USVALID prospective study. Clin Nutr 2022; 41:829-837. [PMID: 35263692 DOI: 10.1016/j.clnu.2022.01.034] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/12/2021] [Revised: 01/19/2022] [Accepted: 01/31/2022] [Indexed: 12/25/2022]
Abstract
BACKGROUND & AIMS Skeletal muscle area (SMA) on computed tomography (CT) at the level of the third lumbar vertebra (L3) is a proxy for whole-body muscle mass, but CT is performed only for clinical reasons. Ultrasound is a promising tool to determine muscle mass at the bedside. It remains unclear how well ultrasound can predict CT L3 SMA and which ultrasound measuring points are best suited to do so. METHODS This prospective observational trial included 200 non-critically ill patients who underwent an abdominal CT scan for any clinical reason within 48 h before the ultrasound examination. Ultrasound muscle thickness was evaluated at 3 measuring points on the thigh and 2 measuring points on the upper arm with minimal compression. On the CT scan, the entire L3 SMA was measured based on Hounsfield units. Using a model selection algorithm based on the Bayesian information criterion (BIC) and clinical considerations, a linear prediction model for CT L3 SMA based on the ultrasound muscle thickness and other independent variables was fitted and assessed with cross-validation. RESULTS 67.5% and 32.5% of the patients were from surgical and medical wards, respectively. Mean ultrasound muscle thickness values were between 2.2 and 3.6 cm on the thigh and between 1.4 and 2.8 cm on the upper arm. All ultrasound muscle thickness values were higher in men than in women (P < 0.05). CT L3 SMA was 40 cm² higher in men than in women (P < 0.001). The final prediction model for CT L3 SMA included the following 4 independent variables: ultrasound muscle thickness at the ventral measuring point of the thigh in the short-axis plane, sex, weight, and height. It had a similar BIC (1515) to larger models with 6-8 independent variables including multiple ultrasound measuring points (BIC 1506-1519). Additional clinical considerations in choosing the final model were the shorter time needed to measure a single ultrasound measuring point and the better anatomical overview in the short-axis plane.
The final model predicted CT L3 SMA with an R² of 0.74 (P < 0.001) and a cross-validated R² of 0.65. CONCLUSIONS A single ultrasound measuring point on the thigh, together with sex, height, and weight, predicts CT L3 SMA well across different clinical populations. Ultrasound is a safe bedside method for measuring muscle thickness longitudinally to monitor the effects of nutrition and physical therapy.
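The model-selection step described above can be illustrated with a small numpy sketch: every subset of candidate predictors is scored by the BIC of an ordinary least-squares fit, and the chosen subset is then assessed by cross-validated R². The exhaustive search, the Gaussian BIC form with constants dropped (so absolute values are not comparable to those reported), the 5-fold scheme and the synthetic predictor names are all illustrative assumptions; the study also weighed clinical considerations that no formula captures.

```python
import itertools
import numpy as np

def ols_bic(X, y):
    """Fit OLS with an intercept; return BIC = n*ln(RSS/n) + k*ln(n) (constants dropped)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    k = A.shape[1]  # intercept + slopes
    return n * np.log(rss / n) + k * np.log(n)

def best_subset_by_bic(X, y, names):
    """Score every non-empty predictor subset and keep the one with the lowest BIC."""
    best = None
    for r in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), r):
            bic = ols_bic(X[:, list(cols)], y)
            if best is None or bic < best[0]:
                best = (bic, [names[c] for c in cols], cols)
    return best

def cv_r2(X, y, k=5, seed=0):
    """Cross-validated R^2 from pooled out-of-fold OLS predictions."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    preds = np.empty(n)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        preds[fold] = np.column_stack([np.ones(len(fold)), X[fold]]) @ coef
    return 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic data: only the first two (hypothetical) predictors drive the outcome.
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)
names = ["signal_1", "signal_2", "noise_1", "noise_2"]

bic, chosen, cols = best_subset_by_bic(X, y, names)
print(chosen, round(cv_r2(X[:, list(cols)], y), 2))
```

Exhaustive search is only practical for a handful of candidates, as here; with many predictors a stepwise search is the usual compromise.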
Affiliation(s)
- Arabella Fischer
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Anatol Hertwig
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Ricarda Hahn
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Martin Anwar
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Timo Siebenrock
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Maximilian Pesta
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Konstantin Liebau
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Isabel Timmermann
  - Division of Cardiothoracic and Vascular Anaesthesia and Intensive Care Medicine, Medical University of Vienna, Austria
- Jonas Brugger
  - Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Austria
- Martin Posch
  - Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Austria
- Helmut Ringl
  - Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Austria
- Dietmar Tamandl
  - Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Austria
- Michael Hiesmayr
  - Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Austria
43
Radiomics Features of the Spleen as Surrogates for CT-Based Lymphoma Diagnosis and Subtype Differentiation. Cancers (Basel) 2022; 14:cancers14030713. [PMID: 35158980 PMCID: PMC8833623 DOI: 10.3390/cancers14030713] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/16/2021] [Revised: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 02/05/2023]
Abstract
Simple Summary In malignant lymphoma, early and accurate diagnosis is essential for therapy initiation and patient outcome. Within the diagnostic process, imaging plays a crucial role in disease staging; however, an invasive biopsy is required for subtype classification. The spleen, a major lymphoid organ, is frequently involved in malignant lymphoma, either reactively or through infiltration by malignant cells. Using radiomics features of the spleen in a machine learning approach, we investigated whether malignant lymphoma patients can be distinguished from other cancer patients and whether lymphoma subtypes can be classified when disease is present. Recent studies have demonstrated the value of radiomics analysis in differentiating lymphoma from non-lymphoma groups at involved sites. Supported by machine learning, imaging could become a more important noninvasive diagnostic tool for future lymphoma classification, offering more precise radiological information for interdisciplinary treatment planning.
Abstract The spleen is often involved in malignant lymphoma, which manifests on CT as either splenomegaly or focal, hypodense lymphoma lesions. This study aimed to investigate the diagnostic value of radiomics features of the spleen in classifying malignant lymphoma versus non-lymphoma and in determining malignant lymphoma subtypes when disease is present, in particular Hodgkin lymphoma (HL), diffuse large B-cell lymphoma (DLBCL), mantle-cell lymphoma (MCL), and follicular lymphoma (FL). Spleen segmentations of 326 patients (139 female, median age 54.1 ± 18.7 years) were generated, and 1317 radiomics features per patient were extracted. For subtype classification, we created four binary differentiation tasks and addressed each with a Random Forest classifier using 10-fold cross-validation. To detect the most relevant features, permutation importance was analyzed.
Classifier results using all features were: malignant lymphoma vs. non-lymphoma AUC = 0.86 (p < 0.01); HL vs. NHL AUC = 0.75 (p < 0.01); DLBCL vs. other NHL AUC = 0.65 (p < 0.01); MCL vs. FL AUC = 0.67 (p < 0.01). Classifying malignant lymphoma vs. non-lymphoma was also possible using only shape features (AUC = 0.77, p < 0.01), with sphericity as the most important feature. Using only shape features, a significant AUC was achieved for all tasks; however, the best results were obtained by combining shape and textural features. This study demonstrates the value of splenic imaging and radiomic analysis in malignant lymphoma detection and subtype classification.
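Permutation importance, used above to rank features, measures how much a fitted model's score drops when one feature column is shuffled, which breaks that feature's link to the label. The numpy sketch below shows only the permutation-importance and AUC logic; to stay dependency-free it swaps the study's Random Forest for a least-squares linear scorer and uses a single train/test split instead of 10-fold cross-validation. Both substitutions, and the synthetic data, are assumptions for illustration.

```python
import numpy as np

def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties count as 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def fit_linear_scorer(X, y):
    """Least-squares fit to 0/1 labels; a stand-in for the Random Forest."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y.astype(float), rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def permutation_importance_auc(model, X, y, n_repeats=10, seed=0):
    """Mean drop in AUC when each feature column is shuffled on its own."""
    rng = np.random.default_rng(seed)
    base = auc(y, model(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops[j] += base - auc(y, model(Xp))
    return drops / n_repeats

# Synthetic data: only feature 0 carries class information.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
train, test = np.arange(200), np.arange(200, 300)

model = fit_linear_scorer(X[train], y[train])
print("held-out AUC:", round(auc(y[test], model(X[test])), 3))
print("importance:", np.round(permutation_importance_auc(model, X[test], y[test]), 3))
```

Computing importance on held-out data, as here, reflects what the model relies on for generalisation rather than what it memorised during training.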
44
Iadanza E, Fabbri R, Goretti F, Nardo G, Niccolai E, Bendotti C, Amedei A. Machine learning for analysis of gene expression data in fast- and slow-progressing amyotrophic lateral sclerosis murine models. Biocybern Biomed Eng 2022. [DOI: 10.1016/j.bbe.2022.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
45
Mühlbauer J, Kriegmair MC, Schöning L, Egen L, Kowalewski KF, Westhoff N, Nuhn P, Laqua FC, Baessler B. Value of Radiomics of Perinephric Fat for Prediction of Intraoperative Complexity in Renal Tumor Surgery. Urol Int 2021; 106:604-615. [PMID: 34903703 DOI: 10.1159/000520445] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/22/2021] [Accepted: 10/21/2021] [Indexed: 11/19/2022]
Abstract
INTRODUCTION The aim of this study was to assess the value of computed tomography (CT)-based radiomics of perinephric fat (PNF) for predicting surgical complexity. METHODS Fifty-six patients who underwent renal tumor surgery were included. Radiomic features were extracted from contrast-enhanced CT. Machine learning models using radiomic features, the Mayo Adhesive Probability (MAP) score, and/or clinical variables (age, sex, and body mass index) were compared for the prediction of adherent PNF (APF), the occurrence of postoperative complications (Clavien-Dindo Classification ≥2), and surgery duration. Discrimination performance was assessed by the area under the receiver operating characteristic curve (AUC). The root mean square error (RMSE) and R² (fraction of explained variance) were used as additional evaluation metrics. RESULTS A single-feature logit model containing "Wavelet-LHH-transformed GLCM Correlation" achieved the best discrimination (AUC 0.90, 95% confidence interval [CI]: 0.75-1.00) and the lowest error (RMSE 0.32, 95% CI: 0.20-0.42) for the prediction of APF. This model was superior to all other models, including those containing all radiomic features, clinical variables, and/or the MAP score. For the prediction of postoperative complications and surgery duration, machine learning models did not improve on the performance of uninformative benchmark models. CONCLUSION Radiomic features derived from PNF may provide valuable information for preoperative risk stratification of patients undergoing renal tumor surgery.
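A single-feature logit model and the two metrics named above (AUC, and RMSE between the 0/1 outcome and the predicted probability) can be sketched in numpy as follows. The Newton-Raphson fit, the synthetic feature standing in for the radiomic one, the sample size of 56 and the in-sample evaluation are illustrative assumptions; a real assessment would score held-out data, and this is not the authors' pipeline.

```python
import numpy as np

def fit_logit_1d(x, y, n_iter=25):
    """Newton-Raphson fit of P(y=1) = sigmoid(b0 + b1*x) for a single feature."""
    A = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        W = p * (1 - p)                      # Bernoulli variances (IRLS weights)
        grad = A.T @ (y - p)
        hess = A.T @ (A * W[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def predict_proba(beta, x):
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

def auc(y, s):
    """ROC AUC as the probability that a positive outscores a negative."""
    pos, neg = s[y == 1], s[y == 0]
    wins = (pos[:, None] > neg).sum() + 0.5 * (pos[:, None] == neg).sum()
    return wins / (len(pos) * len(neg))

def rmse(y, p):
    """Root mean square error between 0/1 outcomes and predicted probabilities."""
    return float(np.sqrt(np.mean((y - p) ** 2)))

# Synthetic, overlapping classes so the maximum-likelihood estimate exists.
rng = np.random.default_rng(0)
x = rng.normal(size=56)
y = (x + rng.normal(size=56) > 0).astype(float)

beta = fit_logit_1d(x, y)
p = predict_proba(beta, x)
print("AUC:", round(auc(y, p), 3), "RMSE:", round(rmse(y, p), 3))
```

A constant prediction of 0.5 gives an RMSE of exactly 0.5, which makes a convenient uninformative reference point for this metric.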
Affiliation(s)
- Julia Mühlbauer
  - Department of Urology and Urological Surgery, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
- Maximilian C Kriegmair
  - Department of Urology and Urological Surgery, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
- Lale Schöning
  - Department of Urology and Urological Surgery, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
- Luisa Egen
  - Department of Urology and Urological Surgery, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
- Karl-Friedrich Kowalewski
  - Department of Urology and Urological Surgery, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
- Niklas Westhoff
  - Department of Urology and Urological Surgery, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
- Philipp Nuhn
  - Department of Urology and Urological Surgery, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
- Fabian C Laqua
  - Institute of Diagnostic and Interventional Radiology, University Hospital Zurich, Zurich, Switzerland
- Bettina Baessler
  - Institute of Diagnostic and Interventional Radiology, University Hospital Zurich, Zurich, Switzerland
  - Institute of Clinical Radiology and Nuclear Medicine, University Medical Center Mannheim, University of Heidelberg, Mannheim, Germany
46
Bhattacharyay S, Rattray J, Wang M, Dziedzic PH, Calvillo E, Kim HB, Joshi E, Kudela P, Etienne-Cummings R, Stevens RD. Decoding accelerometry for classification and prediction of critically ill patients with severe brain injury. Sci Rep 2021; 11:23654. [PMID: 34880296 PMCID: PMC8654973 DOI: 10.1038/s41598-021-02974-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/25/2021] [Accepted: 11/25/2021] [Indexed: 11/23/2022]
Abstract
Our goal is to explore quantitative motor features in critically ill patients with severe brain injury (SBI). We hypothesized that computational decoding of these features would yield information on underlying neurological states and outcomes. Using wearable microsensors placed on all extremities, we recorded a median 24.1 (IQR: 22.8-25.1) hours of high-frequency accelerometry data per patient from a prospective cohort (n = 69) admitted to the ICU with SBI. Models were trained using time-, frequency-, and wavelet-domain features and levels of responsiveness and outcome as labels. The two primary tasks were detection of levels of responsiveness, assessed by motor sub-score of the Glasgow Coma Scale (GCSm), and prediction of functional outcome at discharge, measured with the Glasgow Outcome Scale-Extended (GOSE). Detection models achieved significant (AUC: 0.70 [95% CI: 0.53-0.85]) and consistent (observation windows: 12 min-9 h) discrimination of SBI patients capable of purposeful movement (GCSm > 4). Prediction models accurately discriminated patients of upper moderate disability or better (GOSE > 5) with 2-6 h of observation (AUC: 0.82 [95% CI: 0.75-0.90]). Results suggest that time series analysis of motor activity yields clinically relevant insights on underlying functional states and short-term outcomes in patients with SBI.
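The models above were trained on time-, frequency- and wavelet-domain features computed over windows of accelerometry. A minimal numpy sketch of the first two feature families for a single window follows; the specific features, the 100 Hz sampling rate, the 10 s window and the synthetic 4 Hz movement are assumptions chosen for illustration, and wavelet features are omitted.

```python
import numpy as np

def window_features(signal, fs):
    """Time- and frequency-domain summary features for one accelerometry window."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                         # remove the DC component (e.g. gravity)
    spectrum = np.abs(np.fft.rfft(x)) ** 2   # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = spectrum[1:]                     # drop the 0 Hz bin
    return {
        "std": float(x.std()),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "dominant_freq_hz": float(freqs[1:][np.argmax(power)]),
        "spectral_energy": float(power.sum() / len(x)),
    }

fs = 100                                     # Hz, assumed sampling rate
t = np.arange(10 * fs) / fs                  # one 10 s window (1000 samples)
movement = 0.3 * np.sin(2 * np.pi * 4.0 * t)  # synthetic 4 Hz movement
noise = 0.01 * np.random.default_rng(0).normal(size=t.size)
feats = window_features(movement + noise, fs)
print(feats)
```

In practice such features would be computed per sensor and per sliding window, then aggregated over the observation period before being fed to a classifier.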
Affiliation(s)
- Shubhayu Bhattacharyay
  - Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
- John Rattray
  - Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Matthew Wang
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Peter H Dziedzic
  - Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University, Baltimore, MD, USA
- Eusebia Calvillo
  - Department of Neurology, Johns Hopkins University, Baltimore, MD, USA
- Han B Kim
  - Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University, Baltimore, MD, USA
- Eshan Joshi
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Pawel Kudela
  - Department of Neurosurgery, Johns Hopkins University, Baltimore, MD, USA
- Ralph Etienne-Cummings
  - Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Robert D Stevens
  - Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neurosurgery, Johns Hopkins University, Baltimore, MD, USA
47
Hatmal MM, Alshaer W, Mahmoud IS, Al-Hatamleh MAI, Al-Ameer HJ, Abuyaman O, Zihlif M, Mohamud R, Darras M, Al Shhab M, Abu-Raideh R, Ismail H, Al-Hamadi A, Abdelhay A. Investigating the association of CD36 gene polymorphisms (rs1761667 and rs1527483) with T2DM and dyslipidemia: Statistical analysis, machine learning based prediction, and meta-analysis. PLoS One 2021; 16:e0257857. [PMID: 34648514 PMCID: PMC8516279 DOI: 10.1371/journal.pone.0257857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2021] [Accepted: 09/11/2021] [Indexed: 12/15/2022]
Abstract
CD36 (cluster of differentiation 36) is a membrane protein involved in lipid metabolism and has been linked to pathological conditions associated with metabolic disorders, such as diabetes and dyslipidemia. A case-control study including 177 patients with type-2 diabetes mellitus (T2DM) and 173 control subjects was conducted to study the involvement of the CD36 gene rs1761667 (G>A) and rs1527483 (C>T) polymorphisms in the pathogenesis of T2DM and dyslipidemia in the Jordanian population. Lipid profile, blood sugar, gender, and age were measured and recorded, and both polymorphisms were genotyped. Following statistical analysis, 10 different neural network and machine learning (ML) tools were used to predict subjects with diabetes or dyslipidemia. To further understand the role of the CD36 protein and gene in T2DM and dyslipidemia, a protein-protein interaction network analysis and a meta-analysis were carried out. For both polymorphisms, the genotypic frequencies did not differ significantly between the two groups (p > 0.05). On the other hand, some ML tools, such as the multilayer perceptron, gave high prediction accuracy (≥ 0.75) and Cohen's kappa (κ) (≥ 0.5). Interestingly, with the K-star tool, the accuracy and Cohen's κ values improved when the genotyping results were included as inputs (0.73 and 0.46, respectively, compared with 0.67 and 0.34 without them). This study showed, for the first time, no association between CD36 polymorphisms and T2DM or dyslipidemia in the Jordanian population. Predicting T2DM and dyslipidemia with this wide range of ML tools, based on such input data, is a promising approach for developing diagnostic and prognostic prediction models for a wide spectrum of diseases, especially when based on large medical databases.
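Cohen's κ, reported alongside accuracy above, corrects the observed agreement p_o for the agreement p_e expected by chance from the marginal totals: κ = (p_o − p_e) / (1 − p_e). A small pure-Python sketch computing both from a binary confusion matrix; the function name and the example counts are invented for illustration and are not the study's data.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and Cohen's kappa from a 2x2 confusion matrix (predicted vs. actual)."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n                          # observed agreement = accuracy
    # Chance agreement: product of predicted and actual marginal proportions.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, (p_o - p_e) / (1 - p_e)

acc, kappa = accuracy_and_kappa(tp=40, fp=10, fn=10, tn=40)
print(round(acc, 3), round(kappa, 3))
```

With balanced classes, chance agreement is 0.5, so 80% accuracy corresponds to κ = 0.6; the same accuracy on imbalanced data would usually yield a lower κ, which is why the two metrics are reported together.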
Affiliation(s)
- Ma’mon M. Hatmal
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Walhan Alshaer
  - Cell Therapy Centre, The University of Jordan, Amman, Jordan
- Ismail S. Mahmoud
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Mohammad A. I. Al-Hatamleh
  - Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
- Hamzeh J. Al-Ameer
  - Department of Biology and Biotechnology, American University of Madaba, Madaba, Jordan
  - Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Omar Abuyaman
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Malek Zihlif
  - Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Rohimah Mohamud
  - Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
- Mais Darras
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Mohammad Al Shhab
  - Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Rand Abu-Raideh
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Hilweh Ismail
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Ali Al-Hamadi
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Ali Abdelhay
  - Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
48
Yoon HG, Oh D, Noh JM, Cho WK, Sun JM, Kim HK, Zo JI, Shim YM, Kim K. Machine learning model for predicting excessive muscle loss during neoadjuvant chemoradiotherapy in oesophageal cancer. J Cachexia Sarcopenia Muscle 2021; 12:1144-1152. [PMID: 34145771 PMCID: PMC8517349 DOI: 10.1002/jcsm.12747] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/18/2021] [Revised: 05/12/2021] [Accepted: 06/08/2021] [Indexed: 12/25/2022]
Abstract
BACKGROUND Excessive skeletal muscle loss during neoadjuvant concurrent chemoradiotherapy (NACRT) is significantly related to survival outcomes in oesophageal cancer. However, the conventional method for measuring skeletal muscle mass requires computed tomography (CT) images, and the calculation is labour-intensive. In this study, we built machine-learning models to predict excessive skeletal muscle loss using only body mass index data and blood laboratory test results. METHODS The data of 232 male patients treated with NACRT for oesophageal cancer were randomly split into training (70%) and test (30%) sets, and this split was repeated for 1000 iterations. Naive random oversampling was applied to each training set to adjust for class imbalance, and seven different machine-learning algorithms were used to predict excessive skeletal muscle loss. Five input variables were used: the relative percentage change in body mass index, albumin, prognostic nutritional index, neutrophil-to-lymphocyte ratio, and platelet-to-lymphocyte ratio over 50 days. Based on our previous study, which used the maximal χ² method, a 10.0% decrease in skeletal muscle index over 50 days was used as the cut-off value defining excessive skeletal muscle loss. RESULTS The five input variables differed significantly between the excessive and non-excessive muscle loss groups (all P < 0.001). None of the clinicopathologic variables differed significantly between the two groups. The ensemble model of logistic regression and support vector classifier showed the highest area under the curve of all models [area under the curve = 0.808, 95% confidence interval (CI): 0.708-0.894]. The sensitivity and specificity of the ensemble model were 73.7% (95% CI: 52.6%-89.5%) and 74.5% (95% CI: 62.7%-86.3%), respectively.
CONCLUSIONS A machine learning model using an ensemble of logistic regression and support vector classifier most effectively predicted excessive muscle loss following NACRT in patients with oesophageal cancer. This model could readily screen for patients with excessive muscle loss who need active intervention or timely care following NACRT.
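Naive random oversampling duplicates randomly drawn minority-class rows until the classes are balanced. As in the design described above, it should be applied only to the training split, so that the test set keeps its natural class ratio. A pure-Python sketch of one split-then-oversample loop; the helper names `split` and `random_oversample` and the toy data are hypothetical, and only 3 iterations are run instead of the study's 1000.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Resample every class (with replacement) up to the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out

def split(X, y, test_frac=0.3, seed=0):
    """Shuffle indices and cut them into train (70%) and test (30%) parts."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(round(len(idx) * (1 - test_frac)))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

X = [[float(i)] for i in range(20)]
y = [1 if i < 5 else 0 for i in range(20)]   # 5 positives, 15 negatives
for it in range(3):                           # repeated random splits
    X_tr, y_tr, X_te, y_te = split(X, y, seed=it)
    X_bal, y_bal = random_oversample(X_tr, y_tr, seed=it)
    print(it, dict(Counter(y_tr)), dict(Counter(y_bal)), len(y_te))
```

Oversampling before splitting would leak duplicated rows into the test set and inflate the reported performance, which is why the split comes first.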
Affiliation(s)
- Han Gyul Yoon
  - Department of Radiation Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Dongryul Oh
  - Department of Radiation Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jae Myoung Noh
  - Department of Radiation Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Won Kyung Cho
  - Department of Radiation Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jong-Mu Sun
  - Department of Internal Medicine, Division of Hematology-Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Hong Kwan Kim
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jae Ill Zo
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Young Mog Shim
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Kyunga Kim
  - Statistics and Data Center, Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea
49
Virtual Monoenergetic Images of Dual-Energy CT-Impact on Repeatability, Reproducibility, and Classification in Radiomics. Cancers (Basel) 2021; 13:cancers13184710. [PMID: 34572937 PMCID: PMC8467875 DOI: 10.3390/cancers13184710] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/22/2021] [Revised: 09/14/2021] [Accepted: 09/17/2021] [Indexed: 01/06/2023]
Abstract
Simple Summary Virtual monoenergetic images from dual-energy CT are increasingly used in routine clinical practice, so radiomic analysis will be performed on these images more often in the future. This study characterized the test–retest repeatability and reproducibility of radiomic features from virtual monoenergetic images and their impact on machine-learning-based lesion classification. The results provide a basis for improving radiomic analyses and identify the role of feature stability in classification tasks when virtual monoenergetic imaging with different scan or reconstruction parameters is used in multicenter clinical studies.
Abstract The purpose of this study was (i) to evaluate the test–retest repeatability and reproducibility of radiomic features in virtual monoenergetic images (VMI) from dual-energy CT (DECT) depending on VMI energy (40, 50, 75, 120, 190 keV), radiation dose (5 and 15 mGy), and DECT approach (dual-source [DSDE] and split-filter [SFDE] DECT) in a phantom (ex vivo), and (ii) to assess the impact of VMI energy and feature repeatability on machine-learning-based classification in vivo in 72 patients with 72 hypodense liver lesions. Features were considered repeatable or reproducible when both the concordance correlation coefficient (CCC) and the dynamic range (DR) were ≥ 0.9. Test–retest repeatability was high within the same VMI energies and scan conditions (percentage of repeatable features ranging from 74% for SFDE at 40 keV and 15 mGy to 86% for DSDE at 190 keV and 15 mGy), while reproducibility varied substantially across VMI energies and DECT approaches (percentage of reproducible features ranging from 32.8% for SFDE at 5 mGy comparing 40 with 190 keV to 99.2% for DSDE at 15 mGy comparing 40 with 50 keV). No major differences (<10%) were observed between the two radiation doses in any pair-wise comparison.
In vivo, machine learning classification using penalized regression and random forests gave the best discrimination of hemangiomas and metastases at low-energy VMI (40 keV) and of cysts at high-energy VMI (120 keV). Feature selection based on feature repeatability did not improve classification performance. Our results demonstrate the high repeatability of radiomics features when scan and reconstruction conditions are kept constant. Reproducibility diminished when different VMI energies or DECT approaches were used. Choosing the optimal VMI energy improved lesion classification in vivo, so the VMI energy should be adapted to the specific task.
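Lin's concordance correlation coefficient, the repeatability criterion used above, penalises both scatter and systematic offset between test and retest values: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). A pure-Python sketch using population (1/n) moments; the 0.9 threshold mirrors the abstract, the feature values are invented, and the dynamic-range criterion is not implemented here.

```python
from statistics import fmean

def ccc(x, y):
    """Lin's concordance correlation coefficient with population moments."""
    mx, my = fmean(x), fmean(y)
    n = len(x)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

scan1 = [1.0, 2.0, 3.0, 4.0, 5.0]            # feature values, first scan
scan2_same = list(scan1)                     # perfect retest
scan2_shifted = [v + 1.0 for v in scan1]     # same ranking, constant offset

for label, retest in [("same", scan2_same), ("shifted", scan2_shifted)]:
    value = ccc(scan1, retest)
    print(label, round(value, 3), "repeatable" if value >= 0.9 else "not repeatable")
```

The shifted retest is perfectly correlated with the first scan (Pearson r = 1), yet its CCC falls to 0.8 because of the constant offset; this sensitivity to systematic bias is why CCC rather than plain correlation is used for repeatability.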
50
Zimmer L, Lindauer M, Hutter F. Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. IEEE Trans Pattern Anal Mach Intell 2021; 43:3079-3090. [PMID: 33750687 DOI: 10.1109/tpami.2021.3067763] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 05/17/2023]
Abstract
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this paper, we introduce Auto-PyTorch, which brings together the best of these two worlds by jointly and robustly optimizing the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL). Auto-PyTorch achieves state-of-the-art performance on several tabular benchmarks by combining multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) and common baselines for tabular data. To thoroughly study our assumptions on how to design such an AutoDL system, we additionally introduce a new benchmark on learning curves for DNNs, dubbed LCBench, and run extensive ablation studies of the full Auto-PyTorch on typical AutoML benchmarks, eventually showing that Auto-PyTorch performs better than several state-of-the-art competitors.
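Multi-fidelity optimization evaluates many configurations at a low budget (for example, few training epochs) and spends larger budgets only on the most promising ones. Successive halving, the building block of Hyperband and BOHB, is one common such scheme. The pure-Python sketch below illustrates that idea under stated assumptions: a made-up, deterministic loss over learning rates replaces real network training, and it is not Auto-PyTorch's implementation.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of configurations per rung, multiplying the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(survivors) // eta)
        print(f"budget={budget}: {len(survivors)} configs -> keep {keep}")
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]

def toy_loss(lr, epochs):
    """Hypothetical validation loss: minimal near lr = 0.01, lower with more epochs."""
    return (math.log10(lr) + 2) ** 2 + 1.0 / epochs

learning_rates = [10 ** e for e in (-5, -4, -3, -2.5, -2, -1.5, -1, -0.5, 0)]
best = successive_halving(learning_rates, toy_loss)
print("best learning rate:", best)
```

Because the toy loss is deterministic, rankings never change between rungs; with real learning curves they can cross, which is the main risk of judging configurations on a small budget.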