1
Li C, Luo Y, Xie Y, Zhang Z, Liu Y, Zou L, Xiao F. Structural and functional prediction, evaluation, and validation in the post-sequencing era. Comput Struct Biotechnol J 2024; 23:446-451. [PMID: 38223342 PMCID: PMC10787220 DOI: 10.1016/j.csbj.2023.12.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/16/2024] [Open Access]
Abstract
The surge of genome sequencing data has revealed a substantial number of genetic variants of uncertain significance (VUS). Deciphering the VUS discovered by sequencing poses a major challenge in the post-sequencing era. Although experimental assays have progressed in classifying VUS, only a tiny fraction of human genes have been explored experimentally. State-of-the-art in silico functional predictors of VUS are therefore urgently needed. Artificial intelligence (AI) is an invaluable tool for identifying VUS with high efficiency and accuracy. An increasing number of studies indicate that AI has accelerated the interpretation of VUS, and our group has already used AI to develop protein structure-based prediction models. In this review, we provide an overview of previous research on AI-based prediction of missense variants and elucidate the challenges and opportunities for protein structure-based variant prediction in the post-sequencing era.
Affiliation(s)
- Chang Li
  - Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Yixuan Luo
  - Beijing Normal University, Beijing, China
- Yibo Xie
  - Information Center, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Zaifeng Zhang
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Ye Liu
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Lihui Zou
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Fei Xiao
  - Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - Beijing Normal University, Beijing, China
2
Shi J, Zhang K, Guo C, Yang Y, Xu Y, Wu J. A survey of label-noise deep learning for medical image analysis. Med Image Anal 2024; 95:103166. [PMID: 38613918 DOI: 10.1016/j.media.2024.103166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/15/2024]
Abstract
Several factors are associated with the success of deep learning. One of the most important is the availability of large-scale datasets with clean annotations. However, obtaining datasets with accurate labels in the medical imaging domain is challenging. The reliability and consistency of medical labeling are among the issues, and low-quality annotations with label noise are common. Because noisy labels reduce the generalization performance of deep neural networks, learning with noisy labels is becoming an essential task in medical image analysis. Literature on this topic has expanded in volume and scope. However, no recent surveys have collected and organized this knowledge, impeding the ability of researchers and practitioners to utilize it. In this work, we present an up-to-date survey of label-noise learning for the medical imaging domain. We review extensive literature, illustrate some typical methods, and present unified taxonomies in terms of methodological differences. We then compare the methods and discuss their advantages and disadvantages. Finally, we discuss new research directions based on the characteristics of medical images. Our survey aims to provide researchers and practitioners with a solid understanding of existing medical label-noise learning, such as the main algorithms developed over the past few years, which could help them investigate new methods to combat the negative effects of label noise.
Affiliation(s)
- Jialin Shi
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Kailai Zhang
  - Department of Networks, China Mobile Communications Group Co., Ltd., Beijing, China
- Chenyi Guo
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
- Yali Xu
  - Department of Breast Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ji Wu
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
3
Tasci E, Shah Y, Jagasia S, Zhuge Y, Shephard J, Johnson MO, Elemento O, Joyce T, Chappidi S, Cooley Zgela T, Sproull M, Mackey M, Camphausen K, Krauze AV. MGMT ProFWise: Unlocking a New Application for Combined Feature Selection and the Rank-Based Weighting Method to Link MGMT Methylation Status to Serum Protein Expression in Patients with Glioblastoma. Int J Mol Sci 2024; 25:4082. [PMID: 38612892 PMCID: PMC11012706 DOI: 10.3390/ijms25074082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 04/02/2024] [Accepted: 04/03/2024] [Indexed: 04/14/2024] [Open Access]
Abstract
Glioblastoma (GBM) is a fatal brain tumor with limited treatment options. O6-methylguanine-DNA-methyltransferase (MGMT) promoter methylation status is the central molecular biomarker linked to both the response to temozolomide, the standard chemotherapy drug employed for GBM, and to patient survival. However, MGMT status is captured on tumor tissue which, given the difficulty in acquisition, limits the use of this molecular feature for treatment monitoring. MGMT protein expression levels may offer additional insights into the mechanistic understanding of MGMT but, currently, they correlate poorly to promoter methylation. The difficulty of acquiring tumor tissue for MGMT testing drives the need for non-invasive methods to predict MGMT status. Feature selection aims to identify the most informative features to build accurate and interpretable prediction models. This study explores the new application of a combined feature selection (i.e., LASSO and mRMR) and the rank-based weighting method (i.e., MGMT ProFWise) to non-invasively link MGMT promoter methylation status and serum protein expression in patients with GBM. Our method provides promising results, reducing dimensionality (by more than 95%) when employed on two large-scale proteomic datasets (7k SomaScan® panel and CPTAC) for all our analyses. The computational results indicate that the proposed approach provides 14 shared serum biomarkers that may be helpful for diagnostic, prognostic, and/or predictive operations for GBM-related processes, given further validation.
Affiliation(s)
- Erdal Tasci
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Yajas Shah
  - Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Sarisha Jagasia
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Ying Zhuge
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Jason Shephard
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Margaret O. Johnson
  - Department of Neurosurgery, Duke University, Durham, NC 27710, USA
  - National Tele-Oncology, Veterans Health Administration, Durham, NC 27710, USA
- Olivier Elemento
  - Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Thomas Joyce
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Shreya Chappidi
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Theresa Cooley Zgela
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Mary Sproull
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Megan Mackey
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Kevin Camphausen
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
- Andra Valentina Krauze
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA
4
Wójcik Z, Dimitrova V, Warrington L, Velikova G, Absolom K. Using Machine Learning to Predict Unplanned Hospital Utilization and Chemotherapy Management From Patient-Reported Outcome Measures. JCO Clin Cancer Inform 2024; 8:e2300264. [PMID: 38669610 PMCID: PMC11161248 DOI: 10.1200/cci.23.00264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/14/2024] [Accepted: 03/01/2024] [Indexed: 04/28/2024] [Open Access]
Abstract
PURPOSE Adverse effects of chemotherapy often require hospital admissions or treatment management. Identifying factors contributing to unplanned hospital utilization may improve health care quality and patients' well-being. This study aimed to assess whether patient-reported outcome measures (PROMs) improve the performance of machine learning (ML) models predicting hospital admissions, triage events (contacting a helpline or attending hospital), and changes to chemotherapy. MATERIALS AND METHODS Clinical trial data were used and contained responses to three PROMs (European Organisation for Research and Treatment of Cancer Core Quality of Life Questionnaire [QLQ-C30], EuroQol Five-Dimensional Visual Analogue Scale [EQ-5D], and Functional Assessment of Cancer Therapy-General [FACT-G]) and clinical information on 508 participants undergoing chemotherapy. Six feature sets (with the following variables: [1] all available; [2] clinical; [3] PROMs; [4] clinical and QLQ-C30; [5] clinical and EQ-5D; [6] clinical and FACT-G) were applied in six ML models (logistic regression [LR], decision tree, adaptive boosting, random forest [RF], support vector machines [SVMs], and neural network) to predict admissions, triage events, and chemotherapy changes. RESULTS The comprehensive analysis of the predictive performance of the six ML models for each feature set under three different methods for handling class imbalance indicated that PROMs improved predictions of all outcomes. RF and SVMs had the highest performance for predicting admissions and changes to chemotherapy in balanced data sets, and LR in the imbalanced data set. Balancing the data led to the best performance compared with the imbalanced data set or the data set with a balanced training set only. CONCLUSION These results endorsed the view that ML can be applied to PROM data to predict hospital utilization and chemotherapy management. If further explored, this study may contribute to health care planning and treatment personalization.
Rigorous comparison of model performance affected by different imbalanced data handling methods shows best practice in ML research.
Affiliation(s)
- Zuzanna Wójcik
  - UKRI Centre for Doctoral Training in Artificial Intelligence for Medical Diagnosis and Care, University of Leeds, Leeds, United Kingdom
- Vania Dimitrova
  - School of Computing, University of Leeds, Leeds, United Kingdom
- Lorraine Warrington
  - Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, United Kingdom
- Galina Velikova
  - Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, United Kingdom
  - Leeds Cancer Centre, Leeds Teaching Hospitals NHS Trust, Leeds, United Kingdom
- Kate Absolom
  - Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, United Kingdom
  - Leeds Institute of Health Sciences, University of Leeds, Leeds, United Kingdom
5
Cusworth S, Gkoutos GV, Acharjee A. A novel generative adversarial networks modelling for the class imbalance problem in high dimensional omics data. BMC Med Inform Decis Mak 2024; 24:90. [PMID: 38549123 PMCID: PMC10979623 DOI: 10.1186/s12911-024-02487-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 03/22/2024] [Indexed: 04/01/2024] [Open Access]
Abstract
Class imbalance remains a large problem in high-throughput omics analyses, causing bias towards the over-represented class when training machine learning-based classifiers. Oversampling is a common method used to balance classes, allowing for better generalization of the training data. More naive approaches can introduce other biases into the data, being especially sensitive to inaccuracies in the training data, a problem considering the characteristically noisy data obtained in healthcare. This is especially a problem with high-dimensional data. A generative adversarial network-based method is proposed for creating synthetic samples from small, high-dimensional data, to improve upon other more naive generative approaches. The method was compared with 'synthetic minority over-sampling technique' (SMOTE) and 'random oversampling' (RO). Generative methods were validated by training classifiers on the balanced data.
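For readers unfamiliar with the 'random oversampling' (RO) baseline named in this abstract, the following is a minimal sketch of the general idea, not the authors' implementation. The function name `random_oversample`, the toy data, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement to fill the gap to the majority size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Hypothetical toy data: four majority samples, one minority sample.
X = [[0.1, 1.2], [0.3, 0.9], [0.2, 1.1], [0.4, 1.0], [2.5, 0.1]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because RO only copies existing rows, it adds no new information about the minority class, which is the kind of limitation that synthetic approaches such as SMOTE or generative models try to address.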
Affiliation(s)
- Samuel Cusworth
  - Institute of Applied Health Research, University of Birmingham, Birmingham, UK
  - NIHR Blood and Transplant Research Unit (BTRU) in Precision Transplant and Cellular Therapeutics, University of Birmingham, Birmingham, UK
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, B15 2TT, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, UK
  - MRC Health Data Research UK (HDR), Midlands Site, UK
  - Centre for Health Data Research, University of Birmingham, B15 2TT, Birmingham, UK
  - NIHR Experimental Cancer Medicine Centre, B15 2TT, Birmingham, UK
- Animesh Acharjee
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, B15 2TT, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, UK
  - MRC Health Data Research UK (HDR), Midlands Site, UK
  - Centre for Health Data Research, University of Birmingham, B15 2TT, Birmingham, UK
6
Maragno D, Buti G, Birbil Şİ, Liao Z, Bortfeld T, den Hertog D, Ajdari A. Embedding machine learning based toxicity models within radiotherapy treatment plan optimization. Phys Med Biol 2024; 69:075003. [PMID: 38412530 DOI: 10.1088/1361-6560/ad2d7e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 02/27/2024] [Indexed: 02/29/2024]
Abstract
Objective. This study addresses radiation-induced toxicity (RIT) challenges in radiotherapy (RT) by developing a personalized treatment planning framework. It leverages patient-specific data and dosimetric information to create an optimization model that limits adverse side effects using constraints learned from historical data. Approach. The study uses the optimization with constraint learning (OCL) framework, incorporating patient-specific factors into the optimization process. It consists of three steps: optimizing the baseline treatment plan using population-wide dosimetric constraints; training a machine learning (ML) model to estimate the patient's RIT for the baseline plan; and adapting the treatment plan to minimize RIT using ML-learned patient-specific constraints. Various predictive models, including classification trees, ensembles of trees, and neural networks, are applied to predict the probability of grade 2+ radiation pneumonitis (RP2+) for non-small cell lung cancer (NSCLC) patients three months post-RT. The methodology is assessed with four high RP2+ risk NSCLC patients, with the goal of optimizing the dose distribution to constrain the RP2+ outcome below a pre-specified threshold. Conventional and OCL-enhanced plans are compared based on dosimetric parameters and predicted RP2+ risk. Sensitivity analysis on risk thresholds and data uncertainty is performed using a toy NSCLC case. Main results. Experiments show the methodology's capacity to directly incorporate all predictive models into RT treatment planning. In the four patients studied, mean lung dose and V20 were reduced by an average of 1.78 Gy and 3.66%, resulting in an average RP2+ risk reduction from 95% to 42%. Notably, this reduction maintains tumor coverage, although in two cases, sparing the lung slightly increased spinal cord max-dose (0.23 and 0.79 Gy). Significance. By integrating patient-specific information into learned constraints, the study significantly reduces adverse side effects like RP2+ without compromising target coverage. This unified framework bridges the gap between predicting toxicities and optimizing treatment plans in personalized RT decision-making.
Affiliation(s)
- Donato Maragno
  - Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands
- Gregory Buti
  - Massachusetts General Hospital and Harvard Medical School, Department of Radiation Oncology, Division of Radiation BioPhysics, Boston, MA, United States of America
- Ş İlker Birbil
  - Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands
- Zhongxing Liao
  - University of Texas' MD Anderson Cancer Center, Department of Radiation Oncology, Division of Radiation Oncology, Houston, TX, United States of America
- Thomas Bortfeld
  - Massachusetts General Hospital and Harvard Medical School, Department of Radiation Oncology, Division of Radiation BioPhysics, Boston, MA, United States of America
- Dick den Hertog
  - Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands
- Ali Ajdari
  - Massachusetts General Hospital and Harvard Medical School, Department of Radiation Oncology, Division of Radiation BioPhysics, Boston, MA, United States of America
7
Huma C, Hawon L, Sarisha J, Erdal T, Kevin C, Valentina KA. Advances in the field of developing biomarkers for re-irradiation: a how-to guide to small, powerful data sets and artificial intelligence. Expert Review of Precision Medicine and Drug Development 2024; 9:3-16. [PMID: 38550554 PMCID: PMC10972602 DOI: 10.1080/23808993.2024.2325936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 02/28/2024] [Indexed: 04/01/2024]
Abstract
Introduction Patient selection remains challenging as the clinical use of re-irradiation (re-RT) increases. Re-RT data is limited to retrospective studies and small prospective single-institution reports, resulting in small, heterogeneous data sets. Validated prognostic and predictive biomarkers are derived from large-volume studies with long-term follow-up. This review aims to examine existing re-RT publications and available data sets and discuss strategies using artificial intelligence (AI) to approach small data sets to optimize the use of re-RT data. Methods Re-RT publications were identified where associated public data was present. The existing literature on small data sets to identify biomarkers was also explored. Results Publications with associated public data were identified, with glioma and nasopharyngeal cancers emerging as the most common tumor sites where the use of re-RT was the primary management approach. Existing and emerging AI strategies have been used to approach small data sets including data generation, augmentation, discovery, and transfer learning. Conclusions Further data is needed to generate adaptive frameworks, improve the collection of specimens for molecular analysis, and improve the interpretability of results in re-RT data.
Affiliation(s)
- Chaudhry Huma
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD, 20892, United States
- Lee Hawon
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD, 20892, United States
- Jagasia Sarisha
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD, 20892, United States
- Tasci Erdal
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD, 20892, United States
- Camphausen Kevin
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD, 20892, United States
- Krauze Andra Valentina
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD, 20892, United States
8
Zhao Y, Coppola A, Karamchandani U, Amiras D, Gupte CM. Artificial intelligence applied to magnetic resonance imaging reliably detects the presence, but not the location, of meniscus tears: a systematic review and meta-analysis. Eur Radiol 2024:10.1007/s00330-024-10625-7. [PMID: 38386028 DOI: 10.1007/s00330-024-10625-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 12/24/2023] [Accepted: 01/13/2024] [Indexed: 02/23/2024]
Abstract
OBJECTIVES To review and compare the accuracy of convolutional neural networks (CNN) for the diagnosis of meniscal tears in the current literature and analyze the decision-making processes utilized by these CNN algorithms. MATERIALS AND METHODS PubMed, MEDLINE, EMBASE, and Cochrane databases up to December 2022 were searched in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) statement. Risk of analysis was used for all identified articles. Predictive performance values, including sensitivity and specificity, were extracted for quantitative analysis. The meta-analysis was divided between AI prediction models identifying the presence of meniscus tears and the location of meniscus tears. RESULTS Eleven articles were included in the final review, with a total of 13,467 patients and 57,551 images. Heterogeneity in sensitivity was statistically significant and large for the tear identification analysis (I² = 79%). A higher level of accuracy was observed in identifying the presence of a meniscal tear than in locating tears in specific regions of the meniscus (AUC, 0.939 vs 0.905). Pooled sensitivity and specificity were 0.87 (95% confidence interval (CI) 0.80-0.91) and 0.89 (95% CI 0.83-0.93) for meniscus tear identification and 0.88 (95% CI 0.82-0.91) and 0.84 (95% CI 0.81-0.85) for locating the tears. CONCLUSIONS AI prediction models achieved favorable performance in the diagnosis, but not location, of meniscus tears. Further studies on the clinical utility of deep learning should include standardized reporting, external validation, and full reports of the predictive performance of these models, with a view to localizing tears more accurately. CLINICAL RELEVANCE STATEMENT Meniscus tears are hard to diagnose on knee magnetic resonance images. AI prediction models may play an important role in improving the diagnostic accuracy of clinicians and radiologists.
KEY POINTS • Artificial intelligence (AI) provides great potential in improving the diagnosis of meniscus tears. • The pooled diagnostic performance for artificial intelligence (AI) in identifying meniscus tears was better (sensitivity 87%, specificity 89%) than locating the tears (sensitivity 88%, specificity 84%). • AI is good at confirming the diagnosis of meniscus tears, but future work is required to guide the management of the disease.
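As a reminder of what the sensitivity and specificity figures in this abstract measure, here is a minimal sketch computing both from confusion-matrix counts for a single study. The counts are hypothetical, chosen only so the result matches the rounded values 0.87 and 0.89; they are not data from the review, and pooled estimates in a meta-analysis come from a statistical model across studies rather than from this direct calculation.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true tears that are detected.
    specificity = TN / (TN + FP): share of intact menisci correctly cleared.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for one study: 100 knees with a tear, 100 without.
sens, spec = sensitivity_specificity(tp=87, fn=13, tn=89, fp=11)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```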
Affiliation(s)
- Yi Zhao
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
- Andrew Coppola
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
- Dimitri Amiras
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
  - Imperial College London NHS Trust, London, UK
- Chinmay M Gupte
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
  - Imperial College London NHS Trust, London, UK
9
Demircioğlu A. The effect of data resampling methods in radiomics. Sci Rep 2024; 14:2858. [PMID: 38310165 PMCID: PMC10838284 DOI: 10.1038/s41598-024-53491-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 02/01/2024] [Indexed: 02/05/2024] [Open Access]
Abstract
Radiomic datasets can be class-imbalanced, for instance, when the prevalence of diseases varies notably, meaning that the number of positive samples is much smaller than that of negative samples. In these cases, the majority class may dominate the model's training and thus negatively affect the model's predictive performance, leading to bias. Therefore, resampling methods are often utilized to class-balance the data. However, several resampling methods exist, and neither their relative predictive performance nor their impact on feature selection has been systematically analyzed. In this study, we aimed to measure the impact of nine resampling methods on radiomic models utilizing a set of fifteen publicly available datasets regarding their predictive performance. Furthermore, we evaluated the agreement and similarity of the set of selected features. Our results show that applying resampling methods did not improve the predictive performance on average. On specific datasets, slight improvements in predictive performance (+ 0.015 in AUC) could be seen. A considerable disagreement on the set of selected features was seen (only 28.7% of features agreed), which strongly impedes feature interpretability. However, selected features are similar when considering their correlation (82.9% of features correlated on average).
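The abstract reports agreement between sets of selected features but does not say how agreement was computed. One common way to quantify overlap between two feature sets is the Jaccard index; the sketch below assumes that measure purely for illustration, and the feature names are hypothetical.

```python
def jaccard(features_a, features_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


# Hypothetical selections from the same data with and without resampling.
selected_without = {"glcm_contrast", "shape_volume", "firstorder_mean", "glrlm_rle"}
selected_with = {"firstorder_mean", "glrlm_rle", "ngtdm_busyness"}
agreement = jaccard(selected_without, selected_with)
print(f"agreement = {agreement:.1%}")
```

Two selections can score low on this measure and still carry similar information when the differing features are strongly correlated, which is consistent with the contrast the abstract draws between agreement and correlation.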
Affiliation(s)
- Aydin Demircioğlu
  - Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany
10
Alkhawaldeh IM, Albalkhi I, Nashwan AJ. Challenges and limitations of synthetic minority oversampling techniques in machine learning. World J Methodol 2023; 13:373-378. [PMID: 38229946 PMCID: PMC10789107 DOI: 10.5662/wjm.v13.i5.373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 09/30/2023] [Accepted: 11/03/2023] [Indexed: 12/20/2023] [Open Access]
Abstract
Oversampling is the most utilized approach to deal with class-imbalanced datasets, as seen by the plethora of oversampling methods developed in the last two decades. In this editorial, we discuss issues with oversampling that stem from the possibility of overfitting and from the generation of synthetic cases that might not accurately represent the minority class. These limitations should be considered when using oversampling techniques. We also propose several alternative strategies for dealing with imbalanced data, as well as perspectives for future work.
Affiliation(s)
- Ibrahem Albalkhi
  - Department of Neuroradiology, Alfaisal University, Great Ormond Street Hospital NHS Foundation Trust, London WC1N 3JH, United Kingdom
11
Cen HS, Dandamudi S, Lei X, Weight C, Desai M, Gill I, Duddalwar V. Diversity in Renal Mass Data Cohorts: Implications for Urology AI Researchers. Oncology 2023; 102:574-584. [PMID: 38104555 PMCID: PMC11178677 DOI: 10.1159/000535841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 12/08/2023] [Indexed: 12/19/2023]
Abstract
INTRODUCTION We examine the heterogeneity and distribution of the cohort populations in two publicly used radiological image cohorts, the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCIA TCGA KIRC) collection and 2019 MICCAI Kidney Tumor Segmentation Challenge (KiTS19), and deviations in real-world population renal cancer data from the National Cancer Database (NCDB) Participant User Data File (PUF) and tertiary center data. PUF data are used as an anchor for prevalence rate bias assessment. Specific gene expression and, therefore, biology of RCC differ by self-reported race, especially between the African American and Caucasian populations. AI algorithms learn from datasets, but if the dataset misrepresents the population, reinforcing bias may occur. Ignoring these demographic features may lead to inaccurate downstream effects, thereby limiting the translation of these analyses to clinical practice. Consciousness of model training biases is vital to patient care decisions when using models in clinical settings. METHODS Data elements evaluated included gender, demographics, reported pathologic grading, and cancer staging. American Urological Association risk levels were used. Poisson regression was performed to estimate the population-based and sample-specific estimation for prevalence rate and corresponding 95% confidence interval. SAS 9.4 was used for data analysis. RESULTS Compared to PUF, KiTS19 and TCGA KIRC oversampled Caucasian by 9.5% (95% CI, -3.7 to 22.7%) and 15.1% (95% CI, 1.5 to 28.8%), undersampled African American by -6.7% (95% CI, -10% to -3.3%), and -5.5% (95% CI, -9.3% to -1.8%). Tertiary also undersampled African American by -6.6% (95% CI, -8.7% to -4.6%). The tertiary cohort largely undersampled aggressive cancers by -14.7% (95% CI, -20.9% to -8.4%). No statistically significant difference was found among PUF, TCGA, and KiTS19 in aggressive rate; however, heterogeneities in risk are notable. 
CONCLUSION Heterogeneities between cohorts need to be considered in future AI training and cross-validation for renal masses.
Affiliation(s)
- Harmony Selena Cen
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA,
| | - Siddhartha Dandamudi
- College of Human Medicine, Michigan State University, East Lansing, Michigan, USA
| | - Xiaomeng Lei
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
| | - Chris Weight
- Urologic Oncology, Cleveland Clinic, Cleveland, Ohio, USA
| | - Mihir Desai
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
| | - Inderbir Gill
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
| | - Vinay Duddalwar
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
|
12
|
Budiarto A, Tsang KCH, Wilson AM, Sheikh A, Shah SA. Machine Learning-Based Asthma Attack Prediction Models From Routinely Collected Electronic Health Records: Systematic Scoping Review. JMIR AI 2023; 2:e46717. [PMID: 38875586 PMCID: PMC11041490 DOI: 10.2196/46717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 09/28/2023] [Accepted: 10/09/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND An early warning tool to predict attacks could enhance asthma management and reduce the likelihood of serious consequences. Electronic health records (EHRs), which provide access to historical data about patients with asthma, coupled with machine learning (ML), provide an opportunity to develop such a tool. Several studies have developed ML-based tools to predict asthma attacks. OBJECTIVE This study aims to critically evaluate ML-based models derived using EHRs for the prediction of asthma attacks. METHODS We systematically searched PubMed and Scopus (the search period was between January 1, 2012, and January 31, 2023) for papers meeting the following inclusion criteria: (1) used EHR data as the main data source, (2) used asthma attack as the outcome, and (3) compared ML-based prediction models' performance. We excluded non-English papers and nonresearch papers, such as commentary and systematic review papers. We also excluded papers that did not provide any details about the respective ML approach and its result, including protocol papers. The selected studies were then summarized across multiple dimensions including data preprocessing methods, ML algorithms, model validation, model explainability, and model implementation. RESULTS Overall, 17 papers were included at the end of the selection process. There was considerable heterogeneity in how asthma attacks were defined. Of the 17 studies, 8 (47%) used routinely collected data from both primary care and secondary care practices. Extremely imbalanced data was a notable issue in most studies (13/17, 76%), but only 38% (5/13) of them explicitly dealt with it in their data preprocessing pipeline. The gradient boosting-based method was the best ML method in 59% (10/17) of the studies. Of the 17 studies, 14 (82%) used a model explanation method to identify the most important predictors.
None of the studies followed the standard reporting guidelines, and none were prospectively validated. CONCLUSIONS Our review indicates that this research field is still underdeveloped, given the limited body of evidence, heterogeneity of methods, lack of external validation, and suboptimally reported models. We highlighted several technical challenges (class imbalance, external validation, model explanation, and adherence to reporting guidelines to aid reproducibility) that need to be addressed to make progress toward clinical adoption.
Affiliation(s)
- Arif Budiarto
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia
| | - Kevin C H Tsang
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
| | - Andrew M Wilson
- Norwich Medical School, University of East Anglia, Norwich, United Kingdom
- Norfolk and Norwich University Hospital NHS Foundation Trust, Norwich, United Kingdom
| | - Aziz Sheikh
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
| | - Syed Ahmar Shah
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
|
13
|
Krauze AV, Zhao Y, Li MC, Shih J, Jiang W, Tasci E, Cooley Zgela T, Sproull M, Mackey M, Shankavaram U, Tofilon P, Camphausen K. Revisiting Concurrent Radiation Therapy, Temozolomide, and the Histone Deacetylase Inhibitor Valproic Acid for Patients with Glioblastoma-Proteomic Alteration and Comparison Analysis with the Standard-of-Care Chemoirradiation. Biomolecules 2023; 13:1499. [PMID: 37892181 PMCID: PMC10604983 DOI: 10.3390/biom13101499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 10/29/2023] Open
Abstract
BACKGROUND Glioblastoma (GBM) is the most common brain tumor with an overall survival (OS) of less than 30% at two years. Valproic acid (VPA) demonstrated survival benefits documented in retrospective and prospective trials, when used in combination with chemo-radiotherapy (CRT). PURPOSE The primary goal of this study was to examine if the differential alteration in proteomic expression pre- vs. post-completion of concurrent chemoirradiation (CRT) is present with the addition of VPA as compared to standard-of-care CRT. The second goal was to explore the associations between the proteomic alterations in response to VPA/RT/TMZ and patient outcomes. The third goal was to use the proteomic profile to determine the mechanism of action of VPA in this setting. MATERIALS AND METHODS Serum obtained pre- and post-CRT was analyzed using an aptamer-based SOMAScan® proteomic assay. Twenty-nine patients received CRT plus VPA, and 53 patients received CRT alone. Clinical data were obtained via a database and chart review. Tests for differences in protein expression changes between radiation therapy (RT) with or without VPA were conducted for individual proteins using two-sided t-tests, considering p-values of <0.05 as significant. Adjustment for age, sex, and other clinical covariates and hierarchical clustering of significant differentially expressed proteins was carried out, and Gene Set Enrichment analyses were performed using the Hallmark gene sets. Univariate Cox proportional hazards models were used to test the individual protein expression changes for an association with survival. The lasso Cox regression method and 10-fold cross-validation were employed to test the combinations of expression changes of proteins that could predict survival. Predictiveness curves were plotted for significant proteins for VPA response (p-value < 0.005) to show the survival probability vs. the protein expression percentiles.
RESULTS A total of 124 proteins were identified pre- vs. post-CRT that were differentially expressed between the cohorts who received CRT plus VPA and those who received CRT alone. Clinical factors did not confound the results, and distinct proteomic clustering in the VPA-treated population was identified. Time-dependent ROC curves for OS and PFS for landmark times of 20 months and 6 months, respectively, revealed AUC of 0.531, 0.756, 0.774 for OS and 0.535, 0.723, 0.806 for PFS for protein expression, clinical factors, and the combination of protein expression and clinical factors, respectively, indicating that the proteome can provide additional survival risk discrimination to that already provided by the standard clinical factors with a greater impact on PFS. Several proteins of interest were identified. Alterations in GALNT14 (increased) and CCL17 (decreased) (p = 0.003 and 0.003, respectively, FDR 0.198 for both) were associated with an improvement in both OS and PFS. The pre-CRT protein expression revealed 480 proteins predictive for OS and 212 for PFS (p < 0.05), of which 112 overlapped between OS and PFS. However, FDR-adjusted p-values were high for both OS (smallest p-value 0.586) and PFS (smallest p-value 0.998). The protein PLCD3 had the lowest p-value (p = 0.002 and 0.0004 for OS and PFS, respectively), and its elevation prior to CRT predicted superior OS and PFS with VPA administration. Cancer hallmark gene sets associated with proteomic alteration observed with the administration of VPA aligned with known signal transduction pathways of this agent in malignancy and non-malignancy settings, and GBM signaling, and included epithelial-mesenchymal transition, hedgehog signaling, IL6/JAK/STAT3, coagulation, NOTCH, apical junction, xenobiotic metabolism, and complement signaling.
CONCLUSIONS Differential alteration in proteomic expression pre- vs. post-completion of concurrent chemoirradiation (CRT) is present with the addition of VPA. Using pre- vs. post-data, prognostic proteins emerged in the analysis. Using pre-CRT data, potentially predictive proteins were identified. The protein signals and hallmark gene sets associated with the alteration in the proteome identified between patients who received VPA and those who did not align with known biological mechanisms of action of VPA and may allow for the identification of novel biomarkers associated with outcomes that can help advance the study of VPA in future prospective trials.
Affiliation(s)
- Andra V. Krauze
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Yingdong Zhao
- Computational and Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Rockville, Maryland 20850, USA; (Y.Z.); (M.-C.L.); (J.S.)
| | - Ming-Chung Li
- Computational and Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Rockville, Maryland 20850, USA; (Y.Z.); (M.-C.L.); (J.S.)
| | - Joanna Shih
- Computational and Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Rockville, Maryland 20850, USA; (Y.Z.); (M.-C.L.); (J.S.)
| | - Will Jiang
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Erdal Tasci
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Theresa Cooley Zgela
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Mary Sproull
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Megan Mackey
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Uma Shankavaram
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Philip Tofilon
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
| | - Kevin Camphausen
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), 9000 Rockville Pike, Building 10, CRC, Bethesda, MD 20892, USA (T.C.Z.); (U.S.); (P.T.)
|
14
|
Tasci E, Jagasia S, Zhuge Y, Camphausen K, Krauze AV. GradWise: A Novel Application of a Rank-Based Weighted Hybrid Filter and Embedded Feature Selection Method for Glioma Grading with Clinical and Molecular Characteristics. Cancers (Basel) 2023; 15:4628. [PMID: 37760597 PMCID: PMC10526509 DOI: 10.3390/cancers15184628] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/06/2023] [Revised: 09/01/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023] Open
Abstract
Glioma grading plays a pivotal role in guiding treatment decisions, predicting patient outcomes, facilitating clinical trial participation and research, and tailoring treatment strategies. Current glioma grading in the clinic is based on tissue acquired at the time of resection, with tumor aggressiveness assessed from tumor morphology and molecular features. The increased emphasis on molecular characteristics as a guide for management and prognosis estimation is driven by the need for accurate and standardized grading systems that integrate molecular and clinical information in the grading process. It also carries the expectation of exposing molecular markers that go beyond prognosis to increase understanding of tumor biology as a means of identifying druggable targets. In this study, we introduce a novel application (GradWise) that combines rank-based weighted hybrid filter (i.e., mRMR) and embedded (i.e., LASSO) feature selection methods to enhance the performance of feature selection and machine learning models for glioma grading using both clinical and molecular predictors. We utilized the publicly available TCGA (from the UCI ML Repository) and CGGA datasets to identify the most effective scheme that allows for the selection of the minimum number of features with their names. Two popular feature selection methods with a rank-based weighting procedure were employed to conduct comprehensive experiments with the five supervised models. The computational results demonstrate that our proposed method achieves an accuracy rate of 87.007% with 13 features and an accuracy rate of 80.412% with five features on the TCGA and CGGA datasets, respectively. We also obtained four shared biomarkers for the glioma grading that emerged in both datasets and can be employed with transferable value to other datasets and data-based outcome analyses.
These findings are a significant step toward highlighting the effectiveness of our approach by offering pioneering results with novel markers with prospects for understanding and targeting the biologic mechanisms of glioma progression to improve patient outcomes.
Affiliation(s)
- Andra Valentina Krauze
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
|
15
|
Skuban-Eiseler T, Orzechowski M, Denkinger M, Kocar TD, Leinert C, Steger F. Artificial Intelligence-Based Clinical Decision Support Systems in Geriatrics: An Ethical Analysis. J Am Med Dir Assoc 2023; 24:1271-1276.e4. [PMID: 37453451 DOI: 10.1016/j.jamda.2023.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 06/08/2023] [Accepted: 06/08/2023] [Indexed: 07/18/2023]
Abstract
OBJECTIVES To provide an ethical analysis of the implications of the usage of artificial intelligence-supported clinical decision support systems (AI-CDSS) in geriatrics. DESIGN Ethical analysis based on the normative arguments regarding the use of AI-CDSS in geriatrics using a principle-based ethical framework. SETTING AND PARTICIPANTS Normative arguments identified in 29 articles on AI-CDSS in geriatrics. METHODS Our analysis is based on a literature search that was done to determine ethical arguments that are currently discussed regarding AI-CDSS. The relevant articles were subjected to a detailed qualitative analysis regarding the ethical considerations mentioned therein. We then discussed the identified arguments within the frame of the 4 principles of medical ethics according to Beauchamp and Childress and with respect to the needs of frail older adults. RESULTS We found a total of 5089 articles; 29 articles met the inclusion criteria and were subsequently subjected to a detailed qualitative analysis. We could not identify any systematic analysis of the ethical implications of AI-CDSS in geriatrics. The ethical considerations are very unsystematic and scattered, and the existing literature has a predominantly technical focus emphasizing the technology's utility. In an extensive ethical analysis, we systematically discuss the ethical implications of the usage of AI-CDSS in geriatrics. CONCLUSIONS AND IMPLICATIONS AI-CDSS in geriatrics can be a great asset, especially when dealing with patients with cognitive disorders; however, from an ethical perspective, we see the need for further research. By using AI-CDSS, older patients' values and beliefs might be overlooked, and the quality of the doctor-patient relationship might be altered, endangering compliance with the 4 ethical principles of Beauchamp and Childress.
Affiliation(s)
- Tobias Skuban-Eiseler
- Institute of the History, Philosophy and Ethics of Medicine, Faculty of Medicine, Ulm University, Ulm, Germany; kbo-Isar-Amper-Klinikum Region München, München-Haar, Germany.
| | - Marcin Orzechowski
- Institute of the History, Philosophy and Ethics of Medicine, Faculty of Medicine, Ulm University, Ulm, Germany
| | - Michael Denkinger
- Institute of Geriatric Research, Ulm University Medical Center, Ulm, Germany; AGAPLESION Bethesda Clinic Ulm, Ulm, Germany
| | - Thomas Derya Kocar
- Institute of Geriatric Research, Ulm University Medical Center, Ulm, Germany; AGAPLESION Bethesda Clinic Ulm, Ulm, Germany
| | - Christoph Leinert
- Institute of Geriatric Research, Ulm University Medical Center, Ulm, Germany; AGAPLESION Bethesda Clinic Ulm, Ulm, Germany
| | - Florian Steger
- Institute of the History, Philosophy and Ethics of Medicine, Faculty of Medicine, Ulm University, Ulm, Germany
|
16
|
Deng Y, Ma Y, Fu J, Wang X, Yu C, Lv J, Man S, Wang B, Li L. A dynamic machine learning model for prediction of NAFLD in a health checkup population: A longitudinal study. Heliyon 2023; 9:e18758. [PMID: 37576311 PMCID: PMC10412833 DOI: 10.1016/j.heliyon.2023.e18758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 08/15/2023] Open
Abstract
Background Non-alcoholic fatty liver disease (NAFLD) is one of the most common liver diseases worldwide. Currently, most NAFLD prediction models are diagnostic models based on cross-sectional data, which failed to provide early identification or clarify causal relationships. We aimed to use time-series deep learning models with longitudinal health checkup records to predict the onset of NAFLD in the future, and update the model stepwise by incorporating new checkup records to achieve dynamic prediction. Methods 10,493 participants with over 6 health checkup records from Beijing MJ Health Screening Center were included to conduct a retrospective cohort study, in which the constantly updated initial 5 checkup data were incorporated stepwise to predict the risk of NAFLD at and after their sixth health checkups. A total of 33 variables were considered, consisting of demographic characteristics, medical history, lifestyle, physical examinations, and laboratory tests. L1-penalized logistic regression (LR) was used for feature selection. The long short-term memory (LSTM) algorithm was introduced for model development, and five-fold cross-validation was conducted to tune and choose optimal hyperparameters. Both internal validation and external validation were conducted, using the 20% randomly divided holdout test dataset and previously unseen data from Shanghai MJ Health Screening Center, respectively, to evaluate model performance. The evaluation metrics included area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, Brier score, and decision curve. Bootstrap sampling was implemented to generate 95% confidence intervals of all the metrics. Finally, the Shapley additive explanations (SHAP) algorithm was applied in the holdout test dataset for model interpretability to obtain time-specific and sample-specific contributions of each feature. 
Results Among the 10,493 participants, 1662 (15.84%) were diagnosed with NAFLD at and after their sixth health checkups. The predictive performance of the deep learning model in the internal validation dataset improved as more checkups were incorporated, with AUROC increasing from 0.729 (95% CI: 0.698, 0.760) at baseline to 0.818 (95% CI: 0.798, 0.844) when 5 consecutive checkups were included. The external validation dataset, containing 1728 participants, was used to verify the results, in which AUROC increased from 0.700 (95% CI: 0.657, 0.740) with only the first checkups to 0.792 (95% CI: 0.758, 0.825) with all five. The results of feature significance showed that body fat percentage, alanine transaminase (ALT), and uric acid had the greatest impact on the outcome. Time-specific, individual-specific, and dynamic feature contributions were also produced for model interpretability. Conclusion A dynamic prediction model was successfully established in our study, and the prediction capability kept improving as the latest checkup records were added. In addition, we identified key features associated with the onset of NAFLD, making it possible to optimize the prevention and control strategies of the disease in the general population.
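The bootstrap confidence intervals reported for AUROC in this abstract can be illustrated with a minimal sketch. It assumes a percentile bootstrap over participants and a rank-based AUROC; the abstract does not describe the exact resampling scheme, and the function names and toy labels and scores below are hypothetical and are not data from the study.

```python
import random


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed scheme, for illustration)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        # Resample participants with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined when a resample holds a single class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical toy data: 1 = outcome occurred, 0 = outcome did not occur.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]

point = auroc(labels, scores)
lower, upper = bootstrap_auroc_ci(labels, scores, n_resamples=500)
print(f"AUROC {point:.3f} (95% CI: {lower:.3f}, {upper:.3f})")
```

With only eight toy observations the interval is very wide; cohorts of the size described in the abstract give much tighter bounds.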
Affiliation(s)
- Yuhan Deng
- Chongqing Research Institute of Big Data, Peking University, Chongqing, China
- Meinian Institute of Health, Beijing, China
| | - Yuan Ma
- School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Jingzhu Fu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Peking University Health Science Center Meinian Public Health Institute, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
| | - Canqing Yu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Peking University Health Science Center Meinian Public Health Institute, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing, China
| | - Jun Lv
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Peking University Health Science Center Meinian Public Health Institute, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing, China
| | - Sailimai Man
- Meinian Institute of Health, Beijing, China
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Peking University Health Science Center Meinian Public Health Institute, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
| | - Bo Wang
- Meinian Institute of Health, Beijing, China
- Peking University Health Science Center Meinian Public Health Institute, Beijing, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing, China
| | - Liming Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Peking University Health Science Center Meinian Public Health Institute, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing, China
|
17
|
Marin L, Casado F. Prediction of prostate cancer biochemical recurrence by using discretization supports the critical contribution of the extra-cellular matrix genes. Sci Rep 2023; 13:10144. [PMID: 37349324 PMCID: PMC10287745 DOI: 10.1038/s41598-023-35821-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 05/24/2023] [Indexed: 06/24/2023] Open
Abstract
Due to the complexity of prostate cancer, much effort has been devoted to the development of biomarkers that have acquired the utmost clinical relevance for diagnosis and grading. However, all of these advances are limited due to the relatively large percentage of biochemical recurrence (BCR) and the limited strategies for follow-up. This work proposes a methodology that uses discretization to predict prostate cancer BCR while optimizing the necessary variables. We used discretization of RNA-seq data to improve the prediction of biochemical recurrence and retrieve a subset of ten genes functionally known to be related to the tissue structure. Equal width and equal frequency data discretization methods were compared to isolate the contribution of the genes and their interval of action, simultaneously. Adding a robust clinical biomarker such as prostate specific antigen (PSA) improved the prediction of BCR. Discretization allowed classifying the cancer patients with an accuracy of 82% on testing datasets, and 75% on a validation dataset when a five-bin discretization by equal width was used. After data pre-processing, feature selection and classification, our predictions had a precision of 71% (testing dataset: MSKCC and GSE54460) and 69% (validation dataset: GSE70769) for predicting whether patients present BCR up to 24 months after their final treatment. These results emphasize the use of equal width discretization as a pre-processing step to improve classification for a limited number of genes in the signature. Functionally, many of these genes have a direct or expected role in tissue structure and extracellular matrix organization. The processing steps presented in this study are also applicable to other cancer types to increase the speed and accuracy of the models in diverse datasets.
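The difference between the two discretization schemes compared in this abstract can be shown with a short sketch. The functions and the expression values below are hypothetical illustrations and do not reproduce the authors' pipeline: equal width splits the value range into intervals of the same length, while equal frequency puts roughly the same number of samples in each bin.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal length."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would otherwise fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts.

    Bins are assigned by rank, so tied values may land in adjacent bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


# Hypothetical, right-skewed expression values for a single gene.
expression = [0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

width_bins = equal_width_bins(expression, 5)
freq_bins = equal_frequency_bins(expression, 5)
print(width_bins)
print(freq_bins)
```

On skewed data such as this, equal width leaves most samples in the lowest bin and isolates the extreme values, whereas equal frequency spreads the samples evenly across bins.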
Affiliation(s)
- Laura Marin
- Department of Engineering, Pontificia Universidad Catolica del Peru, Av. Universitaria 1801, San Miguel, 15088, Lima, Peru
- Institute of Omics Sciences and Applied Biotechnology, Pontificia Universidad Catolica del Peru, Av. Universitaria 1801, San Miguel, 15088, Lima, Peru
| | - Fanny Casado
- Institute of Omics Sciences and Applied Biotechnology, Pontificia Universidad Catolica del Peru, Av. Universitaria 1801, San Miguel, 15088, Lima, Peru.
|
18
|
Ahmed AA, Brychcy A, Abouzid M, Witt M, Kaczmarek E. Perception of Pathologists in Poland of Artificial Intelligence and Machine Learning in Medical Diagnosis-A Cross-Sectional Study. J Pers Med 2023; 13:962. [PMID: 37373951 DOI: 10.3390/jpm13060962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 05/31/2023] [Accepted: 06/04/2023] [Indexed: 06/29/2023] Open
Abstract
BACKGROUND In the past two decades, several artificial intelligence (AI) and machine learning (ML) models have been developed to assist in medical diagnosis, decision making, and design of treatment protocols. The number of active pathologists in Poland is low, prolonging tumor patients' diagnosis and treatment journey. Hence, applying AI and ML may aid in this process. Therefore, our study aims to investigate the knowledge of using AI and ML methods in the clinical field among pathologists in Poland. To our knowledge, no similar study has been conducted. METHODS We conducted a cross-sectional study targeting pathologists in Poland from June to July 2022. The questionnaire included self-reported information on AI or ML knowledge, experience, specialization, personal thoughts, and level of agreement with different aspects of AI and ML in medical diagnosis. Data were analyzed using IBM® SPSS® Statistics v.26, PQStat Software v.1.8.2.238, and RStudio Build 351. RESULTS Overall, 68 pathologists in Poland participated in our study. Their average age and years of experience were 38.92 ± 8.88 and 12.78 ± 9.48 years, respectively. Approximately 42% had used AI or ML methods, and their reported knowledge differed significantly from that of those who had never used them (OR = 17.9, 95% CI = 3.57-89.79, p < 0.001). Additionally, users of AI had higher odds of reporting satisfaction with the speed of AI in the medical diagnosis process (OR = 4.66, 95% CI = 1.05-20.78, p = 0.043). Finally, significant differences (p = 0.003) were observed in determining the liability for legal issues arising from the use of AI and ML methods. CONCLUSION Most pathologists in this study did not use AI or ML models, highlighting the importance of increasing awareness and educational programs regarding applying AI and ML in medical diagnosis.
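Odds ratios with 95% confidence intervals, as reported in this abstract, can be computed from a 2x2 table. The sketch below assumes the Wald (log-odds) interval, which may differ from the method the authors used (the abstract does not state it), and the cell counts are hypothetical rather than the survey's data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: AI users vs. non-users, by self-reported good knowledge.
odds_ratio, lower, upper = odds_ratio_wald_ci(a=20, b=10, c=8, d=30)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

Small cell counts make the standard error large, which is one reason intervals from small surveys can span more than an order of magnitude.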
Affiliation(s)
- Alhassan Ali Ahmed
- Department of Bioinformatics and Computational Biology, Poznan University of Medical Sciences, 61-806 Poznan, Poland
- Doctoral School, Poznan University of Medical Sciences, 61-806 Poznan, Poland
| | - Agnieszka Brychcy
- Department of Clinical Patomorphology, Heliodor Swiecicki Clinical Hospital of the Poznan University of Medical Sciences, 61-806 Poznan, Poland
| | - Mohamed Abouzid
- Doctoral School, Poznan University of Medical Sciences, 61-806 Poznan, Poland
- Department of Physical Pharmacy and Pharmacokinetics, Poznan University of Medical Sciences, 60-806 Poznan, Poland
| | - Martin Witt
- Department of Anatomy, Rostock University Medical Centre, 18057 Rostock, Germany
- Department of Anatomy, Technische Universität Dresden, 01307 Dresden, Germany
| | - Elżbieta Kaczmarek
- Department of Bioinformatics and Computational Biology, Poznan University of Medical Sciences, 61-806 Poznan, Poland
|
19
|
Duan J, Li H, Ma X, Zhang H, Lasky R, Monaghan CK, Chaudhuri S, Usvyat L, Gu M, Guo W, Kotanko P, Wang Y. Predicting SARS-CoV-2 infection among hemodialysis patients using multimodal data. FRONTIERS IN NEPHROLOGY 2023; 3:1179342. [PMID: 37675373 PMCID: PMC10479652 DOI: 10.3389/fneph.2023.1179342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 04/28/2023] [Indexed: 09/08/2023]
Abstract
Background The coronavirus disease 2019 (COVID-19) pandemic has created more devastation among dialysis patients than among the general population. Patient-level prediction models for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection are crucial for the early identification of patients to prevent and mitigate outbreaks within dialysis clinics. As the COVID-19 pandemic evolves, it is unclear whether previously built prediction models are still sufficiently effective. Methods We developed a machine learning (XGBoost) model to predict, during the incubation period, a SARS-CoV-2 infection that is subsequently diagnosed 3 or more days later. We used data from multiple sources, including demographic, clinical, treatment, laboratory, and vaccination information from a national network of hemodialysis clinics, socioeconomic information from the Census Bureau, and county-level COVID-19 infection and mortality information from state and local health agencies. We created prediction models and evaluated their performance on a rolling basis to investigate the evolution of prediction power and risk factors. Results From April 2020 to August 2020, our machine learning model achieved an area under the receiver operating characteristic curve (AUROC) of 0.75, an improvement of over 0.07 from a previously developed machine learning model published in Kidney360 in 2021. As the pandemic evolved, the prediction performance deteriorated and fluctuated more, with the lowest AUROC of 0.6 in December 2021 and January 2022. Over the whole study period, that is, from April 2020 to February 2022, with the false-positive rate fixed at 20%, our model was able to detect 40% of the positive patients. We found that features derived from local infection information reported by the Centers for Disease Control and Prevention (CDC) were the most important predictors, and vaccination status was a useful predictor as well. Whether a patient lived in a nursing home was an effective predictor before vaccination, but became less predictive after vaccination. Conclusion As found in our study, the dynamics of the prediction model change frequently as the pandemic evolves. County-level infection information and vaccination information are crucial for the success of early COVID-19 prediction models. Our results show that the proposed model can effectively identify SARS-CoV-2 infections during the incubation period. Prospective studies are warranted to explore the application of such prediction models in daily clinical practice.
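The abstract quotes sensitivity at a fixed false-positive rate (40% of positive patients detected at a 20% false-positive rate). The sketch below shows one way such an operating point can be read off a set of predicted scores. It is an illustration only: the function name, labels, and scores are hypothetical, and it is not the authors' evaluation code.

```python
def tpr_at_fpr(labels, scores, max_fpr=0.20):
    """Highest true-positive rate reachable while keeping FPR <= max_fpr.

    labels: 1 for infected, 0 for not infected
    scores: predicted risk, higher means more likely infected
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so a threshold never splits a tie.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / neg > max_fpr:
            break  # lowering the threshold further only raises the FPR
        best_tpr = tp / pos
        i = j
    return best_tpr

# Hypothetical toy data: 3 positives and 5 negatives.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(tpr_at_fpr(labels, scores, max_fpr=0.20))
```

Fixing the false-positive rate first, as the abstract does, makes models comparable across time windows in which the AUROC itself fluctuates.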
Affiliation(s)
- Juntao Duan
- Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, United States
- Hanmo Li
- Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, United States
- Xiaoran Ma
- Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, United States
- Hanjie Zhang
- Renal Research Institute, New York, NY, United States
- Rachel Lasky
- Fresenius Medical Care, Global Medical Office, Waltham, MA, United States
- Sheetal Chaudhuri
- Fresenius Medical Care, Global Medical Office, Waltham, MA, United States
- Division of Nephrology, Maastricht University Medical Center, Maastricht, Netherlands
- Len A. Usvyat
- Fresenius Medical Care, Global Medical Office, Waltham, MA, United States
- Mengyang Gu
- Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, United States
- Wensheng Guo
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Peter Kotanko
- Renal Research Institute, New York, NY, United States
- Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Yuedong Wang
- Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, United States
20
Tasci E, Jagasia S, Zhuge Y, Sproull M, Cooley Zgela T, Mackey M, Camphausen K, Krauze AV. RadWise: A Rank-Based Hybrid Feature Weighting and Selection Method for Proteomic Categorization of Chemoirradiation in Patients with Glioblastoma. Cancers (Basel) 2023; 15:2672. [PMID: 37345009 PMCID: PMC10216128 DOI: 10.3390/cancers15102672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 05/03/2023] [Accepted: 05/06/2023] [Indexed: 06/23/2023]
Abstract
Glioblastomas (GBM) are rapidly growing, aggressive, nearly uniformly fatal, and the most common primary type of brain cancer. They exhibit significant heterogeneity and resistance to treatment, limiting the ability to analyze the dynamic biological behavior that drives response and resistance, which are central to advancing outcomes in glioblastoma. Analysis of the proteome aimed at signal change over time provides a potential opportunity for non-invasive classification and examination of the response to treatment by identifying protein biomarkers associated with interventions. However, data acquired using large proteomic panels must be more intuitively interpretable, requiring computational analysis to identify trends. Machine learning is increasingly employed; however, it requires feature selection, which has a critical and considerable effect when applied to large-scale data by reducing the number of parameters, improving generalization, and finding essential predictors. In this study, using 7k proteomic data generated from the analysis of serum obtained from 82 patients with GBM pre- and post-completion of concurrent chemoirradiation (CRT), we aimed to select the most discriminative proteomic features that define the proteomic alteration resulting from the administration of CRT. Thus, we present a novel rank-based feature weighting method (RadWise) to identify relevant proteomic parameters using two popular feature selection methods, least absolute shrinkage and selection operator (LASSO) and minimum redundancy maximum relevance (mRMR). The computational results show that the proposed method yields outstanding results with very few selected proteomic features, with higher accuracy than methods that do not employ a feature selection process. While the computational method identified several proteomic signals identical to those chosen by clinical intuition (the heuristic approach), several heuristically identified proteomic signals were not selected, and other novel proteomic biomarkers that carry biological prognostic relevance in GBM emerged only with the novel method. The computational results show that the proposed method yields promising results, reducing 7k proteomic data to 7 selected proteomic features with a performance value of 93.921%, comparing favorably with techniques that do not employ feature selection.
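The abstract describes a rank-based scheme that draws on two feature selectors (LASSO and mRMR) but does not spell out the weighting. The sketch below shows a generic mean-rank (Borda-style) aggregation as one plausible way to merge two rankings. It is an assumption for illustration and not the RadWise algorithm; the function name and the protein identifiers are hypothetical.

```python
def combine_rankings(rankings, top_k):
    """Merge several best-first feature rankings by mean rank.

    rankings: list of lists of feature names, each ordered best first
    top_k:    number of features to keep
    A feature missing from a ranking is placed just after that ranking's end.
    """
    all_features = set()
    for ranking in rankings:
        all_features.update(ranking)

    mean_rank = {}
    for feature in all_features:
        total = 0
        for ranking in rankings:
            position = {name: i + 1 for i, name in enumerate(ranking)}
            total += position.get(feature, len(ranking) + 1)
        mean_rank[feature] = total / len(rankings)

    # Lower mean rank is better; break ties by name for a stable result.
    ordered = sorted(all_features, key=lambda f: (mean_rank[f], f))
    return ordered[:top_k]

# Hypothetical rankings from two selectors.
lasso_rank = ["P1", "P2", "P3", "P4"]
mrmr_rank = ["P2", "P1", "P4", "P3"]
print(combine_rankings([lasso_rank, mrmr_rank], top_k=2))
```

Aggregating ranks rather than raw scores sidesteps the fact that LASSO coefficients and mRMR scores live on different scales.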
Affiliation(s)
- Andra Valentina Krauze
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
21
Groff E, Orzechowski M, Schuetz C, Steger F. Ethical Aspects of Personalized Research and Management of Systemic Inflammatory Response Syndrome (SIRS) in Children. Int J Environ Res Public Health 2022; 20:470. [PMID: 36612792 PMCID: PMC9819223 DOI: 10.3390/ijerph20010470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 12/21/2022] [Accepted: 12/22/2022] [Indexed: 06/17/2023]
Abstract
Systemic inflammatory response syndrome (SIRS) is a life-threatening condition with nonspecific symptoms. Because of that, defining a targeted therapy against SIRS in children and adults remains a challenge. The identification of diagnostic patterns from individualized immune profiling can lead to the development of a personalized therapy. The aim of this study was to identify and analyze ethical issues associated with personalized research and therapy for SIRS in pediatric populations. We conducted an ethical analysis based on a principled approach according to Beauchamp and Childress' four bioethical principles. Relevant information for the research objectives was extracted from a systematic literature review conducted in the scientific databases PubMed, Embase and Web of Science. We searched for pertinent themes dealing with at least one of the four bioethical principles: "autonomy", "non-maleficence", "beneficence" and "justice". Forty-eight publications that met the research objectives were included in the thorough analysis, structured and discussed in a narrative synthesis. The analysis of the results shows that traditional paradigms of patient autonomy and physician paternalism need to be reexamined in pediatric research. Standard information procedures and models of informed consent should be reconsidered as they do not accommodate the complexities of pediatric omics research.
Affiliation(s)
- Elisa Groff
- Institute of the History, Philosophy and Ethics of Medicine, Ulm University, 89073 Ulm, Germany
- Marcin Orzechowski
- Institute of the History, Philosophy and Ethics of Medicine, Ulm University, 89073 Ulm, Germany
- Catharina Schuetz
- Paediatric Immunology, Medical Faculty “Carl Gustav Carus”, Technic University Dresden, 01307 Dresden, Germany
- Florian Steger
- Institute of the History, Philosophy and Ethics of Medicine, Ulm University, 89073 Ulm, Germany
22
Cost Matrix of Molecular Pathology in Glioma-Towards AI-Driven Rational Molecular Testing and Precision Care for the Future. Biomedicines 2022; 10:3029. [PMID: 36551786 PMCID: PMC9775648 DOI: 10.3390/biomedicines10123029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 11/09/2022] [Accepted: 11/19/2022] [Indexed: 11/27/2022]
Abstract
Gliomas are the most common and aggressive primary brain tumors. Gliomas carry a poor prognosis because of the tumor's resistance to radiation and chemotherapy leading to nearly universal recurrence. Recent advances in large-scale genomic research have allowed for the development of more targeted therapies to treat glioma. While precision medicine can target specific molecular features in glioma, targeted therapies are often not feasible due to the lack of actionable markers and the high cost of molecular testing. This review summarizes the clinically relevant molecular features in glioma and the current cost of care for glioma patients, focusing on the molecular markers and meaningful clinical features that are linked to clinical outcomes and have a realistic possibility of being measured, which is a promising direction for precision medicine using artificial intelligence approaches.
23
Hierarchical Voting-Based Feature Selection and Ensemble Learning Model Scheme for Glioma Grading with Clinical and Molecular Characteristics. Int J Mol Sci 2022; 23:14155. [PMID: 36430631 PMCID: PMC9697273 DOI: 10.3390/ijms232214155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2022] [Revised: 10/31/2022] [Accepted: 11/12/2022] [Indexed: 11/18/2022]
Abstract
Determining the aggressiveness of gliomas, termed grading, is a critical step toward treatment optimization to increase the survival rate and decrease treatment toxicity for patients. Streamlined grading using molecular information has the potential to facilitate decision making in the clinic and aid in treatment planning. In recent years, molecular markers have increasingly gained importance in the classification of tumors. In this study, we propose a novel hierarchical voting-based methodology for improving the performance results of the feature selection stage and machine learning models for glioma grading with clinical and molecular predictors. To identify the best scheme for the given soft-voting-based ensemble learning model selections, we utilized publicly available TCGA and CGGA datasets and employed four dimensionality reduction methods to carry out a voting-based ensemble feature selection and five supervised models, with a total of sixteen combination sets. We also compared our proposed feature selection method with the LASSO feature selection method in isolation. The computational results indicate that the proposed method achieves 87.606% and 79.668% accuracy rates on TCGA and CGGA datasets, respectively, outperforming the LASSO feature selection method.
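The abstract refers to soft-voting-based ensemble learning. As a minimal sketch of the general idea, soft voting averages the class probabilities produced by several models and picks the class with the highest mean, which can differ from a simple majority of hard votes. The function name and probabilities below are hypothetical, and the sketch does not reproduce the hierarchical scheme of the paper.

```python
def soft_vote(prob_lists):
    """Average class probabilities across models and return the winner.

    prob_lists: one probability vector per model, all over the same classes
    Returns (index of winning class, list of averaged probabilities).
    """
    if not prob_lists:
        raise ValueError("need predictions from at least one model")
    n_classes = len(prob_lists[0])
    if any(len(p) != n_classes for p in prob_lists):
        raise ValueError("all models must predict the same set of classes")

    averaged = [
        sum(p[c] for p in prob_lists) / len(prob_lists)
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return winner, averaged

# Hypothetical probabilities over two grades from three models.
# Two models lean weakly to class 1, one is confident in class 0.
winner, averaged = soft_vote([[0.9, 0.1], [0.45, 0.55], [0.45, 0.55]])
print(winner, averaged)
```

In this toy case a hard majority vote would choose class 1, whereas soft voting chooses class 0 because it takes each model's confidence into account.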
24
Milella F, Famiglini L, Banfi G, Cabitza F. Application of Machine Learning to Improve Appropriateness of Treatment in an Orthopaedic Setting of Personalized Medicine. J Pers Med 2022; 12:1706. [PMID: 36294845 PMCID: PMC9604727 DOI: 10.3390/jpm12101706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 09/26/2022] [Accepted: 10/08/2022] [Indexed: 11/07/2022]
Abstract
The rise of personalized medicine and its remarkable advancements have revealed new requirements for the availability of appropriate medical decision-making models. Computer science is an area that plays an essential role in the field of personalized medicine, where one of the goals is to provide algorithms and tools to extrapolate knowledge and improve the decision-support process. The minimum clinically important difference (MCID) is the smallest change in PROM scores that patients perceive as meaningful. Treatment that does not achieve the minimum level of improvement is considered inappropriate as well as a potential waste of resources. Using the MCID threshold to identify patients who fail to achieve the minimum change in PROM that results in a meaningful outcome may aid in pre-surgical shared decision-making. The decision tree algorithm is a method for extracting valuable information and providing further meaningful information to the domain expert that supports the decision-making. In the present study, different tools based on machine learning were developed. On the one hand, we compared three XGBoost models to predict the non-achievement of the MCID at six months post-operation in the SF-12 physical score. The prediction score threshold was set to 0.75 to provide three decision-making areas on the basis of the high confidence (HC) intervals; the minority class was re-balanced by weighting the positive class to penalize the loss function (XGBoost cost-sensitive), oversampling the minority class (XGBoost with SMOTE), and re-sampling the negative class (XGBoost with undersampling). On the other hand, we modeled the data through a decision tree (assessment tree), based on different complexity levels, to identify the hidden pattern and to provide a new way to understand possible relationships between the gathered features and the several outcomes. 
The results showed that all the proposed models were effective as binary classifiers, as they showed moderate predictive performance for both the minority or positive class (i.e., our targeted patients, those who will not benefit from surgery) and the negative class. The decision tree visualization can be exploited during patient assessment to better understand whether those patients will benefit from the medical intervention. Both of these tools can come in handy for increasing knowledge about the patient’s psychophysical state and for creating an increasingly specialized assessment of the individual patient.
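The abstract names three ways of re-balancing the minority (positive) class: cost-sensitive weighting, SMOTE oversampling, and undersampling of the negative class. The sketch below illustrates two of them in plain Python under stated assumptions: the weight is the negative-to-positive ratio, which is the value commonly passed to XGBoost's `scale_pos_weight` parameter, and the undersampler draws majority rows at random. The function names and data are hypothetical, and SMOTE is not shown.

```python
import random

def positive_class_weight(labels):
    """Negative-to-positive ratio, usable as a cost-sensitive class weight."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive examples to weight")
    return neg / pos

def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y != 1]
    minority, majority = sorted((pos_idx, neg_idx), key=len)
    # Keep every minority row plus an equally sized random majority sample.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Hypothetical data: 2 positives (MCID not achieved) and 6 negatives.
labels = [1, 0, 0, 0, 1, 0, 0, 0]
rows = [[i] for i in range(len(labels))]
print(positive_class_weight(labels))
balanced_rows, balanced_labels = undersample_majority(rows, labels)
print(balanced_labels)
```

Weighting keeps all the data but changes the loss, while undersampling discards majority rows; which works better is an empirical question that the paper addresses by comparing the variants.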
Affiliation(s)
- Frida Milella
- IRCCS Istituto Ortopedico Galeazzi, Via Cristina Belgioioso 173, 20157 Milano, Italy
- Lorenzo Famiglini
- DISCo, Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano–Bicocca, Viale Sarca 336, 20126 Milano, Italy
- Giuseppe Banfi
- IRCCS Istituto Ortopedico Galeazzi, Via Cristina Belgioioso 173, 20157 Milano, Italy
- Faculty of Medicine and Surgery, Università Vita-Salute San Raffaele, 20132 Milano, Italy
- Federico Cabitza
- IRCCS Istituto Ortopedico Galeazzi, Via Cristina Belgioioso 173, 20157 Milano, Italy
- DISCo, Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano–Bicocca, Viale Sarca 336, 20126 Milano, Italy
25
Zhao R, Zhuge Y, Camphausen K, Krauze AV. Machine learning based survival prediction in Glioma using large-scale registry data. Health Informatics J 2022; 28:14604582221135427. [PMID: 36264067 PMCID: PMC10673681 DOI: 10.1177/14604582221135427] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2023]
Abstract
Gliomas are the most common central nervous system tumors and exhibit poor clinical outcomes. The ability to estimate prognosis is crucial for both patients and providers in order to select the most appropriate treatment. Machine learning (ML) allows for sophisticated approaches to survival prediction using real-world clinical parameters needed to achieve superior predictive accuracy. We employed a Cox proportional hazards (CPH) model, a support vector machine (SVM) model, and a random forest (RF) model in a large glioma dataset (3462 patients, diagnosed 2000-2018) to explore the optimal approach to survival prediction. Features employed were age, sex, surgical resection status, tumor histology, tumor site, administration of radiation therapy (RT), and chemotherapy status. The concordance index (c-index) was employed to assess the accuracy of survival time prediction. All three models performed well in terms of prediction accuracy (c-index 0.767, 0.771, and 0.57 for the CPH, SVM, and RF models, respectively), with the best performance achieved when incorporating RT and chemotherapy administration status, which emerged as key predictive features. Within the subset of glioblastoma patients, similar prediction accuracy was achieved. These findings should prompt stricter clinician oversight over registry data accuracy through quality assurance as we move towards meaningful predictive ability using ML approaches in glioma.
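The abstract evaluates survival models with the concordance index. As a minimal sketch, the function below computes Harrell's c-index for right-censored data: among comparable patient pairs, it counts how often the patient who died earlier was assigned the higher predicted risk. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The function name and toy data are hypothetical, and this is not the authors' implementation.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    times:       observed follow-up time per patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            # A pair is comparable when patient i died before time j.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable

# Hypothetical follow-up in months; the third patient is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(times, events, risk))
```

The quadratic loop is fine for a toy example; for a registry of several thousand patients a library implementation would normally be used instead.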
Affiliation(s)
- Andra V Krauze
- National Cancer Institute, NIH, USA; BC Cancer Surrey, Canada
26
The Next Frontier in Health Disparities—A Closer Look at Exploring Sex Differences in Glioma Data and Omics Analysis, from Bench to Bedside and Back. Biomolecules 2022; 12:1203. [PMID: 36139042 PMCID: PMC9496358 DOI: 10.3390/biom12091203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/23/2022] [Accepted: 08/26/2022] [Indexed: 11/16/2022]
Abstract
Sex differences are increasingly being explored and reported in oncology, and glioma is no exception. As potentially meaningful sex differences are uncovered, existing gender-derived disparities mirror data generated in retrospective and prospective trials, real-world large-scale data sets, and bench work involving animals and cell lines. The resulting disparities at the data level are wide-ranging, potentially resulting in both adverse outcomes and failure to identify and exploit therapeutic benefits. We set out to analyze the literature on women’s data disparities in glioma by exploring the origins of data in this area to understand the representation of women in study samples and omics analyses. Given the current emphasis on inclusive study design and research, we wanted to explore if sex bias continues to exist in present-day data sets and how sex differences in data may impact conclusions derived from large-scale data sets, omics, biospecimen analysis, novel interventions, and standard of care management.