1
Al Meslamani AZ, Sobrino I, de la Fuente J. Machine learning in infectious diseases: potential applications and limitations. Ann Med 2024; 56:2362869. PMID: 38853633; PMCID: PMC11168216; DOI: 10.1080/07853890.2024.2362869.
Abstract
Infectious diseases are a major threat to human and animal health worldwide. Artificial intelligence (AI) approaches combining algorithms such as machine learning (ML) and Big Data analytics have emerged as a potential solution for analysing diverse datasets and facing the challenges posed by infectious diseases. In this commentary, we explore the potential applications and limitations of ML in the management of infectious diseases, examining challenges in key areas such as outbreak prediction, pathogen identification, drug discovery, and personalized medicine. We propose potential solutions to mitigate these hurdles, along with applications of ML to identify biomolecules for the effective treatment and prevention of infectious diseases. Beyond disease management, potential applications include the analysis of catastrophic evolution events to identify biomolecular targets that reduce the risks of infectious diseases, and vaccinomics for the discovery and characterization of vaccine protective antigens using intelligent Big Data analytics techniques. These considerations set a foundation for developing effective strategies for managing infectious diseases in the future.
Affiliation(s)
- Ahmad Z. Al Meslamani
- College of Pharmacy, Al Ain University, Abu Dhabi, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi, United Arab Emirates
- Isidro Sobrino
- SaBio, Instituto de Investigación en Recursos Cinegéticos (IREC), Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Castilla-La Mancha (UCLM)-Junta de Comunidades de Castilla-La Mancha (JCCM), Ciudad Real, Spain
- José de la Fuente
- SaBio, Instituto de Investigación en Recursos Cinegéticos (IREC), Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Castilla-La Mancha (UCLM)-Junta de Comunidades de Castilla-La Mancha (JCCM), Ciudad Real, Spain
- Department of Veterinary Pathobiology, Center for Veterinary Health Sciences, Oklahoma State University, Stillwater, Oklahoma, USA
2
Oikonomou EK, Khera R. Designing medical artificial intelligence systems for global use: focus on interoperability, scalability, and accessibility. Hellenic J Cardiol 2024:S1109-9666(24)00158-1. PMID: 39025234; DOI: 10.1016/j.hjc.2024.07.003.
Abstract
Advances in artificial intelligence (AI) and machine learning systems promise faster, more efficient, and more personalized care. While many of these models are built on the premise of improving access to the timely screening, diagnosis, and treatment of cardiovascular disease, their validity and accessibility across diverse and international cohorts remain unknown. In this mini-review article, we summarize key obstacles in the effort to design AI systems that will be scalable, accessible, and accurate across distinct geographical and temporal settings. We discuss representativeness, interoperability, quality assurance, and the importance of vendor-agnostic data types that will be immediately available to end-users across the globe. These topics illustrate how the timely integration of these principles into AI development is crucial to maximizing the global benefits of AI in cardiology.
Affiliation(s)
- Evangelos K Oikonomou
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Rohan Khera
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Section of Health Informatics, Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
3
Rathnam S, Hart KL, Sharma A, Verhaak PF, McCoy TH, Doshi-Velez F, Perlis RH. Heterogeneity in Antidepressant Treatment and Major Depressive Disorder Outcomes Among Clinicians. JAMA Psychiatry 2024:2821076. PMID: 38985482; PMCID: PMC11238069; DOI: 10.1001/jamapsychiatry.2024.1778.
Abstract
Importance While abundant work has examined patient-level differences in antidepressant treatment outcomes, little is known about the extent of clinician-level differences. Understanding these differences may be important in the development of risk models, precision treatment strategies, and more efficient systems of care. Objective To characterize differences between outpatient clinicians in treatment selection and outcomes for their patients diagnosed with major depressive disorder across academic medical centers, community hospitals, and affiliated clinics. Design, Setting, and Participants This was a longitudinal cohort study using data derived from electronic health records at 2 large academic medical centers and 6 community hospitals, and their affiliated outpatient networks, in eastern Massachusetts. Participants were deidentified clinicians who billed at least 10 International Classification of Diseases, Ninth Revision (ICD-9) or Tenth Revision (ICD-10) diagnoses of major depressive disorder per year between 2008 and 2022. Data analysis occurred between September 2023 and January 2024. Main Outcomes and Measures Heterogeneity of prescribing, defined as the number of distinct antidepressants accounting for 75% of prescriptions by a given clinician; proportion of patients who did not return for follow-up after an index prescription; and proportion of patients receiving stable, ongoing antidepressant treatment. Results Among 11 934 clinicians treating major depressive disorder, unsupervised learning identified 10 distinct clusters on the basis of ICD codes, corresponding to outpatient psychiatry as well as oncology, obstetrics, and primary care. Between these clusters, substantial variability was identified in the proportions of selective serotonin reuptake inhibitors, serotonin-norepinephrine reuptake inhibitors, and tricyclic antidepressants prescribed, as well as in the number of distinct antidepressants prescribed. Variability was also detected between clinician clusters in loss to follow-up and achievement of stable treatment, with the former ranging from 27% to 69% and the latter from 22% to 42%. Clinician clusters were significantly associated with treatment outcomes. Conclusions and Relevance Groups of clinicians treating individuals diagnosed with major depressive disorder exhibit marked differences in prescribing patterns as well as in longitudinal patient outcomes defined by electronic health records. Incorporating these group identifiers yielded predictive performance similar to that of more complex models incorporating individual codes, suggesting the importance of considering treatment context in efforts at risk stratification.
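The prescribing-heterogeneity measure defined in this abstract, the number of distinct antidepressants accounting for 75% of a given clinician's prescriptions, is straightforward to compute. A minimal sketch, with entirely hypothetical drug counts (the function name and example data are illustrative, not from the paper):

```python
from collections import Counter

def prescribing_heterogeneity(prescriptions, coverage=0.75):
    """Number of distinct drugs that together account for at least
    `coverage` of a clinician's prescriptions, counted from the most
    frequently prescribed drug downward."""
    counts = Counter(prescriptions)
    total = sum(counts.values())
    cumulative = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        cumulative += n
        if cumulative / total >= coverage:
            return rank
    return len(counts)

# Hypothetical clinicians: one concentrates on two drugs, one spreads evenly.
focused = ["sertraline"] * 60 + ["fluoxetine"] * 30 + ["mirtazapine"] * 10
spread = ["sertraline", "fluoxetine", "citalopram", "venlafaxine"] * 25
print(prescribing_heterogeneity(focused))  # 2 (60 + 30 covers 90% >= 75%)
print(prescribing_heterogeneity(spread))   # 3 (three equal drugs cover 75%)
```

A low value indicates a clinician who concentrates prescribing on a few agents; higher values indicate more varied prescribing, the between-cluster difference the study reports.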
Affiliation(s)
- Sarah Rathnam
- Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts
- Kamber L. Hart
- Center for Quantitative Health, Massachusetts General Hospital, Boston
- Department of Psychiatry, Harvard Medical School, Boston, Massachusetts
- Abhishek Sharma
- Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts
- Pilar F. Verhaak
- Center for Quantitative Health, Massachusetts General Hospital, Boston
- Thomas H. McCoy
- Center for Quantitative Health, Massachusetts General Hospital, Boston
- Department of Psychiatry, Harvard Medical School, Boston, Massachusetts
- Finale Doshi-Velez
- Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts
- Roy H. Perlis
- Center for Quantitative Health, Massachusetts General Hospital, Boston
- Department of Psychiatry, Harvard Medical School, Boston, Massachusetts
4
Chen Z, Liang N, Li H, Zhang H, Li H, Yan L, Hu Z, Chen Y, Zhang Y, Wang Y, Ke D, Shi N. Exploring explainable AI features in the vocal biomarkers of lung disease. Comput Biol Med 2024; 179:108844. PMID: 38981214; DOI: 10.1016/j.compbiomed.2024.108844.
Abstract
This review delves into the burgeoning field of explainable artificial intelligence (XAI) in the detection and analysis of lung diseases through vocal biomarkers. Lung diseases, often elusive in their early stages, pose a significant public health challenge. Recent advancements in AI have ushered in innovative methods for early detection, yet the black-box nature of many AI models limits their clinical applicability. XAI emerges as a pivotal tool, enhancing transparency and interpretability in AI-driven diagnostics. This review synthesizes current research on the application of XAI in analyzing vocal biomarkers for lung diseases, highlighting how these techniques elucidate the connections between specific vocal features and lung pathology. We critically examine the methodologies employed, the types of lung diseases studied, and the performance of various XAI models. The potential for XAI to aid in early detection, monitor disease progression, and personalize treatment strategies in pulmonary medicine is emphasized. Furthermore, this review identifies current challenges, including data heterogeneity and model generalizability, and proposes future directions for research. By offering a comprehensive analysis of explainable AI features in the context of lung disease detection, this review aims to bridge the gap between advanced computational approaches and clinical practice, paving the way for more transparent, reliable, and effective diagnostic tools.
Affiliation(s)
- Zhao Chen
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Ning Liang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Haoyuan Li
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Haili Zhang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Huizhen Li
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Lijiao Yan
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Ziteng Hu
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Yaxin Chen
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Yujing Zhang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Yanping Wang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Dandan Ke
- Special Disease Clinic, Huaishuling Branch of Beijing Fengtai Hospital of Integrated Traditional Chinese and Western Medicine, Beijing, China
- Nannan Shi
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
5
Oettl FC, Pareek A, Winkler PW, Zsidai B, Pruneski JA, Senorski EH, Kopf S, Ley C, Herbst E, Oeding JF, Grassi A, Hirschmann MT, Musahl V, Samuelsson K, Tischer T, Feldt R. A practical guide to the implementation of AI in orthopaedic research, Part 6: How to evaluate the performance of AI research? J Exp Orthop 2024; 11:e12039. PMID: 38826500; PMCID: PMC11141501; DOI: 10.1002/jeo2.12039.
Abstract
Artificial intelligence's (AI) accelerating progress demands rigorous evaluation standards to ensure safe, effective integration into healthcare's high-stakes decisions. As AI increasingly enables prediction, analysis and judgement capabilities relevant to medicine, proper evaluation and interpretation are indispensable. Erroneous AI could endanger patients; thus, developing, validating and deploying medical AI demands adherence to strict, transparent standards centred on safety, ethics and responsible oversight. Core considerations include assessing performance on diverse real-world data, collaborating with domain experts, confirming model reliability and limitations, and advancing interpretability. Thoughtful selection of evaluation metrics suited to the clinical context, along with testing on diverse data sets representing different populations, improves generalisability. Partnering with software engineers, data scientists and medical practitioners grounds assessment in real needs. Journals must uphold reporting standards matching AI's societal impacts. With rigorous, holistic evaluation frameworks, AI can progress towards expanding healthcare access and quality. Level of Evidence: Level V.
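The point about choosing evaluation metrics suited to the clinical context can be made concrete with a toy confusion matrix: on an imbalanced cohort, plain accuracy can look strong while sensitivity, often the clinically critical quantity, is poor. A minimal sketch with illustrative numbers only (not drawn from the paper):

```python
def clinical_metrics(tp, fp, tn, fn):
    """Basic classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "balanced_accuracy": balanced_accuracy}

# 10% disease prevalence; the model misses 8 of 10 true cases.
m = clinical_metrics(tp=2, fp=1, tn=89, fn=8)
print(m["accuracy"])     # 0.91 -- looks acceptable
print(m["sensitivity"])  # 0.2  -- clinically unacceptable
```

Reporting accuracy alone here would hide a model that misses most diseased patients, which is exactly why context-appropriate metrics matter.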
Affiliation(s)
- Felix C. Oettl
- Hospital for Special Surgery, New York, New York, USA
- Schulthess Klinik, Zurich, Switzerland
- Ayoosh Pareek
- Sports Medicine and Shoulder Institute, Hospital for Special Surgery, New York, New York, USA
- Philipp W. Winkler
- Department for Orthopaedics and Traumatology, Kepler University Hospital GmbH, Johannes Kepler University Linz, Linz, Austria
- Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Sahlgrenska Sports Medicine Center, Göteborg, Sweden
- Bálint Zsidai
- Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Sahlgrenska Sports Medicine Center, Göteborg, Sweden
- James A. Pruneski
- Department of Orthopaedic Surgery, Tripler Army Medical Center, Honolulu, Hawaii, USA
- Eric Hamrin Senorski
- Sahlgrenska Sports Medicine Center, Göteborg, Sweden
- Department of Health and Rehabilitation, Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Sebastian Kopf
- Center of Orthopaedics and Traumatology, University Hospital Brandenburg an der Havel, Brandenburg Medical School Theodor Fontane, Germany
- Christophe Ley
- Department of Mathematics, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Elmar Herbst
- Department of Trauma, Hand and Reconstructive Surgery, University Hospital Muenster, Muenster, Germany
- Jacob F. Oeding
- Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Mayo Clinic Alix School of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Alberto Grassi
- IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Michael T. Hirschmann
- Department of Orthopaedic Surgery and Traumatology, Kantonsspital Baselland, Bruderholz, Switzerland
- University of Basel, Basel, Switzerland
- Volker Musahl
- Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Kristian Samuelsson
- Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Sahlgrenska Sports Medicine Center, Göteborg, Sweden
- Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden
- Thomas Tischer
- Department of Orthopaedic Surgery, Universitymedicine Rostock, Rostock, Germany
- Department of Orthopaedic and Trauma Surgery, Malteser Waldkrankenhaus Erlangen, Erlangen, Germany
- Robert Feldt
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
6
Yang J, Triendl H, Soltan AAS, Prakash M, Clifton DA. Addressing label noise for electronic health records: insights from computer vision for tabular data. BMC Med Inform Decis Mak 2024; 24:183. PMID: 38937744; PMCID: PMC11212446; DOI: 10.1186/s12911-024-02581-5.
Abstract
The analysis of extensive electronic health records (EHR) datasets often calls for automated solutions, with machine learning (ML) techniques, including deep learning (DL), taking a lead role. One common task involves categorizing EHR data into predefined groups. However, the vulnerability of EHRs to noise and errors stemming from data collection processes, as well as potential human labeling errors, poses a significant risk. This risk is particularly prominent during the training of DL models, where the possibility of overfitting to noisy labels can have serious repercussions in healthcare. Despite the well-documented existence of label noise in EHR data, few studies have tackled this challenge within the EHR domain. Our work addresses this gap by adapting computer vision (CV) algorithms to mitigate the impact of label noise in DL models trained on EHR data. Notably, it remains uncertain whether CV methods, when applied to the EHR domain, will prove effective, given the substantial divergence between the two domains. We present empirical evidence demonstrating that these methods, whether used individually or in combination, can substantially enhance model performance when applied to EHR data, especially in the presence of noisy/incorrect labels. We validate our methods and underscore their practical utility in real-world EHR data, specifically in the context of COVID-19 diagnosis. Our study highlights the effectiveness of CV methods in the EHR domain, making a valuable contribution to the advancement of healthcare analytics and research.
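The abstract does not name the specific CV algorithms adapted, but one representative family of label-noise methods from the CV literature is the "small-loss" criterion: samples whose current training loss is smallest are treated as probably clean and prioritized for updates. A minimal, library-free sketch under a symmetric label-flipping noise model (all names and the noise model are illustrative, not the paper's):

```python
import random

def flip_labels(labels, noise_rate, n_classes=2, seed=0):
    """Inject symmetric label noise: each label is replaced by a
    different, randomly chosen class with probability `noise_rate`."""
    rng = random.Random(seed)
    out = []
    for y in labels:
        if rng.random() < noise_rate:
            out.append(rng.choice([c for c in range(n_classes) if c != y]))
        else:
            out.append(y)
    return out

def small_loss_selection(losses, noise_rate):
    """Keep the (1 - noise_rate) fraction of samples with the smallest
    per-sample loss; treat them as probably clean for the next update."""
    k = int(len(losses) * (1 - noise_rate))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return set(ranked[:k])

losses = [0.10, 0.90, 0.20, 0.80, 0.05]  # illustrative per-sample losses
print(small_loss_selection(losses, noise_rate=0.4))  # keeps indices {0, 2, 4}
```

In practice the selection is recomputed every epoch as losses evolve; whether this CV heuristic transfers to tabular EHR data is exactly the question the study investigates.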
Affiliation(s)
- Jenny Yang
- Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Andrew A S Soltan
- Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Oxford Cancer & Haematology Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, England
- Department of Oncology, University of Oxford, Oxford, England
- Mangal Prakash
- Work done at Exscientia; currently an independent researcher, Reading, United Kingdom
- David A Clifton
- Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, China
7
Dos Santos L, Silva LL, Pelloso FC, Maia V, Pujals C, Borghesan DH, Carvalho MD, Pedroso RB, Pelloso SM. Use of machine learning to identify protective factors for death from COVID-19 in the ICU: a retrospective study. PeerJ 2024; 12:e17428. PMID: 38881861; PMCID: PMC11179634; DOI: 10.7717/peerj.17428.
Abstract
Background Patients in serious condition due to COVID-19 often require special care in intensive care units (ICUs). This disease has affected over 758 million people and resulted in 6.8 million deaths worldwide. Because disease progression varies from individual to individual, it is essential to identify the clinical parameters that indicate a good prognosis for the patient. Machine learning (ML) algorithms have been used for analyzing complex medical data and identifying prognostic indicators. However, there is still an urgent need for a model that elucidates the predictors related to patient outcomes. Therefore, this research aimed to identify, through ML, the variables involved in the discharge of patients admitted to the ICU due to COVID-19. Methods In this study, 126 variables were collected, covering demography, hospital length of stay and outcome, chronic diseases and tumors, comorbidities and risk factors, complications and adverse events, health care, and vital indicators of patients admitted to an ICU in southern Brazil. These variables were filtered and then selected by a decision-tree ML algorithm to identify the optimal set of variables for predicting patient discharge using logistic regression. Finally, a confusion matrix was used to evaluate the model's performance on the selected variables. Results Of the 532 patients evaluated, 180 were discharged: female (16.92%), with a central venous catheter (23.68%), with a bladder catheter (26.13%), and with averages of 8.46 and 23.65 days of bladder catheter use and mechanical ventilation, respectively. In addition, the chances of discharge increase by 14% for each additional day in the hospital, by 136% for female patients, by 716% when there is no bladder catheter, and by 737% when no central venous catheter is used. However, the chances of discharge decrease by 3% for each additional year of age and by 9% for each additional day of mechanical ventilation. On the training data, the model achieved a balanced accuracy of 0.81, sensitivity of 0.74, specificity of 0.88, and a kappa value of 0.64. On the test data, it achieved a balanced accuracy of 0.85, sensitivity of 0.75, specificity of 0.95, and a kappa value of 0.73. The McNemar test found no significant differences in the error rates on the training and test data, suggesting good classification. This work showed that female sex, the absence of a central venous catheter and bladder catheter, and shorter durations of mechanical ventilation and bladder catheter use were associated with a greater chance of hospital discharge. These results may help develop measures that lead to a good prognosis for the patient.
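The percentage changes in the chances of discharge quoted above follow the standard interpretation of logistic-regression coefficients: a coefficient β corresponds to a (e^β − 1) × 100% change in the odds per unit increase of the predictor. A minimal sketch of that conversion (the coefficient values below are back-derived from the reported percentages purely for illustration):

```python
import math

def odds_change_pct(beta):
    """Percent change in the odds of the outcome per one-unit increase
    in a predictor, given its logistic-regression coefficient beta."""
    return (math.exp(beta) - 1) * 100

# A coefficient of ln(1.14) corresponds to a 14% increase in the odds of
# discharge per additional hospital day, matching the abstract's phrasing;
# ln(2.36) corresponds to the 136% increase reported for female patients.
print(round(odds_change_pct(math.log(1.14))))  # 14
print(round(odds_change_pct(math.log(2.36))))  # 136
print(odds_change_pct(-0.03))                  # roughly -3% per year of age
```

Negative coefficients shrink the odds, which is how the 3% per year of age and 9% per ventilation day decreases arise.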
Affiliation(s)
- Lander Dos Santos
- State University of Maringá, Graduate Program in Health Sciences, Maringá, Paraná, Brazil
- Lincoln Luis Silva
- Department of Emergency Medicine, Duke University School of Medicine, Durham, NC, United States of America
- Constanza Pujals
- State University of Maringá, Graduate Program in Health Sciences, Maringá, Paraná, Brazil
- Maria Dalva Carvalho
- State University of Maringá, Graduate Program in Health Sciences, Maringá, Paraná, Brazil
- Raíssa Bocchi Pedroso
- State University of Maringá, Graduate Program in Health Sciences, Maringá, Paraná, Brazil
- Sandra Marisa Pelloso
- State University of Maringá, Graduate Program in Health Sciences, Maringá, Paraná, Brazil
8
Yang J, Clifton L, Dung NT, Phong NT, Yen LM, Thy DBX, Soltan AAS, Thwaites L, Clifton DA. Mitigating machine learning bias between high income and low-middle income countries for enhanced model fairness and generalizability. Sci Rep 2024; 14:13318. PMID: 38858466; PMCID: PMC11164855; DOI: 10.1038/s41598-024-64210-5.
Abstract
Collaborative efforts in artificial intelligence (AI) are increasingly common between high-income countries (HICs) and low- to middle-income countries (LMICs). Given the resource limitations often encountered by LMICs, collaboration becomes crucial for pooling resources, expertise, and knowledge. Despite the apparent advantages, ensuring the fairness and equity of these collaborative models is essential, especially considering the distinct differences between LMIC and HIC hospitals. In this study, we show that collaborative AI approaches can lead to divergent performance outcomes across HIC and LMIC settings, particularly in the presence of data imbalances. Through a real-world COVID-19 screening case study, we demonstrate that implementing algorithmic-level bias mitigation methods significantly improves outcome fairness between HIC and LMIC sites while maintaining high diagnostic sensitivity. We compare our results against previous benchmarks, utilizing datasets from four independent United Kingdom Hospitals and one Vietnamese hospital, representing HIC and LMIC settings, respectively.
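A common way to quantify the outcome fairness the study targets is the between-site gap in a performance metric such as diagnostic sensitivity; bias mitigation aims to shrink that gap while keeping overall sensitivity high. A minimal sketch of such a site-level audit (site names, labels, and predictions are invented for illustration):

```python
def sensitivity(y_true, y_pred):
    """True-positive rate on one site's labelled data."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def sensitivity_gap(sites):
    """Largest between-site difference in sensitivity, i.e. an
    equal-opportunity-style fairness gap across deployment sites."""
    values = {name: sensitivity(y, p) for name, (y, p) in sites.items()}
    return max(values.values()) - min(values.values()), values

sites = {
    "hic_hospital": ([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]),   # sens 0.75
    "lmic_hospital": ([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]),  # sens 0.50
}
gap, per_site = sensitivity_gap(sites)
print(gap)  # 0.25
```

A mitigation method is judged successful if it drives this gap toward zero without collapsing either site's sensitivity.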
Affiliation(s)
- Jenny Yang
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, England
- Lei Clifton
- Nuffield Department of Population Health, University of Oxford, Oxford, England
- Lam Minh Yen
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Andrew A S Soltan
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, England
- Nuffield Department of Population Health, University of Oxford, Oxford, England
- Oxford Cancer and Haematology Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, England
- Louise Thwaites
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- David A Clifton
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, England
- Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, China
9
Seong D, Espinosa C, Aghaeepour N. Computational Approaches for Predicting Preterm Birth and Newborn Outcomes. Clin Perinatol 2024; 51:461-473. PMID: 38705652; PMCID: PMC11070639; DOI: 10.1016/j.clp.2024.02.005.
Abstract
Preterm birth (PTB) and its associated morbidities are a leading cause of infant mortality and morbidity. Accurate predictive models and a better biological understanding of PTB-associated morbidities are critical in reducing their adverse effects. Increasing availability of multimodal high-dimensional data sets with concurrent advances in artificial intelligence (AI) have created a rich opportunity to gain novel insights into PTB, a clinically complex and multifactorial disease. Here, the authors review the use of AI to analyze 3 modes of data: electronic health records, biological omics, and social determinants of health metrics.
Affiliation(s)
- David Seong
- Immunology Program, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Medical Scientist Training Program, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Department of Microbiology and Immunology, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Camilo Espinosa
- Immunology Program, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Department of Biomedical Data Science, Stanford University, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
- Department of Biomedical Data Science, Stanford University, 300 Pasteur Drive, Grant S280, Stanford, CA 94305-5117, USA
10
Khosravi B, Li F, Dapamede T, Rouzrokh P, Gamble CU, Trivedi HM, Wyles CC, Sellergren AB, Purkayastha S, Erickson BJ, Gichoya JW. Synthetically enhanced: unveiling synthetic data's potential in medical imaging research. EBioMedicine 2024; 104:105174. PMID: 38821021; PMCID: PMC11177083; DOI: 10.1016/j.ebiom.2024.105174.
Abstract
BACKGROUND Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but model generalizability issues limit their efficacy when models are applied to new populations. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images and enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of models in medical imaging research. METHODS The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating characteristic curve (AUROC). FINDINGS Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value <0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%-300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value <0.01). INTERPRETATION Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. FUNDING Dr. Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program awardee and declares support from RSNA Health Disparities grant (#EIHD2204), Lacuna Fund (#67), Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021, and NHLBI Award Number R01HL167811.
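The AUROC values used throughout this evaluation can be computed directly as a Mann–Whitney statistic: the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one, with ties counted as half. A minimal, dependency-free sketch with toy scores (not the study's data):

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    P(score of random positive > score of random negative), ties = 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy classifier scores for diseased vs. healthy chest X-rays.
print(auroc([0.9, 0.8, 0.6], [0.1, 0.3, 0.6]))  # 8.5 / 9, about 0.944
print(auroc([0.5], [0.5]))                       # 0.5: no discrimination
```

The quadratic loop is fine for illustration; production code would use a rank-based implementation such as scikit-learn's `roc_auc_score`, which is equivalent.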
Affiliation(s)
- Bardia Khosravi
- Department of Radiology, Mayo Clinic, Rochester, MN, USA; Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Frank Li
- Department of Radiology, Emory University, Atlanta, GA, USA
- Theo Dapamede
- Department of Radiology, Emory University, Atlanta, GA, USA
- Pouria Rouzrokh
- Department of Radiology, Mayo Clinic, Rochester, MN, USA; Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Hari M Trivedi
- Department of Radiology, Emory University, Atlanta, GA, USA
- Cody C Wyles
- Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Saptarshi Purkayastha
- School of Informatics and Computing, Indiana University-Purdue University, Indianapolis, IN, USA
- Judy W Gichoya
- Department of Radiology, Emory University, Atlanta, GA, USA

11
Wang J, Jin Y, Jiang A, Chen W, Shan G, Gu Y, Ming Y, Li J, Yue C, Huang Z, Librach C, Lin G, Wang X, Zhao H, Sun Y, Zhang Z. Testing the generalizability and effectiveness of deep learning models among clinics: sperm detection as a pilot study. Reprod Biol Endocrinol 2024; 22:59. [PMID: 38778327 PMCID: PMC11110326 DOI: 10.1186/s12958-024-01232-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/03/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024] Open
Abstract
BACKGROUND Deep learning has been increasingly investigated for assisting clinical in vitro fertilization (IVF). The first technical step in many tasks is to visually detect and locate sperm, oocytes, and embryos in images. For clinical deployment of such deep learning models, different clinics use different image acquisition hardware and different sample preprocessing protocols, raising the concern over whether the reported accuracy of a deep learning model by one clinic could be reproduced in another clinic. Here we aim to investigate the effect of each imaging factor on the generalizability of object detection models, using sperm analysis as a pilot example. METHODS Ablation studies were performed using state-of-the-art models for detecting human sperm to quantitatively assess how model precision (false-positive detection) and recall (missed detection) were affected by imaging magnification, imaging mode, and sample preprocessing protocols. The results led to the hypothesis that the richness of image acquisition conditions in a training dataset deterministically affects model generalizability. The hypothesis was tested by first enriching the training dataset with a wide range of imaging conditions, then validating through internal blind tests on new samples and external multi-center clinical validations. RESULTS Ablation experiments revealed that removing subsets of data from the training dataset significantly reduced model precision. Removing raw sample images from the training dataset caused the largest drop in model precision, whereas removing 20x images caused the largest drop in model recall. By incorporating different imaging and sample preprocessing conditions into a rich training dataset, the model achieved an intraclass correlation coefficient (ICC) of 0.97 (95% CI: 0.94-0.99) for precision, and an ICC of 0.97 (95% CI: 0.93-0.99) for recall. 
Multi-center clinical validation showed no significant differences in model precision or recall across different clinics and applications. CONCLUSIONS The results validated the hypothesis that the richness of data in the training dataset is a key factor impacting model generalizability. These findings highlight the importance of diversity in a training dataset for model evaluation and suggest that future deep learning models in andrology and reproductive medicine should incorporate comprehensive feature sets for enhanced generalizability across clinics.
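The precision (false-positive detections) and recall (missed detections) that this study ablates are conventionally computed for object detectors by matching predicted boxes to ground-truth boxes at an intersection-over-union (IoU) threshold. A sketch of that standard bookkeeping (not the study's actual evaluation code; the box format and the 0.5 threshold are assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def detection_precision_recall(pred_boxes, true_boxes, iou_thr=0.5):
    """Greedily match each predicted box to an unmatched ground-truth box.
    Unmatched predictions are false positives (hurting precision);
    unmatched ground truths are missed detections (hurting recall)."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best_iou, best_j = 0.0, None
        for j, t in enumerate(true_boxes):
            if j in matched:
                continue
            v = iou(p, t)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(true_boxes) - tp
    precision = tp / (tp + fp) if pred_boxes else 1.0
    recall = tp / (tp + fn) if true_boxes else 1.0
    return precision, recall
```

Under this bookkeeping it is clear why the two failure modes in the abstract are distinct: spurious detections on raw samples inflate `fp` (precision drops), while failing to detect sperm in 20x images inflates `fn` (recall drops).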
Affiliation(s)
- Jiaqi Wang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
- Yufei Jin
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
- Aojun Jiang
- Department of Mechanical Engineering, University of Toronto, Toronto, Canada
- Wenyuan Chen
- Department of Mechanical Engineering, University of Toronto, Toronto, Canada
- Guanqiao Shan
- Department of Mechanical Engineering, University of Toronto, Toronto, Canada
- Yifan Gu
- Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Reproductive & Genetic Hospital of Citic-Xiangya, Changsha, China
- Yue Ming
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- Jichang Li
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- Chunfeng Yue
- Suzhou Boundless Medical Technology Ltd., Co., Suzhou, China
- Zongjie Huang
- Suzhou Boundless Medical Technology Ltd., Co., Suzhou, China
- Ge Lin
- Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Reproductive & Genetic Hospital of Citic-Xiangya, Changsha, China
- Xibu Wang
- The 3rd Affiliated Hospital of Shenzhen University, Shenzhen, China
- Huan Zhao
- The 3rd Affiliated Hospital of Shenzhen University, Shenzhen, China
- Yu Sun
- Department of Mechanical Engineering, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
- Institute of Biomedical Engineering, University of Toronto, Toronto, Canada
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Zhuoran Zhang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China

12
Sacca L, Lobaina D, Burgoa S, Lotharius K, Moothedan E, Gilmore N, Xie J, Mohler R, Scharf G, Knecht M, Kitsantas P. Promoting Artificial Intelligence for Global Breast Cancer Risk Prediction and Screening in Adult Women: A Scoping Review. J Clin Med 2024; 13:2525. [PMID: 38731054 PMCID: PMC11084581 DOI: 10.3390/jcm13092525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Revised: 04/01/2024] [Accepted: 04/23/2024] [Indexed: 05/13/2024] Open
Abstract
Background: Artificial intelligence (AI) algorithms can be applied in breast cancer risk prediction and prevention by using patient history, scans, imaging information, and analysis of specific genes for cancer classification to reduce overdiagnosis and overtreatment. This scoping review aimed to identify the barriers encountered in applying innovative AI techniques and models in developing breast cancer risk prediction scores and promoting screening behaviors among adult females. Findings may inform and guide future global recommendations for AI application in breast cancer prevention and care for female populations. Methods: The PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) was used as a reference checklist throughout this study. The Arksey and O'Malley methodology was used as a framework to guide this review. The framework methodology consisted of five steps: (1) Identify research questions; (2) Search for relevant studies; (3) Selection of studies relevant to the research questions; (4) Chart the data; (5) Collate, summarize, and report the results. Results: In the field of breast cancer risk detection and prevention, the following AI techniques and models have been applied: Machine and Deep Learning Model (ML-DL model) (n = 1), Academic Algorithms (n = 2), Breast Cancer Surveillance Consortium (BCSC) Clinical 5-Year Risk Prediction Model (n = 2), deep-learning computer vision AI algorithms (n = 2), AI-based thermal imaging solution (Thermalytix) (n = 1), RealRisks (n = 2), Breast Cancer Risk NAVIgation (n = 1), MammoRisk (ML-Based Tool) (n = 1), various ML models (n = 1), and various machine/deep learning, decision aids, and commercial algorithms (n = 7). In the 11 included studies, a total of 39 barriers to AI applications in breast cancer risk prediction and screening efforts were identified. 
The most common barriers in the application of innovative AI tools for breast cancer prediction and improved screening rates included lack of external validity and limited generalizability (n = 6), as AI was used in studies with either a small sample size or datasets with missing data. Many studies (n = 5) also encountered selection bias due to exclusion of certain populations based on characteristics such as race/ethnicity, family history, or past medical history. Several recommendations for future research should be considered. AI models need to include a broader spectrum and more complete predictive variables for risk assessment. Investigating long-term outcomes with improved follow-up periods is critical to assess the impacts of AI on clinical decisions beyond just the immediate outcomes. Utilizing AI to improve communication strategies at both a local and organizational level can assist in informed decision-making and compliance, especially in populations with limited literacy levels. Conclusions: The use of AI in patient education and as an adjunctive tool for providers is still early in its incorporation, and future research should explore the implementation of AI-driven resources to enhance understanding and decision-making regarding breast cancer screening, especially in vulnerable populations with limited literacy.
Affiliation(s)
- Lea Sacca
- Charles E. Schmidt College of Medicine, Florida Atlantic University, Boca Raton, FL 33431, USA; (D.L.); (S.B.); (K.L.); (E.M.); (N.G.); (J.X.); (R.M.); (G.S.); (M.K.); (P.K.)

13
Carriero A, Groenhoff L, Vologina E, Basile P, Albera M. Deep Learning in Breast Cancer Imaging: State of the Art and Recent Advancements in Early 2024. Diagnostics (Basel) 2024; 14:848. [PMID: 38667493 PMCID: PMC11048882 DOI: 10.3390/diagnostics14080848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 04/07/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024] Open
Abstract
The rapid advancement of artificial intelligence (AI) has significantly impacted various aspects of healthcare, particularly in the medical imaging field. This review focuses on recent developments in the application of deep learning (DL) techniques to breast cancer imaging. DL models, a subset of AI algorithms inspired by human brain architecture, have demonstrated remarkable success in analyzing complex medical images, enhancing diagnostic precision, and streamlining workflows. DL models have been applied to breast cancer diagnosis via mammography, ultrasonography, and magnetic resonance imaging. Furthermore, DL-based radiomic approaches may play a role in breast cancer risk assessment, prognosis prediction, and therapeutic response monitoring. Nevertheless, several challenges have limited the widespread adoption of AI techniques in clinical practice, emphasizing the importance of rigorous validation, interpretability, and technical considerations when implementing DL solutions. By examining fundamental concepts in DL techniques applied to medical imaging and synthesizing the latest advancements and trends, this narrative review aims to provide valuable and up-to-date insights for radiologists seeking to harness the power of AI in breast cancer care.
Affiliation(s)
- Léon Groenhoff
- Radiology Department, Maggiore della Carità Hospital, 28100 Novara, Italy; (A.C.); (E.V.); (P.B.); (M.A.)

14
Naderalvojoud B, Curtin CM, Yanover C, El-Hay T, Choi B, Park RW, Tabuenca JG, Reeve MP, Falconer T, Humphreys K, Asch SM, Hernandez-Boussard T. Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network. J Am Med Inform Assoc 2024; 31:1051-1061. [PMID: 38412331 PMCID: PMC11031239 DOI: 10.1093/jamia/ocae028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2023] [Revised: 01/26/2024] [Accepted: 02/01/2024] [Indexed: 02/29/2024] Open
Abstract
BACKGROUND Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at external sites. Our study introduces an innovative methodology for evaluating features during both the development and validation phases, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability. METHODS Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts. RESULTS Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients with POU of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and an average of 0.69 (SD = 0.02) in external validation. In external validation, models trained with cross-site feature selection significantly outperformed those using only features from the development site (P < .05). CONCLUSIONS Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. 
Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.
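The cross-site feature selection credited above for the performance gain can be caricatured as a consensus vote: keep only features that rank as informative at several sites rather than just the development site. A toy sketch under that assumption (the function, thresholds, and feature names are hypothetical illustrations, not the OHDSI implementation):

```python
from collections import Counter

def cross_site_features(site_rankings, top_k=3, min_sites=2):
    """Keep a feature only if it appears in the top_k of at least
    min_sites sites' local importance rankings -- a simple consensus
    stand-in for cross-site feature selection."""
    votes = Counter()
    for ranking in site_rankings:
        votes.update(ranking[:top_k])
    return sorted(f for f, c in votes.items() if c >= min_sites)

# Hypothetical per-site rankings: features predictive everywhere survive,
# site-specific idiosyncrasies are dropped.
selected = cross_site_features([
    ["prior_opioid_rx", "age", "bmi", "site_a_only"],
    ["age", "prior_opioid_rx", "surgery_type"],
    ["depression_dx", "age", "prior_opioid_rx"],
])  # -> ["age", "prior_opioid_rx"]
```

The design intuition matches the paper's finding: features that carry signal at only one site tend to encode local practice patterns, so filtering them out trades a little local fit for better external performance.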
Affiliation(s)
- Catherine M Curtin
- Department of Surgery, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, United States
- Chen Yanover
- KI Research Institute, Kfar Malal, 4592000, Israel
- Tal El-Hay
- KI Research Institute, Kfar Malal, 4592000, Israel
- Byungjin Choi
- Department of Biomedical Informatics, Ajou University Graduate School of Medicine, Suwon, 16499, Korea
- Rae Woong Park
- Department of Biomedical Informatics, Ajou University Graduate School of Medicine, Suwon, 16499, Korea
- Javier Gracia Tabuenca
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
- Mary Pat Reeve
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
- Thomas Falconer
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Keith Humphreys
- Department of Psychiatry and the Behavioral Sciences, Stanford University, Stanford, CA 94305, United States
- Center for Innovation to Implementation, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, United States
- Steven M Asch
- Department of Medicine, Stanford University, Stanford, CA 94305, United States
- Center for Innovation to Implementation, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, United States

15
Maleki Varnosfaderani S, Forouzanfar M. The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering (Basel) 2024; 11:337. [PMID: 38671759 PMCID: PMC11047988 DOI: 10.3390/bioengineering11040337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
As healthcare systems around the world face challenges such as escalating costs, limited access, and growing demand for personalized care, artificial intelligence (AI) is emerging as a key force for transformation. This review is motivated by the urgent need to harness AI's potential to mitigate these issues and aims to critically assess AI's integration in different healthcare domains. We explore how AI empowers clinical decision-making, optimizes hospital operation and management, refines medical image analysis, and revolutionizes patient care and monitoring through AI-powered wearables. Through several case studies, we review how AI has transformed specific healthcare domains and discuss the remaining challenges and possible solutions. Additionally, we discuss methodologies for assessing AI healthcare solutions, ethical challenges of AI deployment, and the importance of data privacy and bias mitigation for responsible technology use. By presenting a critical assessment of AI's transformative potential, this review equips researchers with a deeper understanding of AI's current and future impact on healthcare. It encourages an interdisciplinary dialogue between researchers, clinicians, and technologists to navigate the complexities of AI implementation, fostering the development of AI-driven solutions that prioritize ethical standards, equity, and a patient-centered approach.
Affiliation(s)
- Mohamad Forouzanfar
- Département de Génie des Systèmes, École de Technologie Supérieure (ÉTS), Université du Québec, Montréal, QC H3C 1K3, Canada
- Centre de Recherche de L’institut Universitaire de Gériatrie de Montréal (CRIUGM), Montréal, QC H3W 1W5, Canada

16
Rajendran S, Pan W, Sabuncu MR, Chen Y, Zhou J, Wang F. Learning across diverse biomedical data modalities and cohorts: Challenges and opportunities for innovation. PATTERNS (NEW YORK, N.Y.) 2024; 5:100913. [PMID: 38370129 PMCID: PMC10873158 DOI: 10.1016/j.patter.2023.100913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/20/2024]
Abstract
In healthcare, machine learning (ML) shows significant potential to augment patient care, improve population health, and streamline healthcare workflows. Realizing its full potential is, however, often hampered by concerns about data privacy, diversity in data sources, and suboptimal utilization of different data modalities. This review studies the utility of cross-cohort cross-category (C4) integration in such contexts: the process of combining information from diverse datasets distributed across distinct, secure sites. We argue that C4 approaches could pave the way for ML models that are both holistic and widely applicable. This paper provides a comprehensive overview of C4 in health care, including its present stage, potential opportunities, and associated challenges.
Affiliation(s)
- Suraj Rajendran
- Tri-Institutional Computational Biology & Medicine Program, Cornell University, Ithaca, NY, USA
- Weishen Pan
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Mert R. Sabuncu
- School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA
- Cornell Tech, Cornell University, New York, NY, USA
- Department of Radiology, Weill Cornell Medical School, New York, NY, USA
- Yong Chen
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jiayu Zhou
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Fei Wang
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA

17
Cunningham JW, Singh P, Reeder C, Claggett B, Marti-Castellote PM, Lau ES, Khurshid S, Batra P, Lubitz SA, Maddah M, Philippakis A, Desai AS, Ellinor PT, Vardeny O, Solomon SD, Ho JE. Natural Language Processing for Adjudication of Heart Failure in a Multicenter Clinical Trial: A Secondary Analysis of a Randomized Clinical Trial. JAMA Cardiol 2024; 9:174-181. [PMID: 37950744 PMCID: PMC10640703 DOI: 10.1001/jamacardio.2023.4859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/20/2023] [Accepted: 10/29/2023] [Indexed: 11/13/2023]
Abstract
Importance The gold standard for outcome adjudication in clinical trials is medical record review by a physician clinical events committee (CEC), which requires substantial time and expertise. Automated adjudication of medical records by natural language processing (NLP) may offer a more resource-efficient alternative, but this approach has not been validated in a multicenter setting. Objective To externally validate the Community Care Cohort Project (C3PO) NLP model for heart failure (HF) hospitalization adjudication, which was previously developed and tested within one health care system, compared to gold-standard CEC adjudication in a multicenter clinical trial. Design, Setting, and Participants This was a retrospective analysis of the Influenza Vaccine to Effectively Stop Cardio Thoracic Events and Decompensated Heart Failure (INVESTED) trial, which compared 2 influenza vaccines in 5260 participants with cardiovascular disease at 157 sites in the US and Canada between September 2016 and January 2019. Analysis was performed from November 2022 to October 2023. Exposures Individual sites submitted medical records for each hospitalization. The central INVESTED CEC and the C3PO NLP model independently adjudicated whether the cause of hospitalization was HF using the prepared hospitalization dossier. The C3PO NLP model was fine-tuned (C3PO + INVESTED) and a de novo NLP model was trained using half the INVESTED hospitalizations. Main Outcomes and Measures Concordance between the C3PO NLP model HF adjudication and the gold-standard INVESTED CEC adjudication was measured by raw agreement, κ, sensitivity, and specificity. The fine-tuned and de novo INVESTED NLP models were evaluated in an internal validation cohort not used for training. Results Among 4060 hospitalizations in 1973 patients (mean [SD] age, 66.4 [13.2] years; 514 [27.4%] female and 1432 [72.6%] male), 1074 hospitalizations (26%) were adjudicated as HF by the CEC. 
There was good agreement between the C3PO NLP and CEC HF adjudications (raw agreement, 87% [95% CI, 86-88]; κ, 0.69 [95% CI, 0.66-0.72]). C3PO NLP model sensitivity was 94% (95% CI, 92-95) and specificity was 84% (95% CI, 83-85). The fine-tuned C3PO and de novo NLP models demonstrated agreement of 93% (95% CI, 92-94) and κ of 0.82 (95% CI, 0.77-0.86) and 0.83 (95% CI, 0.79-0.87), respectively, vs the CEC. CEC reviewer interrater reproducibility was 94% (95% CI, 93-95; κ, 0.85 [95% CI, 0.80-0.89]). Conclusions and Relevance The C3PO NLP model developed within 1 health care system identified HF events with good agreement relative to the gold-standard CEC in an external multicenter clinical trial. Fine-tuning the model improved agreement and approximated human reproducibility. Further study is needed to determine whether NLP will improve the efficiency of future multicenter clinical trials by identifying clinical events at scale.
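The κ statistic reported above corrects raw agreement for the agreement two adjudicators would reach by chance alone. For binary HF/non-HF calls it reduces to a few lines; the sketch below is illustrative (not the study's analysis code), with the function name and toy adjudication vectors as assumptions:

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two binary adjudicators
    (e.g., an NLP model vs. a clinical events committee)."""
    n = len(rater_a)
    p_obs = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    pa = sum(rater_a) / n                   # fraction called positive by A
    pb = sum(rater_b) / n                   # fraction called positive by B
    p_exp = pa * pb + (1 - pa) * (1 - pb)   # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)
```

This is why κ (0.69) is lower than raw agreement (87%): with 26% HF prevalence, two adjudicators who mostly call "non-HF" would already agree often by chance, and κ discounts exactly that.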
Affiliation(s)
- Jonathan W. Cunningham
- Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge
- Pulkit Singh
- Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge
- Christopher Reeder
- Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge
- Brian Claggett
- Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Emily S. Lau
- Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge
- Division of Cardiology, Massachusetts General Hospital, Boston
- Shaan Khurshid
- Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston
- Puneet Batra
- Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge
- Steven A. Lubitz
- Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston
- Mahnaz Maddah
- Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge
- Anthony Philippakis
- Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge
- Akshay S. Desai
- Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Patrick T. Ellinor
- Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston
- Orly Vardeny
- Minneapolis VA Hospital, University of Minnesota, Minneapolis
- Scott D. Solomon
- Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Jennifer E. Ho
- Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge
- CardioVascular Institute and Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts

18
Pan W, Xu Z, Rajendran S, Wang F. An adaptive federated learning framework for clinical risk prediction with electronic health records from multiple hospitals. PATTERNS (NEW YORK, N.Y.) 2024; 5:100898. [PMID: 38264713 PMCID: PMC10801228 DOI: 10.1016/j.patter.2023.100898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/29/2023] [Revised: 09/06/2023] [Accepted: 11/21/2023] [Indexed: 01/25/2024]
Abstract
Clinical risk prediction with electronic health records (EHR) using machine learning has attracted much attention in recent years, where one of the key challenges is how to protect data privacy. Federated learning (FL) provides a promising framework for building predictive models by leveraging the data from multiple institutions without sharing them. However, data distribution drift across different institutions greatly impacts the performance of FL. In this paper, an adaptive FL framework was proposed to address this challenge. Our framework separated the input features into stable, domain-specific, and conditional-irrelevant parts according to their relationships to clinical outcomes. We evaluated this framework on the tasks of predicting the onset risk of sepsis and acute kidney injury (AKI) for patients in the intensive care unit (ICU) from multiple clinical institutions. The results showed that our framework can achieve better prediction performance compared with existing FL baselines and provide reasonable feature interpretations.
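The adaptive framework described above builds on the basic FL aggregation step, in which each hospital trains locally and only model parameters, never patient records, are combined. A minimal sketch of that generic step in the spirit of FedAvg (this is not the paper's adaptive algorithm; the function name and toy values are assumptions):

```python
def fedavg(site_params, site_sizes):
    """One FedAvg-style aggregation round: average each model parameter
    across sites, weighted by the number of local patient records, so
    raw EHR data never leaves a hospital."""
    total = sum(site_sizes)
    n_params = len(site_params[0])
    return [
        sum(params[j] * size for params, size in zip(site_params, site_sizes)) / total
        for j in range(n_params)
    ]

# Two hospitals, the second with three times as many records, so its
# local parameters dominate the weighted average.
merged = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])  # -> [2.5, 3.5]
```

Distribution drift enters precisely here: naive averaging treats all parameters alike, whereas the paper's framework handles stable and domain-specific feature groups differently before aggregation.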
Affiliation(s)
- Weishen Pan
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Zhenxing Xu
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Suraj Rajendran
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA

19
Derbal Y. Adaptive Cancer Therapy in the Age of Generative Artificial Intelligence. Cancer Control 2024; 31:10732748241264704. [PMID: 38897721 PMCID: PMC11189021 DOI: 10.1177/10732748241264704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2024] [Revised: 05/17/2024] [Accepted: 06/06/2024] [Indexed: 06/21/2024] Open
Abstract
Therapeutic resistance is a major challenge facing the design of effective cancer treatments. Adaptive cancer therapy is in principle the most viable approach to manage cancer's adaptive dynamics through drug combinations with dose timing and modulation. However, there are numerous open issues facing the clinical success of adaptive therapy. Chief among these issues is the feasibility of real-time predictions of treatment response which represent a bedrock requirement of adaptive therapy. Generative artificial intelligence has the potential to learn prediction models of treatment response from clinical, molecular, and radiomics data about patients and their treatments. The article explores this potential through a proposed integration model of Generative Pre-Trained Transformers (GPTs) in a closed loop with adaptive treatments to predict the trajectories of disease progression. The conceptual model and the challenges facing its realization are discussed in the broader context of artificial intelligence integration in oncology.
Affiliation(s)
- Youcef Derbal
- Ted Rogers School of Information Technology Management, Toronto Metropolitan University, Toronto, ON, Canada

20
Zapf M. Invited Commentary: Can We Predict Intraoperative Transfusion Nationwide Using a Single Algorithm? J Am Coll Surg 2024; 238:105-106. [PMID: 37782025 DOI: 10.1097/xcs.0000000000000882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/03/2023]
21
Jaotombo F, Adorni L, Ghattas B, Boyer L. Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data. PLoS One 2023; 18:e0289795. [PMID: 38032876 PMCID: PMC10688642 DOI: 10.1371/journal.pone.0289795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2023] [Accepted: 07/25/2023] [Indexed: 12/02/2023] Open
Abstract
OBJECTIVE This study aims to develop high-performing Machine Learning and Deep Learning models for predicting hospital length of stay (LOS) while enhancing interpretability. We compare the performance and interpretability of models trained only on structured tabular data, models trained only on unstructured clinical text data, and models trained on mixed data. METHODS The structured data were used to train fourteen classical Machine Learning models, including advanced ensemble trees, neural networks, and k-nearest neighbors. The unstructured data were used to fine-tune a pre-trained Bio Clinical BERT Transformer Deep Learning model. The structured and unstructured data were then merged into a tabular dataset after vectorization of the clinical text and dimensionality reduction through Latent Dirichlet Allocation. The study used the free and publicly available Medical Information Mart for Intensive Care (MIMIC) III database and the open-source AutoML library AutoGluon. Performance was evaluated with respect to two types of random classifiers used as baselines. RESULTS The best model from structured data demonstrates high performance (ROC AUC = 0.944, PRC AUC = 0.655) with limited interpretability, where the most important predictors of prolonged LOS are the levels of blood urea nitrogen and of platelets. The Transformer model displays good but lower performance (ROC AUC = 0.842, PRC AUC = 0.375) with richer interpretability, providing more specific in-hospital factors including procedures, conditions, and medical history. The best model trained on mixed data achieves both high performance (ROC AUC = 0.963, PRC AUC = 0.746) and a much larger scope of interpretability, including pathologies of the intestine, the colon, and the blood; infectious diseases; respiratory problems; procedures involving sedation and intubation; and vascular surgery.
CONCLUSIONS Our results outperform most of the state-of-the-art models in LOS prediction both in terms of performance and of interpretability. Data fusion between structured and unstructured text data may significantly improve performance and interpretability.
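The fusion step described above can be illustrated in miniature. The sketch below substitutes a simple normalized bag-of-words for the study's clinical-text vectorization and Latent Dirichlet Allocation step; the vocabulary, feature values, and function names are illustrative assumptions, not the authors' pipeline.

```python
# Sketch: fuse structured tabular features with a low-dimensional text
# representation. A normalized bag-of-words over a tiny, hypothetical
# vocabulary stands in for the paper's LDA topic proportions.

def text_features(note, vocabulary):
    """Map a clinical note to normalized term frequencies over a fixed vocabulary."""
    tokens = note.lower().split()
    counts = [tokens.count(term) for term in vocabulary]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def fuse(structured, note, vocabulary):
    """Concatenate structured features with the text representation."""
    return list(structured) + text_features(note, vocabulary)

vocabulary = ["sedation", "intubation", "surgery"]   # illustrative terms only
structured = [28.0, 150.0]                           # e.g. BUN level, platelet count
note = "Patient required intubation and sedation before surgery"
fused = fuse(structured, note, vocabulary)
```

The fused vector can then feed any tabular learner, which is why the merged dataset in the study remains compatible with AutoML tooling.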
Affiliation(s)
- Franck Jaotombo
  - EMLYON Business School, Ecully, France
  - Research Centre on Health Services and Quality of Life, Aix Marseille University, Marseille, France
- Luca Adorni
  - Becker Friedman Institute, Chicago, IL, United States of America
- Badih Ghattas
  - Aix Marseille University, CNRS, AMSE, Marseille, France
- Laurent Boyer
  - Research Centre on Health Services and Quality of Life, Aix Marseille University, Marseille, France
  - Department of Public Health, Assistance Publique–Hopitaux de Marseille, Marseille, France
22
Yang J, El-Bouri R, O’Donoghue O, Lachapelle AS, Soltan AAS, Eyre DW, Lu L, Clifton DA. Deep reinforcement learning for multi-class imbalanced training: applications in healthcare. Mach Learn 2023; 113:2655-2674. [PMID: 38708086 PMCID: PMC11065699 DOI: 10.1007/s10994-023-06481-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2023] [Revised: 08/15/2023] [Accepted: 10/17/2023] [Indexed: 05/07/2024]
Abstract
With the rapid growth of memory and computing power, datasets are becoming increasingly complex and imbalanced. This is especially severe in the context of clinical data, where there may be one rare event for many cases in the majority class. We introduce an imbalanced classification framework, based on reinforcement learning, for training on extremely imbalanced datasets, and extend it for use in multi-class settings. We combine dueling and double deep Q-learning architectures, and formulate a custom reward function and episode-training procedure specifically capable of handling multi-class imbalanced training. Using real-world clinical case studies, we demonstrate that our proposed framework outperforms current state-of-the-art imbalanced learning methods, achieving fairer and more balanced classification while also significantly improving the prediction of minority classes. Supplementary Information: The online version contains supplementary material available at 10.1007/s10994-023-06481-z.
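One common way such a class-sensitive reward can be constructed (a sketch of the general principle only; the paper's exact reward function and Q-learning architecture are described in the article) is to scale rewards by per-class imbalance, so correct or incorrect predictions on rare classes weigh more than those on the majority class:

```python
# Sketch: an imbalance-aware reward for a classification "agent".
# Rewards are scaled by min_count / class_count, so the rarest class
# yields full reward (+1/-1) and the majority class a proportionally
# smaller one. Class names and counts here are illustrative.

def make_reward(class_counts):
    """Return a reward function scaled by per-class imbalance."""
    min_count = min(class_counts.values())
    scale = {c: min_count / n for c, n in class_counts.items()}

    def reward(true_class, predicted_class):
        r = scale[true_class]
        return r if predicted_class == true_class else -r

    return reward

reward = make_reward({"majority": 900, "rare": 10})
```

With this scaling, an agent that ignores the rare class forfeits its largest available rewards, which is the intuition behind training on extreme imbalance.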
Affiliation(s)
- Jenny Yang
  - Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Rasheed El-Bouri
  - Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Odhran O’Donoghue
  - Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Alexander S. Lachapelle
  - Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Andrew A. S. Soltan
  - Oxford Cancer & Haematology Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, England
  - RDM Division of Cardiovascular Medicine, University of Oxford, Oxford, England
  - London Medical Imaging and AI Centre for Value Based Healthcare, Guy’s and St Thomas’ NHS Foundation Trust, London, England
- David W. Eyre
  - Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, England
- Lei Lu
  - Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- David A. Clifton
  - Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
  - Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, China
23
Nickson D, Meyer C, Walasek L, Toro C. Prediction and diagnosis of depression using machine learning with electronic health records data: a systematic review. BMC Med Inform Decis Mak 2023; 23:271. [PMID: 38012655 PMCID: PMC10680172 DOI: 10.1186/s12911-023-02341-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Accepted: 10/15/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Depression is one of the most significant health conditions in terms of personal, social, and economic impact. The aim of this review is to summarize existing literature in which machine learning methods have been used in combination with Electronic Health Records for the prediction of depression. METHODS Systematic literature searches were conducted within the arXiv, PubMed, PsycINFO, Science Direct, SCOPUS, and Web of Science electronic databases. Searches were restricted to information published after 2010 (from 1st January 2011 onwards) and were updated prior to the final synthesis of data (27th January 2022). RESULTS Following the PRISMA process, the initial 744 studies were reduced to 19 eligible for detailed evaluation. Data extraction identified the machine learning methods used, types of predictors used, definitions of depression, classification performance achieved, sample sizes, and benchmarks used. Area Under the Curve (AUC) values of more than 0.9 were claimed, though the average was around 0.8. Regression methods proved as effective as more sophisticated machine learning techniques. LIMITATIONS The categorization, definition, and number of predictors used within models were sometimes difficult to establish. Studies were largely Western, Educated, Industrialised, Rich, and Democratic (WEIRD) in demography. CONCLUSION This review supports the potential use of machine learning techniques with Electronic Health Records for the prediction of depression. All the selected studies used clinically based, though sometimes broad, definitions of depression as their classification criteria. The reported performance of the studies was comparable to or even better than that found in primary care. There are concerns with generalizability and interpretability.
Affiliation(s)
- Caroline Meyer
  - Warwick Medical School, University of Warwick, Coventry, UK
- Lukasz Walasek
  - Department of Psychology, University of Warwick, Coventry, UK
- Carla Toro
  - Warwick Medical School, University of Warwick, Coventry, UK
24
Yang HS, Pan W, Wang Y, Zaydman MA, Spies NC, Zhao Z, Guise TA, Meng QH, Wang F. Generalizability of a Machine Learning Model for Improving Utilization of Parathyroid Hormone-Related Peptide Testing across Multiple Clinical Centers. Clin Chem 2023; 69:1260-1269. [PMID: 37738611 DOI: 10.1093/clinchem/hvad141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 08/23/2023] [Indexed: 09/24/2023]
Abstract
BACKGROUND Measuring parathyroid hormone-related peptide (PTHrP) helps diagnose the humoral hypercalcemia of malignancy, but is often ordered for patients with low pretest probability, resulting in poor test utilization. Manual review of results to identify inappropriate PTHrP orders is a cumbersome process. METHODS Using a dataset of 1330 patients from a single institute, we developed a machine learning (ML) model to predict abnormal PTHrP results. We then evaluated the performance of the model on two external datasets. Different strategies (model transporting, retraining, rebuilding, and fine-tuning) were investigated to improve model generalizability. Maximum mean discrepancy (MMD) was adopted to quantify the shift of data distributions across different datasets. RESULTS The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.936, and a specificity of 0.842 at 0.900 sensitivity in the development cohort. Directly transporting this model to two external datasets resulted in a deterioration of AUROC to 0.838 and 0.737, with the latter having a larger MMD corresponding to a greater data shift compared to the original dataset. Model rebuilding using site-specific data improved AUROC to 0.891 and 0.837 on the two sites, respectively. When external data was insufficient for retraining, a fine-tuning strategy also improved model utility. CONCLUSIONS ML offers promise to improve PTHrP test utilization while relieving the burden of manual review. Transporting a ready-made model to external datasets may lead to performance deterioration due to data distribution shift. Model retraining or rebuilding could improve generalizability when there are enough data, and model fine-tuning may be favorable when site-specific data are limited.
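The maximum mean discrepancy used here to quantify dataset shift can be estimated directly from two samples. Below is a minimal sketch of the biased squared-MMD estimator with an RBF kernel; the bandwidth and the toy "site" data are illustrative assumptions, not values from the study:

```python
import math

# Sketch: biased estimate of squared MMD between two samples with an RBF
# kernel. A larger value indicates a greater shift between the two data
# distributions, as used in the study to compare clinical sites.

def rbf(x, y, gamma=0.5):
    """RBF kernel between two feature vectors (gamma is illustrative)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(X, Y, gamma=0.5):
    """Biased estimate of squared maximum mean discrepancy."""
    k_xx = sum(rbf(a, b, gamma) for a in X for b in X) / (len(X) ** 2)
    k_yy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (len(Y) ** 2)
    k_xy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return k_xx + k_yy - 2 * k_xy

site_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)]   # toy development-site data
site_b = [(2.0, 2.1), (2.2, 2.0), (2.1, 2.1)]   # toy shifted external site
```

In this setting, a site whose MMD against the development data is large would be a candidate for model rebuilding rather than direct transporting.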
Affiliation(s)
- He S Yang
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, United States
- Weishen Pan
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States
- Yingheng Wang
  - Department of Computer Science, Cornell University, Ithaca, NY, United States
- Mark A Zaydman
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
- Nicholas C Spies
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
- Zhen Zhao
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, United States
- Theresa A Guise
  - Department of Endocrine Neoplasia and Hormonal Disorders, Division of Internal Medicine, The University of Texas, MD Anderson, Houston, TX, United States
- Qing H Meng
  - Department of Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States
25
Yang J, Eyre DW, Lu L, Clifton DA. Interpretable machine learning-based decision support for prediction of antibiotic resistance for complicated urinary tract infections. NPJ ANTIMICROBIALS AND RESISTANCE 2023; 1:14. [PMID: 38686216 PMCID: PMC11057209 DOI: 10.1038/s44259-023-00015-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Accepted: 10/04/2023] [Indexed: 05/02/2024]
Abstract
Urinary tract infections (UTIs) are among the most common bacterial infections worldwide; however, increasing antimicrobial resistance in bacterial pathogens is making it challenging for clinicians to prescribe appropriate antibiotics. In this study, we present four interpretable machine learning-based decision support algorithms for predicting antimicrobial resistance. Using electronic health record data from a large cohort of patients diagnosed with potentially complicated UTIs, we demonstrate high predictability of antibiotic resistance across four antibiotics: nitrofurantoin, co-trimoxazole, ciprofloxacin, and levofloxacin. We additionally demonstrate the generalizability of our methods on a separate cohort of patients with uncomplicated UTIs, showing that machine learning-driven approaches can help reduce the risk of administering treatments to which the infection is not susceptible, facilitate rapid and effective clinical interventions, and enable personalized treatment suggestions. These techniques also offer model interpretability, explaining the basis for the generated predictions.
Affiliation(s)
- Jenny Yang
  - Institute of Biomedical Engineering, Department Engineering Science, University of Oxford, Oxford, UK
- David W. Eyre
  - Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Lei Lu
  - Institute of Biomedical Engineering, Department Engineering Science, University of Oxford, Oxford, UK
- David A. Clifton
  - Institute of Biomedical Engineering, Department Engineering Science, University of Oxford, Oxford, UK
  - Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, China
26
Li C, Liu W, Zhu Z, Wang X, Zhang Y. Quantization of extraoral free flap monitoring for venous congestion with deep learning integrated iOS applications on smartphones. Int J Surg 2023; 109:3679-3680. [PMID: 37462988 PMCID: PMC10651233 DOI: 10.1097/js9.0000000000000626] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Accepted: 07/09/2023] [Indexed: 11/17/2023]
Affiliation(s)
- Chunyan Li
  - Department of Clinical Laboratory
  - Department of Laboratory Medicine, Beijing Jishuitan Hospital, Capital Medical University, Fourth Clinical College of Peking University, Beijing
- Wei Liu
  - Department of Clinical Laboratory
  - Department of Laboratory Medicine, Beijing Jishuitan Hospital, Capital Medical University, Fourth Clinical College of Peking University, Beijing
- Zhenglin Zhu
  - Department of Orthopaedic Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing
- Xing Wang
  - Department of Spinal Surgery, Shihezi General Hospital of the Eighth Division, Shihezi
- Yanbin Zhang
  - Department of Spine Surgery, Beijing Jishuitan Hospital, Capital Medical University, Fourth Clinical College of Peking University, National Center for Orthopaedics, Beijing, People’s Republic of China
27
Shenouda M, Flerlage I, Kaveti A, Giger ML, Armato SG. Assessment of a deep learning model for COVID-19 classification on chest radiographs: a comparison across image acquisition techniques and clinical factors. J Med Imaging (Bellingham) 2023; 10:064504. [PMID: 38162317 PMCID: PMC10753846 DOI: 10.1117/1.jmi.10.6.064504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 11/30/2023] [Accepted: 12/06/2023] [Indexed: 01/03/2024] Open
Abstract
Purpose The purpose is to assess the performance of a pre-trained deep learning model in the task of classifying between coronavirus disease (COVID)-positive and COVID-negative patients from chest radiographs (CXRs) while considering various image acquisition parameters, clinical factors, and patient demographics. Methods Standard and soft-tissue CXRs of 9860 patients comprised the "original dataset," consisting of training and test sets and were used to train a DenseNet-121 architecture model to classify COVID-19 using three classification algorithms: standard, soft tissue, and a combination of both types of images via feature fusion. A larger more-current test set of 5893 patients (the "current test set") was used to assess the performance of the pretrained model. The current test set contained a larger span of dates, incorporated different variants of the virus and included different immunization statuses. Model performance between the original and current test sets was evaluated using area under the receiver operating characteristic curve (ROC AUC) [95% CI]. Results The model achieved AUC values of 0.67 [0.65, 0.70] for cropped standard images, 0.65 [0.63, 0.67] for cropped soft-tissue images, and 0.67 [0.65, 0.69] for both types of cropped images. These were all significantly lower than the performance of the model on the original test set. Investigations regarding matching the acquisition dates between the test sets (i.e., controlling for virus variants), immunization status, disease severity, and age and sex distributions did not fully explain the discrepancy in performance. Conclusions Several relevant factors were considered to determine whether differences existed in the test sets, including time period of image acquisition, vaccination status, and disease severity. The lower performance on the current test set may have occurred due to model overfitting and a lack of generalizability.
Affiliation(s)
- Mena Shenouda
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
- Aditi Kaveti
  - Stony Brook University, Stony Brook, New York, United States
- Maryellen L. Giger
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
- Samuel G. Armato
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
28
Monteith S, Glenn T, Geddes JR, Achtyes ED, Whybrow PC, Bauer M. Challenges and Ethical Considerations to Successfully Implement Artificial Intelligence in Clinical Medicine and Neuroscience: a Narrative Review. PHARMACOPSYCHIATRY 2023; 56:209-213. [PMID: 37643732 DOI: 10.1055/a-2142-9325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
This narrative review discusses how the safe and effective use of clinical artificial intelligence (AI) prediction tools requires recognition of the importance of human intelligence. Human intelligence, creativity, situational awareness, and professional knowledge are required for successful implementation. The implementation of clinical AI prediction tools may change the workflow in medical practice, resulting in new challenges and safety implications. Human understanding of how a clinical AI prediction tool performs in routine and exceptional situations is fundamental to successful implementation. Physicians must be involved in all aspects of the selection, implementation, and ongoing product monitoring of clinical AI prediction tools.
Affiliation(s)
- Scott Monteith
  - Department of Psychiatry, Michigan State University College of Human Medicine, Traverse City Campus, Traverse City, MI, USA
- Tasha Glenn
  - ChronoRecord Association, Fullerton, CA, USA
- John R Geddes
  - Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
- Eric D Achtyes
  - Department of Psychiatry, Western Michigan University Homer Stryker M.D. School of Medicine, Kalamazoo, MI, USA
- Peter C Whybrow
  - Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles (UCLA), Los Angeles, CA, USA
- Michael Bauer
  - Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
29
Cunningham JW, Singh P, Reeder C, Claggett B, Marti-Castellote PM, Lau ES, Khurshid S, Batra P, Lubitz SA, Maddah M, Philippakis A, Desai AS, Ellinor PT, Vardeny O, Solomon SD, Ho JE. Natural Language Processing for Adjudication of Heart Failure Hospitalizations in a Multi-Center Clinical Trial. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.08.17.23294234. [PMID: 37662283 PMCID: PMC10473787 DOI: 10.1101/2023.08.17.23294234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2023]
Abstract
Background The gold standard for outcome adjudication in clinical trials is chart review by a physician clinical events committee (CEC), which requires substantial time and expertise. Automated adjudication by natural language processing (NLP) may offer a more resource-efficient alternative. We previously showed that the Community Care Cohort Project (C3PO) NLP model adjudicates heart failure (HF) hospitalizations accurately within one healthcare system. Methods This study externally validated the C3PO NLP model against CEC adjudication in the INVESTED trial. INVESTED compared influenza vaccination formulations in 5260 patients with cardiovascular disease at 157 North American sites. A central CEC adjudicated the cause of hospitalizations from medical records. We applied the C3PO NLP model to medical records from 4060 INVESTED hospitalizations and evaluated agreement between the NLP and final consensus CEC HF adjudications. We then fine-tuned the C3PO NLP model (C3PO+INVESTED) and trained a de novo model using half the INVESTED hospitalizations, and evaluated these models in the other half. NLP performance was benchmarked to CEC reviewer inter-rater reproducibility. Results 1074 hospitalizations (26%) were adjudicated as HF by the CEC. There was high agreement between the C3PO NLP and CEC HF adjudications (agreement 87%, kappa statistic 0.69). C3PO NLP model sensitivity was 94% and specificity was 84%. The fine-tuned C3PO and de novo NLP models demonstrated agreement of 93% and kappa of 0.82 and 0.83, respectively. CEC reviewer inter-rater reproducibility was 94% (kappa 0.85). Conclusion Our NLP model developed within a single healthcare system accurately identified HF events relative to the gold-standard CEC in an external multi-center clinical trial. Fine-tuning the model improved agreement and approximated human reproducibility. NLP may improve the efficiency of future multi-center clinical trials by accurately identifying clinical events at scale.
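The agreement metrics reported above (percent agreement and the kappa statistic) can be reproduced from paired adjudications; a minimal sketch with illustrative labels (not trial data):

```python
# Sketch: Cohen's kappa, the chance-corrected agreement between two
# "raters" (here, a clinical events committee and an NLP model).

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over paired labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Expected chance agreement from each rater's marginal label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Illustrative adjudications (HF = heart failure hospitalization), not trial data.
cec = ["HF", "HF", "not-HF", "not-HF", "HF", "not-HF"]
nlp = ["HF", "HF", "not-HF", "HF", "HF", "not-HF"]
```

Because kappa discounts agreement expected by chance, it is a stricter benchmark than raw percent agreement, which is why the study reports both.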
Affiliation(s)
- Jonathan W. Cunningham
  - Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge, Massachusetts
- Pulkit Singh
  - Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts
- Christopher Reeder
  - Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts
- Brian Claggett
  - Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Emily S. Lau
  - Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Division of Cardiology, Massachusetts General Hospital, Boston, Massachusetts
- Shaan Khurshid
  - Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, Massachusetts
- Puneet Batra
  - Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts
- Steven A. Lubitz
  - Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, Massachusetts
- Mahnaz Maddah
  - Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts
- Anthony Philippakis
  - Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts
- Akshay S. Desai
  - Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Patrick T. Ellinor
  - Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, Massachusetts
- Orly Vardeny
  - Minneapolis VA Hospital, University of Minnesota, Minneapolis, Minnesota
- Scott D. Solomon
  - Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Jennifer E. Ho
  - Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge, Massachusetts
  - CardioVascular Institute and Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
30
Yang J, Soltan AAS, Eyre DW, Clifton DA. Algorithmic fairness and bias mitigation for clinical machine learning with deep reinforcement learning. NAT MACH INTELL 2023; 5:884-894. [PMID: 37615031 PMCID: PMC10442224 DOI: 10.1038/s42256-023-00697-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Accepted: 06/27/2023] [Indexed: 08/25/2023]
Abstract
As models based on machine learning continue to be developed for healthcare applications, greater effort is needed to ensure that these technologies do not reflect or exacerbate any unwanted or discriminatory biases that may be present in the data. Here we introduce a reinforcement learning framework capable of mitigating biases that may have been acquired during data collection. In particular, we evaluated our model for the task of rapidly predicting COVID-19 for patients presenting to hospital emergency departments and aimed to mitigate any site (hospital)-specific and ethnicity-based biases present in the data. Using a specialized reward function and training procedure, we show that our method achieves clinically effective screening performances, while significantly improving outcome fairness compared with current benchmarks and state-of-the-art machine learning methods. We performed external validation across three independent hospitals, and additionally tested our method on a patient intensive care unit discharge status task, demonstrating model generalizability.
Affiliation(s)
- Jenny Yang
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Andrew A. S. Soltan
  - John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
  - RDM Division of Cardiovascular Medicine, University of Oxford, Oxford, UK
- David W. Eyre
  - Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- David A. Clifton
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
  - Oxford-Suzhou Centre for Advanced Research, Suzhou, China
31
ZhuParris A, de Goede AA, Yocarini IE, Kraaij W, Groeneveld GJ, Doll RJ. Machine Learning Techniques for Developing Remotely Monitored Central Nervous System Biomarkers Using Wearable Sensors: A Narrative Literature Review. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23115243. [PMID: 37299969 DOI: 10.3390/s23115243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 05/23/2023] [Accepted: 05/26/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND Central nervous system (CNS) disorders benefit from ongoing monitoring to assess disease progression and treatment efficacy. Mobile health (mHealth) technologies offer a means for the remote and continuous symptom monitoring of patients. Machine Learning (ML) techniques can process and engineer mHealth data into a precise and multidimensional biomarker of disease activity. OBJECTIVE This narrative literature review aims to provide an overview of the current landscape of biomarker development using mHealth technologies and ML. Additionally, it proposes recommendations to ensure the accuracy, reliability, and interpretability of these biomarkers. METHODS This review extracted relevant publications from databases such as PubMed, IEEE, and CTTI. The ML methods employed across the selected publications were then extracted, aggregated, and reviewed. RESULTS This review synthesized and presented the diverse approaches of 66 publications that address creating mHealth-based biomarkers using ML. The reviewed publications provide a foundation for effective biomarker development and offer recommendations for creating representative, reproducible, and interpretable biomarkers for future clinical trials. CONCLUSION mHealth-based and ML-derived biomarkers have great potential for the remote monitoring of CNS disorders. However, further research and standardization of study designs are needed to advance this field. With continued innovation, mHealth-based biomarkers hold promise for improving the monitoring of CNS disorders.
Affiliation(s)
- Ahnjili ZhuParris
  - Centre for Human Drug Research (CHDR), Zernikedreef 8, 2333 CL Leiden, The Netherlands
  - Leiden Institute of Advanced Computer Science (LIACS), Snellius Gebouw, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
  - Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA Leiden, The Netherlands
- Annika A de Goede
  - Centre for Human Drug Research (CHDR), Zernikedreef 8, 2333 CL Leiden, The Netherlands
- Iris E Yocarini
  - Leiden Institute of Advanced Computer Science (LIACS), Snellius Gebouw, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
- Wessel Kraaij
  - Leiden Institute of Advanced Computer Science (LIACS), Snellius Gebouw, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
  - The Netherlands Organisation for Applied Scientific Research (TNO), Anna van Buerenplein 1, 2595 DA, Den Haag, The Netherlands
- Geert Jan Groeneveld
  - Centre for Human Drug Research (CHDR), Zernikedreef 8, 2333 CL Leiden, The Netherlands
  - Leiden Institute of Advanced Computer Science (LIACS), Snellius Gebouw, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
- Robert Jan Doll
  - Centre for Human Drug Research (CHDR), Zernikedreef 8, 2333 CL Leiden, The Netherlands
32
Yang J, Soltan AAS, Eyre DW, Yang Y, Clifton DA. An adversarial training framework for mitigating algorithmic biases in clinical machine learning. NPJ Digit Med 2023; 6:55. [PMID: 36991077 PMCID: PMC10050816 DOI: 10.1038/s41746-023-00805-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Accepted: 03/13/2023] [Indexed: 03/31/2023] Open
Abstract
Machine learning is becoming increasingly prominent in healthcare. Although its benefits are clear, growing attention is being given to how these tools may exacerbate existing biases and disparities. In this study, we introduce an adversarial training framework capable of mitigating biases acquired during data collection. We demonstrate the proposed framework on the real-world task of rapidly predicting COVID-19, focusing on mitigating site-specific (hospital) and demographic (ethnicity) biases. Using the statistical definition of equalized odds, we show that adversarial training improves outcome fairness while still achieving clinically effective screening performance (negative predictive values >0.98). We compare our method to previous benchmarks and perform prospective and external validation across four independent hospital cohorts. Our method can be generalized to any outcome, model, and definition of fairness.
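The equalized-odds criterion the abstract refers to requires that a classifier's true-positive and false-positive rates match across protected groups (here, hospital site or ethnicity). The helper below is an illustrative sketch of how that disparity can be quantified; it is not the authors' code, and the function and variable names are our own.

```python
def _rate(pairs, label):
    """Fraction of examples with the given true label that were predicted positive.
    label=1 gives the true-positive rate; label=0 gives the false-positive rate."""
    relevant = [pred for true, pred in pairs if true == label]
    return sum(relevant) / len(relevant)

def equalized_odds_gap(y_true, y_pred, group):
    """Largest across-group disparity in TPR or FPR.

    A classifier satisfying equalized odds exactly scores 0. This toy
    version assumes every group contains both positive and negative
    examples (no division-by-zero handling).
    """
    by_group = {}
    for true, pred, g in zip(y_true, y_pred, group):
        by_group.setdefault(g, []).append((true, pred))
    gaps = []
    for label in (1, 0):  # TPR over positives, FPR over negatives
        rates = [_rate(pairs, label) for pairs in by_group.values()]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# Toy example: two hospital sites with different error profiles.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
site   = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
print(equalized_odds_gap(y_true, y_pred, site))  # 0.5: site B has higher TPR and FPR
```

In the paper's adversarial framework, a gap like this is driven toward zero during training by penalizing an auxiliary network that tries to predict the group label from the classifier's outputs; the metric above only measures the disparity after the fact.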
Affiliation(s)
- Jenny Yang
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, England.
| | - Andrew A S Soltan
- John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, England
- RDM Division of Cardiovascular Medicine, University of Oxford, Oxford, England
| | - David W Eyre
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, England
| | - Yang Yang
- School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - David A Clifton
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, England
- Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, China
| |
|
33
|
Abedi V, Kawamura Y, Li J, Phan TG, Zand R. Editorial: Machine Learning in Action: Stroke Diagnosis and Outcome Prediction. Front Neurol 2022; 13:984467. [PMID: 35937051 PMCID: PMC9346061 DOI: 10.3389/fneur.2022.984467] [Received: 07/02/2022] [Accepted: 07/05/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Vida Abedi
- Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, United States
- Correspondence: Vida Abedi
| | - Yuki Kawamura
- Department of Medicine, University of Cambridge, Cambridge, United Kingdom
| | - Jiang Li
- Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA, United States
| | - Thanh G. Phan
- Stroke and Aging Research Group, Clinical Trials, Imaging and Informatics Division, School of Clinical Sciences at Monash Health, Melbourne, VIC, Australia
- Department of Neurology, Monash Health, Melbourne, VIC, Australia
| | - Ramin Zand
- Department of Neurology, College of Medicine, The Pennsylvania State University, Hershey, PA, United States
| |
|