1. Gonzalez R, Saha A, Campbell CJ, Nejat P, Lokker C, Norgan AP. Seeing the random forest through the decision trees. Supporting learning health systems from histopathology with machine learning models: Challenges and opportunities. J Pathol Inform 2024; 15:100347. [PMID: 38162950; PMCID: PMC10755052; DOI: 10.1016/j.jpi.2023.100347]
Abstract
This paper discusses overlooked challenges in working with machine learning (ML) models for histopathology and presents a novel opportunity for such models to support "Learning Health Systems." The authors first elaborate on these challenges, grouping them by mitigation strategy: those that require innovative approaches, time, or future technological capabilities, and those that require a conceptual reappraisal from a critical perspective. They then present an opportunity to support "Learning Health Systems" by integrating hidden information that ML models extract from digitized histopathology slides with other healthcare big data.
Affiliation(s)
- Ricardo Gonzalez
- DeGroote School of Business, McMaster University, Hamilton, Ontario, Canada
- Division of Computational Pathology and Artificial Intelligence, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
- Ashirbani Saha
- Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Escarpment Cancer Research Institute, McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada
- Clinton J.V. Campbell
- William Osler Health System, Brampton, Ontario, Canada
- Department of Pathology and Molecular Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Peyman Nejat
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Cynthia Lokker
- Health Information Research Unit, Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
- Andrew P. Norgan
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
2. Straw I, Rees G, Nachev P. Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study. J Med Internet Res 2024; 26:e46936. [PMID: 39186324; PMCID: PMC11384168; DOI: 10.2196/46936]
Abstract
BACKGROUND The presence of bias in artificial intelligence has garnered increased attention, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In health care, the inequitable performance of algorithms across demographic groups may widen health inequalities. OBJECTIVE Here, we identify and characterize bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure. METHODS Stage 1 involved a literature search of PubMed and Web of Science for key terms relating to cardiac machine learning (ML) algorithms. Papers that built ML models to predict cardiac disease were evaluated for their focus on demographic bias in model performance, and open-source data sets were retained for our investigation. Two open-source data sets were identified: (1) the University of California Irvine Heart Failure data set and (2) the University of California Irvine Coronary Artery Disease data set. We reproduced existing algorithms that have been reported for these data sets, tested them for sex biases in algorithm performance, and assessed a range of remediation techniques for their efficacy in reducing inequities. Particular attention was paid to the false negative rate (FNR), due to the clinical significance of underdiagnosis and missed opportunities for treatment. RESULTS In stage 1, our literature search returned 127 papers, with 60 meeting the criteria for a full review and only 3 papers highlighting sex differences in algorithm performance. In the papers that reported sex, there was a consistent underrepresentation of female patients in the data sets. No papers investigated racial or ethnic differences. In stage 2, we reproduced algorithms reported in the literature, achieving mean accuracies of 84.24% (SD 3.51%) for data set 1 and 85.72% (SD 1.75%) for data set 2 (random forest models). 
For data set 1, the FNR was significantly higher for female patients in 13 out of 16 experiments, meeting the threshold of statistical significance (-17.81% to -3.37%; P<.05). A smaller disparity in the false positive rate was significant for male patients in 13 out of 16 experiments (-0.48% to +9.77%; P<.05). We observed an overprediction of disease for male patients (higher false positive rate) and an underprediction of disease for female patients (higher FNR). Sex differences in feature importance suggest that feature selection needs to be demographically tailored. CONCLUSIONS Our research exposes a significant gap in cardiac ML research, highlighting that the underperformance of algorithms for female patients has been overlooked in the published literature. Our study quantifies sex disparities in algorithmic performance and explores several sources of bias. We found an underrepresentation of female patients in the data sets used to train algorithms, identified sex biases in model error rates, and demonstrated that a series of remediation techniques were unable to address the inequities present.
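As an illustration (not the study's code), the sex-stratified error-rate comparison described in this abstract can be sketched as follows; the data and the disparity are fabricated for demonstration:

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Per-group false negative rate (FNR) and false positive rate (FPR)."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        t, p = y_true[m], y_pred[m]
        rates[g] = {
            "FNR": np.mean(p[t == 1] == 0),  # missed cases among true positives
            "FPR": np.mean(p[t == 0] == 1),  # false alarms among true negatives
        }
    return rates

# toy data with a deliberate disparity: disease under-predicted for female patients
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0])
sex    = np.array(["M", "M", "F", "F", "M", "F", "M", "F"])
rates = group_error_rates(y_true, y_pred, sex)
print(rates)  # female FNR exceeds male FNR; male FPR exceeds female FPR
```

In the study itself, such per-group differences were additionally tested for statistical significance (P<.05) across 16 repeated experiments.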
Affiliation(s)
- Isabel Straw
- University College London, London, United Kingdom
- Geraint Rees
- University College London, London, United Kingdom
3. Wang Y, Wang L, Zhou Z, Laurentiev J, Lakin JR, Zhou L, Hong P. Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases. J Biomed Inform 2024; 156:104677. [PMID: 38876453; PMCID: PMC11272432; DOI: 10.1016/j.jbi.2024.104677]
Abstract
OBJECTIVE Existing approaches to fairness evaluation often overlook systematic differences in the social determinants of health, like demographics and socioeconomics, among comparison groups, potentially leading to inaccurate or even contradictory conclusions. This study aims to evaluate racial disparities in predicting mortality among patients with chronic diseases using a fairness detection method that accounts for these systematic differences. METHODS We created five datasets from Mass General Brigham's electronic health records (EHR), each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each dataset, we developed separate machine learning models to predict 1-year mortality and examined racial disparities by comparing prediction performance between Black and White individuals. We compared racial fairness evaluations between the overall Black and White populations and their counterparts, Black individuals and matched White individuals identified by propensity score matching, in whom the systematic differences were mitigated. RESULTS We identified significant differences between Black and White individuals in age, gender, marital status, education level, smoking status, health insurance type, body mass index, and Charlson comorbidity index (p-value < 0.001). Among the matched Black and White subpopulations identified through propensity score matching, significant differences persisted only for particular covariates, at weaker significance levels: in the CHF cohort for insurance type (p = 0.043), in the CKD cohort for insurance type (p = 0.005) and education level (p = 0.016), and in the dementia cohort for body mass index (p = 0.041), with no significant differences for other covariates.
When examining the mortality prediction models across the five study cohorts, we compared fairness evaluations before and after mitigating systematic differences. This comparison revealed significant differences in the CHF cohort, with p-values of 0.021 and 0.001 for the F1 measure and sensitivity of the AdaBoost model, and p-values of 0.014 and 0.003 for the F1 measure and sensitivity of the MLP model, respectively. DISCUSSION AND CONCLUSION This study contributes to research on fairness assessment by focusing on the examination of systematic disparities and underscores the potential for revealing racial bias in machine learning models used in clinical settings.
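Propensity score matching, the mitigation step this abstract relies on, can be sketched as below: a minimal greedy 1:1 nearest-neighbour implementation on simulated data, not the authors' pipeline, and all function and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_match(X, treated):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.
    X: covariate matrix; treated: boolean group indicator (e.g. Black vs White)."""
    # propensity score = P(group membership | covariates)
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    controls = list(np.where(~treated)[0])
    pairs = []
    for i in np.where(treated)[0]:
        j = min(controls, key=lambda c: abs(ps[i] - ps[c]))  # closest unmatched control
        pairs.append((i, j))
        controls.remove(j)
        if not controls:
            break
    return pairs, ps

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
treated = rng.random(200) < 1 / (1 + np.exp(-X[:, 0]))  # membership depends on covariate 0
pairs, ps = propensity_match(X, treated)
gaps = [abs(ps[i] - ps[j]) for i, j in pairs]
print(f"{len(pairs)} pairs, mean |Δps| = {np.mean(gaps):.3f}")
```

Fairness metrics computed on the matched pairs, rather than on the full populations, compare model performance between groups with similar covariate distributions.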
Affiliation(s)
- Liqin Wang
- Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Joshua R Lakin
- Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, USA
- Li Zhou
- Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
4. Chen F, Wang L, Hong J, Jiang J, Zhou L. Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models. arXiv 2024; arXiv:2310.19917v3 (preprint). [PMID: 39010875; PMCID: PMC11247915]
Abstract
Objectives Leveraging artificial intelligence (AI) in conjunction with electronic health records (EHRs) holds transformative potential to improve healthcare. However, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to handle various biases in AI models developed using EHR data. Materials and Methods We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines, analyzing articles from PubMed, Web of Science, and IEEE published between January 01, 2010 and December 17, 2023. The review identified key biases, outlined strategies for detecting and mitigating bias throughout the AI model development, and analyzed metrics for bias assessment. Results Of the 450 articles retrieved, 20 met our criteria, revealing 6 major bias types: algorithmic, confounding, implicit, measurement, selection, and temporal. The AI models were primarily developed for predictive tasks, yet none have been deployed in real-world healthcare settings. Five studies concentrated on the detection of implicit and algorithmic biases employing fairness metrics like statistical parity, equal opportunity, and predictive equity. Fifteen studies proposed strategies for mitigating biases, especially targeting implicit and selection biases. These strategies, evaluated through both performance and fairness metrics, predominantly involved data collection and preprocessing techniques like resampling and reweighting. Discussion This review highlights evolving strategies to mitigate bias in EHR-based AI models, emphasizing the urgent need for both standardized and detailed reporting of the methodologies and systematic real-world testing and evaluation. Such measures are essential for gauging models' practical impact and fostering ethical AI that ensures fairness and equity in healthcare.
Affiliation(s)
- Feng Chen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Department of Biomedical Informatics and Health Education, University of Washington, Seattle, WA 98105, United States
- Liqin Wang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA 02115, United States
- Julie Hong
- Wellesley High School, Wellesley, MA 02481, United States
- Jiaqi Jiang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Li Zhou
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA 02115, United States
5. Defilippo A, Veltri P, Lió P, Guzzi PH. Leveraging graph neural networks for supporting automatic triage of patients. Sci Rep 2024; 14:12548. [PMID: 38822012; PMCID: PMC11143315; DOI: 10.1038/s41598-024-63376-2]
Abstract
Patient triage is crucial in emergency departments, ensuring timely and appropriate care based on a correct evaluation of the severity of patient conditions. Triage is generally performed by a human operator drawing on personal experience and on information gathered during the patient management process, and it can therefore produce errors in the assigned emergency level: traditional triage methods rely heavily on human decisions, which can be subjective and prone to error. Growing interest has recently focused on leveraging artificial intelligence (AI) to develop algorithms that maximize information gathering and minimize errors in patient triage. We define and implement an AI-based module to manage patients' emergency code assignments in emergency departments, trained on historical emergency department data to support the medical decision-making process. Data containing relevant patient information, such as vital signs, symptoms, and medical history, are used to classify patients accurately into triage categories. Experimental results demonstrate that the proposed algorithm achieved high accuracy, outperforming traditional triage methods. By using the proposed method, we claim that healthcare professionals can predict a severity index to guide patient management and resource allocation.
Affiliation(s)
- Annamaria Defilippo
- Dept. Medical and Surgical Sciences, Magna Graecia University of Catanzaro, Catanzaro, Italy
- Pierangelo Veltri
- DIMES Department of Informatics, Modeling, Electronics and Systems, UNICAL, Rende, Cosenza, Italy
- Pietro Lió
- Department of Computer Science and Technology, Cambridge University, Cambridge, UK
- Pietro Hiram Guzzi
- Dept. Medical and Surgical Sciences, Magna Graecia University of Catanzaro, Catanzaro, Italy
6. Rotenstein L, Wang L, Zupanc SN, Penumarthy A, Laurentiev J, Lamey J, Farah S, Lipsitz S, Jain N, Bates DW, Zhou L, Lakin JR. Looking Beyond Mortality Prediction: Primary Care Physician Views of Patients' Palliative Care Needs Predicted by a Machine Learning Tool. Appl Clin Inform 2024; 15:460-468. [PMID: 38636542; PMCID: PMC11168809; DOI: 10.1055/a-2309-1599]
Abstract
OBJECTIVES To assess primary care physicians' (PCPs) perception of the need for serious illness conversations (SIC) or other palliative care interventions in patients flagged by a machine learning tool for high 1-year mortality risk. METHODS We surveyed PCPs from four Brigham and Women's Hospital primary care practice sites. Multiple mortality prediction algorithms were ensembled to assess adult patients of these PCPs who were either enrolled in the hospital's integrated care management program or had one of several chronic conditions. The patients were classified as high or low risk of 1-year mortality. A blinded survey had PCPs evaluate these patients for palliative care needs. We measured PCP and machine learning tool agreement regarding patients' need for an SIC/elevated risk of mortality. RESULTS Of 66 PCPs, 20 (30.3%) participated in the survey. Out of 312 patients evaluated, 60.6% were female, with a mean (standard deviation [SD]) age of 69.3 (17.5) years, and a mean (SD) Charlson Comorbidity Index of 2.80 (2.89). The machine learning tool identified 162 (51.9%) patients as high risk. Excluding deceased or unfamiliar patients, PCPs felt that an SIC was appropriate for 179 patients; the machine learning tool flagged 123 of these patients as high risk (68.7% concordance). For 105 patients whom PCPs deemed SIC unnecessary, the tool classified 83 as low risk (79.1% concordance). There was substantial agreement between PCPs and the tool (Gwet's agreement coefficient of 0.640). CONCLUSIONS A machine learning mortality prediction tool offers promise as a clinical decision aid, helping clinicians pinpoint patients needing palliative care interventions.
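Gwet's agreement coefficient (AC1) reported above is straightforward to compute for two binary raters; the following is a minimal sketch with fabricated ratings, not the study's data:

```python
import numpy as np

def gwet_ac1(r1, r2):
    """Gwet's AC1 chance-corrected agreement for two raters with binary ratings."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    pa = np.mean(r1 == r2)            # observed agreement
    pi = (r1.mean() + r2.mean()) / 2  # mean 'positive' prevalence across raters
    pe = 2 * pi * (1 - pi)            # chance agreement under the AC1 model
    return (pa - pe) / (1 - pe)

# toy example: PCP judgments vs. model flags for 10 patients (1 = SIC appropriate / high risk)
pcp   = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
model = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
print(round(gwet_ac1(pcp, model), 3))
```

Unlike Cohen's kappa, AC1 remains stable when the positive rate is very high or very low, which is one reason it is used for agreement studies like this one.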
Affiliation(s)
- Lisa Rotenstein
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- School of Medicine, University of California, San Francisco, San Francisco, California, United States
- Liqin Wang
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Harvard Medical School, Boston, Massachusetts, United States
- Sophia N. Zupanc
- School of Medicine, University of California, San Francisco, San Francisco, California, United States
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, United States
- Akhila Penumarthy
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, United States
- John Laurentiev
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Jan Lamey
- Brigham and Women's Physician Organization, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Subrina Farah
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States
- Stuart Lipsitz
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Harvard Medical School, Boston, Massachusetts, United States
- Nina Jain
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Harvard Medical School, Boston, Massachusetts, United States
- David W. Bates
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Harvard Medical School, Boston, Massachusetts, United States
- Li Zhou
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Harvard Medical School, Boston, Massachusetts, United States
- Joshua R. Lakin
- Harvard Medical School, Boston, Massachusetts, United States
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, United States
- Division of Palliative Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
7. Chen F, Wang L, Hong J, Jiang J, Zhou L. Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models. J Am Med Inform Assoc 2024; 31:1172-1183. [PMID: 38520723; PMCID: PMC11031231; DOI: 10.1093/jamia/ocae060]
Abstract
OBJECTIVES Leveraging artificial intelligence (AI) in conjunction with electronic health records (EHRs) holds transformative potential to improve healthcare. However, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to handle various biases in AI models developed using EHR data. MATERIALS AND METHODS We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines, analyzing articles from PubMed, Web of Science, and IEEE published between January 01, 2010 and December 17, 2023. The review identified key biases, outlined strategies for detecting and mitigating bias throughout the AI model development, and analyzed metrics for bias assessment. RESULTS Of the 450 articles retrieved, 20 met our criteria, revealing 6 major bias types: algorithmic, confounding, implicit, measurement, selection, and temporal. The AI models were primarily developed for predictive tasks, yet none have been deployed in real-world healthcare settings. Five studies concentrated on the detection of implicit and algorithmic biases employing fairness metrics like statistical parity, equal opportunity, and predictive equity. Fifteen studies proposed strategies for mitigating biases, especially targeting implicit and selection biases. These strategies, evaluated through both performance and fairness metrics, predominantly involved data collection and preprocessing techniques like resampling and reweighting. DISCUSSION This review highlights evolving strategies to mitigate bias in EHR-based AI models, emphasizing the urgent need for both standardized and detailed reporting of the methodologies and systematic real-world testing and evaluation. Such measures are essential for gauging models' practical impact and fostering ethical AI that ensures fairness and equity in healthcare.
Affiliation(s)
- Feng Chen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Department of Biomedical Informatics and Health Education, University of Washington, Seattle, WA 98105, United States
- Liqin Wang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA 02115, United States
- Julie Hong
- Wellesley High School, Wellesley, MA 02481, United States
- Jiaqi Jiang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Li Zhou
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA 02115, United States
8. Wang HE, Weiner JP, Saria S, Kharrazi H. Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis. J Med Internet Res 2024; 26:e47125. [PMID: 38422347; PMCID: PMC11066744; DOI: 10.2196/47125]
Abstract
BACKGROUND The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited. OBJECTIVE This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics. METHODS We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Models predicting 30-day hospital readmissions were evaluated: LACE Index, modified HOSPITAL score, and modified Centers for Medicare & Medicaid Services (CMS) readmission measure, which were applied as-is (using existing coefficients) and retrained (recalibrated with 50% of the data). Predictive performances and bias measures were evaluated for all, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. Racial bias represented by FNR and FPR differences was stratified to explore shifts in algorithmic bias in different populations. RESULTS The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, and the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS and modified HOSPITAL score had the lowest racial and income bias in Maryland. 
In Florida, both of these models overall had the lowest income bias and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations resulted in a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias in different contexts and populations. CONCLUSIONS Caution must be taken when interpreting fairness measures' face value. A higher FNR or FPR could potentially reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Simply relying on the statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performances but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities.
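The bias measures named in this abstract (parity of FNR and FPR, and the generalized entropy index) can be sketched as follows. This is an illustrative implementation on toy data, not the study's code; the generalized entropy index follows the common per-individual "benefit" formulation b_i = ŷ_i − y_i + 1:

```python
import numpy as np

def generalized_entropy_index(y_true, y_pred, alpha=2):
    """Generalized entropy index over per-individual 'benefits' b_i = y_pred_i - y_true_i + 1.
    0 means prediction errors are spread perfectly evenly across individuals."""
    b = y_pred - y_true + 1.0
    mu = b.mean()
    return np.mean((b / mu) ** alpha - 1) / (alpha * (alpha - 1))

def rate_parity(y_true, y_pred, group, rate="FNR"):
    """Absolute between-group difference in false negative or false positive rate."""
    vals = []
    for g in np.unique(group):
        m = group == g
        if rate == "FNR":
            vals.append(np.mean(y_pred[m][y_true[m] == 1] == 0))
        else:
            vals.append(np.mean(y_pred[m][y_true[m] == 0] == 1))
    return max(vals) - min(vals)

# toy readmission labels/predictions for two groups
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
race   = np.array(["B", "B", "B", "B", "W", "W", "W", "W"])
print(rate_parity(y_true, y_pred, race, "FNR"))
print(generalized_entropy_index(y_true, y_pred))
```

As the abstract cautions, such numbers are only a first-pass screen: a high parity gap flags disparate performance but says nothing by itself about the underlying causes.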
Affiliation(s)
- H Echo Wang
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Jonathan P Weiner
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Johns Hopkins Center for Population Health Information Technology, Baltimore, MD, United States
- Suchi Saria
- Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, United States
- Hadi Kharrazi
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Johns Hopkins Center for Population Health Information Technology, Baltimore, MD, United States
9. Srinivasan Y, Liu A, Rameau A. Machine learning in the evaluation of voice and swallowing in the head and neck cancer patient. Curr Opin Otolaryngol Head Neck Surg 2024; 32:105-112. [PMID: 38116798; DOI: 10.1097/moo.0000000000000948]
Abstract
PURPOSE OF REVIEW The purpose of this review is to present recent advances and limitations in machine learning applied to the evaluation of speech, voice, and swallowing in head and neck cancer. RECENT FINDINGS Novel machine learning models incorporating diverse data modalities with improved discriminatory capabilities have been developed for predicting toxicities following head and neck cancer therapy, including dysphagia, dysphonia, xerostomia, and weight loss, as well as guiding treatment planning. Machine learning has been applied to the care of posttreatment voice and swallowing dysfunction by offering objective and standardized assessments and aiding innovative technologies for functional restoration. Voice and speech are also being utilized in machine learning algorithms to screen for laryngeal cancer. SUMMARY Machine learning has the potential to help optimize, assess, predict, and rehabilitate voice and swallowing function in head and neck cancer patients, as well as aid in cancer screening. However, existing studies are limited by a lack of sufficient external validation and generalizability, insufficient transparency and reproducibility, and no clearly superior predictive modeling strategy. Algorithms and applications will need to be trained on large multi-institutional data sets, incorporate sociodemographic data to reduce bias, and achieve validation through clinical trials for optimal performance and utility.
Affiliation(s)
- Yashes Srinivasan
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York
- Amy Liu
- University of California, San Diego, School of Medicine, San Diego, California, USA
- Anaïs Rameau
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York
10. Yang P, Gregory IA, Robichaux C, Holder AL, Martin GS, Esper AM, Kamaleswaran R, Gichoya JW, Bhavani SV. Racial Differences in Accuracy of Predictive Models for High-Flow Nasal Cannula Failure in COVID-19. Crit Care Explor 2024; 6:e1059. [PMID: 38975567; PMCID: PMC11224893; DOI: 10.1097/cce.0000000000001059]
Abstract
OBJECTIVES To develop and validate machine learning (ML) models to predict high-flow nasal cannula (HFNC) failure in COVID-19, compare their performance to the respiratory rate-oxygenation (ROX) index, and evaluate model accuracy by self-reported race. DESIGN Retrospective cohort study. SETTING Four Emory University Hospitals in Atlanta, GA. PATIENTS Adult patients hospitalized with COVID-19 between March 2020 and April 2022 who received HFNC therapy within 24 hours of ICU admission were included. INTERVENTIONS None. MEASUREMENTS AND MAIN RESULTS Four types of supervised ML models were developed for predicting HFNC failure (defined as intubation or death within 7 d of HFNC initiation), using routine clinical variables from the first 24 hours of ICU admission. Models were trained on the first 60% (n = 594) of admissions and validated on the latter 40% (n = 390) to simulate prospective implementation. Among the 984 patients included, 317 (32.2%) developed HFNC failure. The eXtreme Gradient Boosting (XGB) model had the highest area under the receiver operating characteristic curve (AUROC) for predicting HFNC failure (0.707) and was the only model with significantly better performance than the ROX index (AUROC 0.616). The XGB model performed significantly worse in Black patients than in White patients (AUROC 0.663 vs. 0.808, p = 0.02). These racial differences were reduced and no longer statistically significant when the analysis was restricted to patients with nonmissing arterial blood gas data, and when the XGB model was developed to predict mortality rather than the composite failure outcome, which could be influenced by biased clinical decisions for intubation. CONCLUSIONS Our XGB model had better discrimination for predicting HFNC failure in COVID-19 than the ROX index but showed racial differences in prediction accuracy. Further studies are needed to understand and mitigate potential sources of bias in clinical ML models and to improve their equitability.
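The subgroup audit described above, an AUROC computed separately within each self-reported race group, can be sketched in a few lines. This is an illustrative toy, not the authors' code; all data and names below are hypothetical.

```python
def auroc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both outcome classes to compute AUROC")
    # fraction of positive/negative pairs ranked correctly (ties count 0.5)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auroc_by_group(scores, labels, groups):
    """AUROC computed separately within each subgroup label."""
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = auroc([scores[i] for i in idx], [labels[i] for i in idx])
    return out
```

Comparing the per-group values (e.g., 0.663 vs. 0.808 in the study) is the core of such an audit; a permutation or bootstrap test would then assess whether the gap is significant.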
Affiliation(s)
- Philip Yang, Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Emory University, Atlanta, GA
- Ismail A Gregory, Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Emory University, Atlanta, GA
- Chad Robichaux, Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA
- Andre L Holder, Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Emory University, Atlanta, GA
- Greg S Martin, Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Emory University, Atlanta, GA
- Annette M Esper, Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Emory University, Atlanta, GA
- Rishikesan Kamaleswaran, Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA; Department of Surgery, Duke University School of Medicine, Durham, NC
- Judy W Gichoya, Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA
|
11
|
Kaur D, Hughes JW, Rogers AJ, Kang G, Narayan SM, Ashley EA, Perez MV. Race, Sex, and Age Disparities in the Performance of ECG Deep Learning Models Predicting Heart Failure. Circ Heart Fail 2024; 17:e010879. PMID: 38126168; PMCID: PMC10984643; DOI: 10.1161/circheartfailure.123.010879.
Abstract
BACKGROUND Deep learning models may combat widening racial disparities in heart failure outcomes through early identification of individuals at high risk. However, demographic biases in the performance of these models have not been well-studied. METHODS This retrospective analysis used 12-lead ECGs taken between 2008 and 2018 from 326 518 patient encounters referred for standard clinical indications to Stanford Hospital. The primary model was a convolutional neural network model trained to predict incident heart failure within 5 years. Biases were evaluated on the testing set (160 312 ECGs) using the area under the receiver operating characteristic curve, stratified across the protected attributes of race, ethnicity, age, and sex. RESULTS There were 59 817 cases of incident heart failure observed within 5 years of ECG collection. The performance of the primary model declined with age. There were no significant differences observed between racial groups overall. However, the primary model performed significantly worse in Black patients aged 0 to 40 years compared with all other racial groups in this age group, with differences most pronounced among young Black women. Disparities in model performance did not improve with the integration of race, ethnicity, sex, and age into model architecture, by training separate models for each racial group, or by providing the model with a data set of equal racial representation. Using probability thresholds individualized for race, age, and sex offered substantial improvements in F1 scores. CONCLUSIONS The biases found in this study warrant caution against perpetuating disparities through the development of machine learning tools for the prognosis and management of heart failure. Customizing the application of these models by using probability thresholds individualized by race, ethnicity, age, and sex may offer an avenue to mitigate existing algorithmic disparities.
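The mitigation the authors found effective, probability thresholds individualized by race, age, and sex, amounts to picking a per-group cutoff that maximizes F1 on held-out data. A minimal sketch under hypothetical data, not the study's actual pipeline:

```python
def f1(labels, preds):
    """F1 score for binary labels/predictions in {0, 1}."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold_per_group(scores, labels, groups):
    """For each subgroup, pick the probability cutoff maximizing F1.

    Candidate cutoffs are simply the observed scores within the group.
    """
    cutoffs = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        s = [scores[i] for i in idx]
        y = [labels[i] for i in idx]
        cutoffs[g] = max(sorted(set(s)),
                         key=lambda t: f1(y, [int(v >= t) for v in s]))
    return cutoffs
```

In practice the cutoffs would be chosen on a validation split and applied to test data, so that the threshold tuning does not overfit the evaluation set.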
|
12
|
Cary MP, Zink A, Wei S, Olson A, Yan M, Senior R, Bessias S, Gadhoumi K, Jean-Pierre G, Wang D, Ledbetter LS, Economou-Zavlanos NJ, Obermeyer Z, Pencina MJ. Mitigating Racial And Ethnic Bias And Advancing Health Equity In Clinical Algorithms: A Scoping Review. Health Aff (Millwood) 2023; 42:1359-1368. PMID: 37782868; PMCID: PMC10668606; DOI: 10.1377/hlthaff.2023.00553.
Abstract
In August 2022 the Department of Health and Human Services (HHS) issued a notice of proposed rulemaking prohibiting covered entities, which include health care providers and health plans, from discriminating against individuals when using clinical algorithms in decision making. However, HHS did not provide specific guidelines on how covered entities should prevent discrimination. We conducted a scoping review of literature published during the period 2011-22 to identify health care applications, frameworks, reviews and perspectives, and assessment tools that identify and mitigate bias in clinical algorithms, with a specific focus on racial and ethnic bias. Our scoping review encompassed 109 articles comprising 45 empirical health care applications that included tools tested in health care settings, 16 frameworks, and 48 reviews and perspectives. We identified a wide range of technical, operational, and systemwide bias mitigation strategies for clinical algorithms, but there was no consensus in the literature on a single best practice that covered entities could employ to meet the HHS requirements. Future research should identify optimal bias mitigation methods for various scenarios, depending on factors such as patient population, clinical setting, algorithm design, and types of bias to be addressed.
Affiliation(s)
- Michael P. Cary Jr., Duke University, Durham, North Carolina
- Anna Zink, University of Chicago, Chicago, Illinois
- Sijia Wei, Northwestern University, Chicago, Illinois
- Ziad Obermeyer, University of California Berkeley, Berkeley, California
|
13
|
Le JP, Shashikumar SP, Malhotra A, Nemati S, Wardi G. Making the Improbable Possible: Generalizing Models Designed for a Syndrome-Based, Heterogeneous Patient Landscape. Crit Care Clin 2023; 39:751-768. PMID: 37704338; PMCID: PMC10758922; DOI: 10.1016/j.ccc.2023.02.003.
Abstract
Syndromic conditions, such as sepsis, are commonly encountered in the intensive care unit. Although such conditions are easy for clinicians to grasp, their heterogeneity may limit the performance of machine-learning algorithms, and individual hospital practice patterns may limit external generalizability. Data missingness is another barrier to optimal algorithm performance, and various strategies exist to mitigate it. Recent advances in data science, such as transfer learning, conformal prediction, and continual learning, may improve the generalizability of machine-learning algorithms in critically ill patients. At this point, randomized trials of these approaches are needed to demonstrate improvements in patient-centered outcomes.
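Of the techniques this review mentions, split conformal prediction is simple enough to sketch: calibrate a nonconformity score on held-out data, then output prediction sets that cover the true label with probability roughly 1 - alpha. This is a generic illustration, not taken from the article; the binary-classifier setup and all inputs are hypothetical.

```python
import math

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for a binary classifier.

    cal_probs/test_probs hold predicted P(class 1); labels are in {0, 1}.
    Returns, for each test point, the set of labels not rejected at level alpha.
    """
    # nonconformity score: 1 - probability assigned to the true class
    scores = [1 - (p if y == 1 else 1 - p)
              for p, y in zip(cal_probs, cal_labels)]
    n = len(scores)
    # conformal quantile with the finite-sample (n + 1) correction
    k = math.ceil((n + 1) * (1 - alpha))
    qhat = sorted(scores)[min(k, n) - 1]
    sets = []
    for p in test_probs:
        s = [c for c in (0, 1) if 1 - (p if c == 1 else 1 - p) <= qhat]
        sets.append(s)
    return sets
```

Larger (or empty) prediction sets flag patients on whom the model is uncertain, which is one way such methods can make syndrome-based models safer to generalize.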
Affiliation(s)
- Joshua Pei Le, School of Medicine, University of Limerick, Castletroy, Co. Limerick V94 T9PX, Ireland
- Atul Malhotra, Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, USA
- Shamim Nemati, Division of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
- Gabriel Wardi, Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, USA; Department of Emergency Medicine, University of California San Diego, 200 W Arbor Drive, San Diego, CA 92103, USA
|
14
|
Bagheri AB, Rouzi MD, Koohbanani NA, Mahoor MH, Finco MG, Lee M, Najafi B, Chung J. Potential applications of artificial intelligence and machine learning on diagnosis, treatment, and outcome prediction to address health care disparities of chronic limb-threatening ischemia. Semin Vasc Surg 2023; 36:454-459. PMID: 37863620; DOI: 10.1053/j.semvascsurg.2023.06.003.
Abstract
Chronic limb-threatening ischemia (CLTI) is the most advanced form of peripheral artery disease. CLTI has an extremely poor prognosis and is associated with considerable risk of major amputation, cardiac morbidity, mortality, and poor quality of life. Early diagnosis and targeted treatment of CLTI are critical for improving patients' prognoses. However, this objective has proven elusive, time-consuming, and challenging, in part because of existing health care disparities among patients. In this article, we review how artificial intelligence (AI) and machine learning (ML) can help diagnose CLTI accurately, improve outcome prediction, and identify disparities in its treatment. We demonstrate the importance of AI/ML approaches for the management of these patients and how available data could be used for computer-guided interventions. Although AI/ML applications to mitigate health care disparities in CLTI are in their infancy, we also highlight specific AI/ML methods that show potential for addressing these disparities.
Affiliation(s)
- Amir Behzad Bagheri, Interdisciplinary Consortium on Advanced Motion Performance, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX
- Mohammad Dehghan Rouzi, Interdisciplinary Consortium on Advanced Motion Performance, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX
- Navid Alemi Koohbanani, Department of Computer Science, Tissue Image Analytics Centre, University of Warwick, Coventry, UK
- Mohammad H Mahoor, Department of Electrical and Computer Engineering, University of Denver, Denver, CO
- M G Finco, Interdisciplinary Consortium on Advanced Motion Performance, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX
- Myeounggon Lee, Interdisciplinary Consortium on Advanced Motion Performance, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX
- Bijan Najafi, Interdisciplinary Consortium on Advanced Motion Performance, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX
- Jayer Chung, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, One Baylor Plaza MS-390, Houston, TX 77030
|
15
|
Kohn R, Weissman GE, Wang W, Ingraham NE, Scott S, Bayes B, Anesi GL, Halpern SD, Kipnis P, Liu VX, Dudley RA, Kerlin MP. Prediction of in-hospital mortality among intensive care unit patients using modified daily Laboratory-based Acute Physiology Scores, version 2 (LAPS2). medRxiv [Preprint] 2023:2023.01.19.23284796. PMID: 36712116; PMCID: PMC9882631; DOI: 10.1101/2023.01.19.23284796.
Abstract
Background Mortality prediction for intensive care unit (ICU) patients frequently relies on single acuity measures based on ICU admission physiology without accounting for subsequent clinical changes. Objectives Evaluate novel models incorporating modified admission and daily, time-updating Laboratory-based Acute Physiology Scores, version 2 (LAPS2) to predict in-hospital mortality among ICU patients. Research design Retrospective cohort study. Subjects All ICU patients in five hospitals from October 2017 through September 2019. Measures We used logistic regression, penalized logistic regression, and random forest models to predict in-hospital mortality within 30 days of ICU admission using admission LAPS2 alone in patient-level and patient-day-level models, or admission and daily LAPS2 at the patient-day level. Multivariable models included patient and admission characteristics. We performed internal-external validation using four hospitals for training and the fifth for validation, repeating analyses for each hospital as the validation set. We assessed performance using scaled Brier scores (SBS), c-statistics, and calibration plots. Results The cohort included 13,993 patients and 120,101 ICU days. The patient-level model including the modified admission LAPS2 without daily LAPS2 had an SBS of 0.175 (95% CI 0.148-0.201) and c-statistic of 0.824 (95% CI 0.808-0.840). Patient-day-level models including daily LAPS2 consistently outperformed models with modified admission LAPS2 alone. Among patients with <50% predicted mortality, daily models were better calibrated than models with modified admission LAPS2 alone. Conclusions Models incorporating daily, time-updating LAPS2 to predict mortality among an ICU population perform as well or better than models incorporating modified admission LAPS2 alone.
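The scaled Brier score (SBS) reported above compares a model's Brier score with that of a reference model that always predicts the observed base rate; 1 is perfect and 0 is no better than the base rate. A minimal sketch with hypothetical inputs, not the authors' code:

```python
def scaled_brier(probs, labels):
    """Scaled Brier score: 1 - Brier(model) / Brier(base-rate reference).

    probs are predicted probabilities of the event; labels are in {0, 1}.
    """
    n = len(labels)
    brier = sum((p - y) ** 2 for p, y in zip(probs, labels)) / n
    base = sum(labels) / n
    # reference Brier: always predict the base rate (equals base * (1 - base))
    brier_ref = sum((base - y) ** 2 for y in labels) / n
    return 1 - brier / brier_ref
```

Because the reference depends on the event rate, the SBS is more comparable across cohorts with different mortality rates than the raw Brier score.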
Affiliation(s)
- Rachel Kohn, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania; Leonard Davis Institute of Health Economics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Gary E. Weissman, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania; Leonard Davis Institute of Health Economics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Wei Wang, Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania
- Stefania Scott, Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania
- Brian Bayes, Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania
- George L. Anesi, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania; Leonard Davis Institute of Health Economics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Scott D. Halpern, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania; Leonard Davis Institute of Health Economics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania; Department of Medical Ethics and Health Policy, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Patricia Kipnis, Division of Research, Kaiser Permanente, Oakland, California
- Vincent X. Liu, Division of Research, Kaiser Permanente, Oakland, California
- Meeta Prasad Kerlin, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Palliative and Advanced Illness Research (PAIR) Center at the University of Pennsylvania, Philadelphia, Pennsylvania; Leonard Davis Institute of Health Economics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
|
16
|
Improving child health through Big Data and data science. Pediatr Res 2023; 93:342-349. PMID: 35974162; PMCID: PMC9380977; DOI: 10.1038/s41390-022-02264-9.
Abstract
Child health is defined by a complex, dynamic network of genetic, cultural, nutritional, infectious, and environmental determinants at distinct, developmentally determined epochs from preconception to adolescence. This network shapes the future of children, susceptibilities to adult diseases, and individual child health outcomes. Evolution selects characteristics during fetal life, infancy, childhood, and adolescence that adapt to predictable and unpredictable exposures/stresses by creating alternative developmental phenotype trajectories. While child health has improved in the United States and globally over the past 30 years, continued improvement requires access to data that fully represent the complexity of these interactions and to new analytic methods. Big Data and innovative data science methods provide tools to integrate multiple data dimensions for description of best clinical, predictive, and preventive practices, for reducing racial disparities in child health outcomes, for inclusion of patient and family input in medical assessments, and for defining individual disease risk, mechanisms, and therapies. However, leveraging these resources will require new strategies that intentionally address institutional, ethical, regulatory, cultural, technical, and systemic barriers as well as developing partnerships with children and families from diverse backgrounds that acknowledge historical sources of mistrust. We highlight existing pediatric Big Data initiatives and identify areas of future research. IMPACT: Big Data and data science can improve child health. This review highlights the importance for child health of child-specific and life course-based Big Data and data science strategies. This review provides recommendations for future pediatric-specific Big Data and data science research.
|
17
|
Shanklin R, Samorani M, Harris S, Santoro MA. Ethical Redress of Racial Inequities in AI: Lessons from Decoupling Machine Learning from Optimization in Medical Appointment Scheduling. Philos Technol 2022; 35:96. PMID: 36284736; PMCID: PMC9584259; DOI: 10.1007/s13347-022-00590-8.
Abstract
An Artificial Intelligence algorithm trained on data that reflect racial biases may yield racially biased outputs, even if the algorithm on its own is unbiased. For example, algorithms used to schedule medical appointments in the USA predict that Black patients are at higher risk of no-show than non-Black patients; though technically accurate given existing data, that prediction results in Black patients being overwhelmingly scheduled in appointment slots that cause longer wait times than those of non-Black patients. This perpetuates racial inequity, in this case reduced access to medical care. It gives rise to one type of Accuracy-Fairness trade-off: preserve the efficiency offered by using AI to schedule appointments, or discard that efficiency in order to avoid perpetuating ethno-racial disparities. Similar trade-offs arise in a range of AI applications, including others in medicine as well as in education, judicial systems, and public security. This article presents a framework for addressing such trade-offs in which the Machine Learning and Optimization components of the algorithm are decoupled. Applied to medical appointment scheduling, our framework articulates four approaches that intervene in different ways on different components of the algorithm. Each yields specific results, in one case preserving accuracy comparable to the current state of the art while eliminating the disparity.
Affiliation(s)
- Robert Shanklin, Philosophy Department, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA
- Michele Samorani, Department of Information Systems and Analytics, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA
- Shannon Harris, School of Business, Virginia Commonwealth University, Snead Hall, 301 W. Main Street, Box 844000, Richmond, VA 23284-4000, USA
- Michael A. Santoro, Department of Management and Entrepreneurship, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA
|
18
|
Machine learning and artificial intelligence: applications in healthcare epidemiology. Antimicrob Steward Healthc Epidemiol 2022; 1:e28. PMID: 36168500; PMCID: PMC9495400; DOI: 10.1017/ash.2021.192.
Abstract
Artificial intelligence (AI) refers to the performance of tasks by machines ordinarily associated with human intelligence. Machine learning (ML) is a subtype of AI; it refers to the ability of computers to draw conclusions (ie, learn) from data without being directly programmed. ML builds from traditional statistical methods and has drawn significant interest in healthcare epidemiology due to its potential for improving disease prediction and patient care. This review provides an overview of ML in healthcare epidemiology and practical examples of ML tools used to support healthcare decision making at 4 stages of hospital-based care: triage, diagnosis, treatment, and discharge. Examples include model-building efforts to assist emergency department triage, predicting time before septic shock onset, detecting community-acquired pneumonia, and classifying COVID-19 disposition risk level. Increasing availability and quality of electronic health record (EHR) data as well as computing power provides opportunities for ML to increase patient safety, improve the efficiency of clinical management, and reduce healthcare costs.
|
19
|
Park J, Arunachalam R, Silenzio V, Singh VK. Fairness in Mobile Phone–Based Mental Health Assessment Algorithms: Exploratory Study. JMIR Form Res 2022; 6:e34366. PMID: 35699997; PMCID: PMC9240929; DOI: 10.2196/34366.
Abstract
Background
Approximately 1 in 5 American adults experience mental illness every year. Thus, mobile phone–based mental health prediction apps that use phone data and artificial intelligence techniques for mental health assessment have become increasingly important and are being rapidly developed. At the same time, multiple artificial intelligence–related technologies (eg, face recognition and search results) have recently been reported to be biased regarding age, gender, and race. This study moves this discussion to a new domain: phone-based mental health assessment algorithms. It is important to ensure that such algorithms do not contribute to gender disparities through biased predictions across gender groups.
Objective
This research aimed to analyze the susceptibility of multiple commonly used machine learning approaches for gender bias in mobile mental health assessment and explore the use of an algorithmic disparate impact remover (DIR) approach to reduce bias levels while maintaining high accuracy.
Methods
First, we performed preprocessing and model training using the data set (N=55) obtained from a previous study. Accuracy levels and differences in accuracy across genders were computed using 5 different machine learning models. We selected the random forest model, which yielded the highest accuracy, for a more detailed audit and computed multiple metrics that are commonly used for fairness in the machine learning literature. Finally, we applied the DIR approach to reduce bias in the mental health assessment algorithm.
Results
The highest observed accuracy for the mental health assessment was 78.57%. Although this accuracy level raises optimism, the audit based on gender revealed that the performance of the algorithm was statistically significantly different between the male and female groups (eg, difference in accuracy across genders was 15.85%; P<.001). Similar trends were obtained for other fairness metrics. This disparity in performance was found to reduce significantly after the application of the DIR approach by adapting the data used for modeling (eg, the difference in accuracy across genders was 1.66%, and the reduction is statistically significant with P<.001).
Conclusions
This study grounds the need for algorithmic auditing in phone-based mental health assessment algorithms and the use of gender as a protected attribute to study fairness in such settings. Such audits and remedial steps are the building blocks for the widespread adoption of fair and accurate mental health assessment algorithms in the future.
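The disparate impact remover (DIR) applied in the study above adjusts feature distributions so they no longer differ systematically by protected group. Below is a toy sketch of the full-repair idea (quantile matching across groups, in the spirit of Feldman et al.); the study itself may have used a library implementation, and everything here, data included, is hypothetical.

```python
import statistics

def repair_feature(values, groups, level=1.0):
    """Toy disparate-impact-remover-style full repair of one feature.

    Each value is mapped toward the cross-group median of the value at its
    within-group quantile; level=1.0 is full repair, 0.0 leaves data as-is.
    """
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    sorted_by_group = {g: sorted(vs) for g, vs in by_group.items()}
    repaired = []
    for v, g in zip(values, groups):
        vs = sorted_by_group[g]
        # quantile rank of v within its own group
        q = vs.index(v) / (len(vs) - 1) if len(vs) > 1 else 0.5
        # value at that quantile in every group, then the median across groups
        target = statistics.median(
            ws[round(q * (len(ws) - 1))] for ws in sorted_by_group.values()
        )
        repaired.append(v + level * (target - v))
    return repaired
```

After repair, a downstream classifier can no longer exploit the group-specific shift in this feature, which is how preprocessing methods of this kind trade a little accuracy for fairness.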
Affiliation(s)
- Jinkyung Park, School of Communication & Information, Rutgers University, New Brunswick, NJ, United States
- Vincent Silenzio, School of Public Health, Rutgers University, Newark, NJ, United States
- Vivek K Singh, School of Communication & Information, Rutgers University, New Brunswick, NJ, United States; Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, United States
|
20
|
Chi S, Guo A, Heard K, Kim S, Foraker R, White P, Moore N. Development and Structure of an Accurate Machine Learning Algorithm to Predict Inpatient Mortality and Hospice Outcomes in the Coronavirus Disease 2019 Era. Med Care 2022; 60:381-386. PMID: 35230273; PMCID: PMC8989608; DOI: 10.1097/mlr.0000000000001699.
Abstract
BACKGROUND The coronavirus disease 2019 (COVID-19) pandemic has challenged the accuracy and racial biases present in traditional mortality scores. An accurate prognostic model that can be applied to hospitalized patients irrespective of race or COVID-19 status may benefit patient care. RESEARCH DESIGN This cohort study utilized historical and ongoing electronic health record features to develop and validate a deep-learning model applied on the second day of admission predicting a composite outcome of in-hospital mortality, discharge to hospice, or death within 30 days of admission. Model features included patient demographics, diagnoses, procedures, inpatient medications, laboratory values, vital signs, and substance use history. Conventional performance metrics were assessed, and subgroup analysis was performed based on race, COVID-19 status, and intensive care unit admission. SUBJECTS A total of 35,521 patients hospitalized between April 2020 and October 2020 at a single health care system including a tertiary academic referral center and 9 community hospitals. RESULTS Of 35,521 patients, including 9831 non-White patients and 2020 COVID-19 patients, 2838 (8.0%) met the composite outcome. Patients who experienced the composite outcome were older (73 vs. 61 y old) with similar sex and race distributions between groups. The model achieved an area under the receiver operating characteristic curve of 0.89 (95% confidence interval: 0.88, 0.91) and an average positive predictive value of 0.46 (0.40, 0.52). Model performance did not differ significantly in White (0.89) and non-White (0.90) subgroups or when grouping by COVID-19 status and intensive care unit admission. CONCLUSION A deep-learning model using large-volume, structured electronic health record data can effectively predict short-term mortality or hospice outcomes on the second day of admission in the general inpatient population without significant racial bias.
Affiliation(s)
- Stephen Chi, Division of Pulmonary and Critical Care Medicine
- Aixia Guo, Institute for Informatics, Washington University in St. Louis
- Seunghwan Kim, Division of General Medical Sciences, School of Medicine, Washington University in St. Louis
- Randi Foraker, Institute for Informatics, Washington University in St. Louis
- Patrick White, Division of Palliative Medicine, Department of Medicine, Washington University in St. Louis
|
21
|
Establishment of ICU Mortality Risk Prediction Models with Machine Learning Algorithm Using MIMIC-IV Database. Diagnostics (Basel) 2022; 12:1068. PMID: 35626224; PMCID: PMC9139972; DOI: 10.3390/diagnostics12051068.
Abstract
Objective: The mortality rate of critically ill patients in ICUs is relatively high. In order to evaluate patients’ mortality risk, different scoring systems are used to help clinicians assess prognosis in ICUs, such as the Acute Physiology and Chronic Health Evaluation III (APACHE III) and the Logistic Organ Dysfunction Score (LODS). In this research, we aimed to establish and compare multiple machine learning models with physiology subscores of APACHE III—namely, the Acute Physiology Score III (APS III)—and LODS scoring systems in order to obtain better performance for ICU mortality prediction. Methods: A total number of 67,748 patients from the Medical Information Database for Intensive Care (MIMIC-IV) were enrolled, including 7055 deceased patients, and the same number of surviving patients were selected by the random downsampling technique, for a total of 14,110 patients included in the study. The enrolled patients were randomly divided into a training dataset (n = 9877) and a validation dataset (n = 4233). Fivefold cross-validation and grid search procedures were used to find and evaluate the best hyperparameters in different machine learning models. Taking the subscores of LODS and the physiology subscores that are part of the APACHE III scoring systems as input variables, four machine learning methods of XGBoost, logistic regression, support vector machine, and decision tree were used to establish ICU mortality prediction models, with AUCs as metrics. AUCs, specificity, sensitivity, positive predictive value, negative predictive value, and calibration curves were used to find the best model. Results: For the prediction of mortality risk in ICU patients, the AUC of the XGBoost model was 0.918 (95%CI, 0.915–0.922), and the AUCs of logistic regression, SVM, and decision tree were 0.872 (95%CI, 0.867–0.877), 0.872 (95%CI, 0.867–0.877), and 0.852 (95%CI, 0.847–0.857), respectively. 
The calibration curves of logistic regression and support vector machine performed better than those of the other two models in the 0–40% and 70–100% ranges, respectively, while XGBoost performed better in the 40–70% range. Conclusions: The mortality risk of ICU patients can be better predicted from the Acute Physiology Score III and Logistic Organ Dysfunction Score features with XGBoost, in terms of ROC curve, sensitivity, and specificity. The XGBoost model could assist clinicians in judging the in-hospital outcomes of critically ill patients, especially those with a more uncertain survival outcome.
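The tuning procedure described in the abstract (a grid search over hyperparameters, each combination scored by fivefold cross-validation) can be sketched in plain Python. This is an illustrative stand-in, not the authors' implementation: the `score_fn` callback, the `max_depth` parameter, and the contiguous (unshuffled) fold scheme are assumptions for the sketch.

```python
from itertools import product

def kfold_indices(n, k=5):
    """Split indices 0..n-1 into k near-equal contiguous folds (no shuffling here)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search_cv(score_fn, param_grid, n_samples, k=5):
    """Return the hyperparameter combination with the best mean CV score.

    score_fn(params, train_idx, val_idx) -> validation score (e.g., AUC);
    in practice it would fit a model (XGBoost, SVM, ...) on train_idx and
    score it on val_idx.
    """
    folds = kfold_indices(n_samples, k)
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        scores = []
        for i in range(k):
            val_idx = folds[i]
            train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            scores.append(score_fn(params, train_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

In practice, libraries such as scikit-learn provide this loop as `GridSearchCV`; the sketch only shows the mechanics the abstract refers to.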
22
Celi LA. PLOS Digital Health, a new journal driving transformation in the delivery of equitable and unbiased healthcare. PLOS Digital Health 2022; 1:e0000009. [PMID: 36812523] [PMCID: PMC9931357] [DOI: 10.1371/journal.pdig.0000009]
Affiliation(s)
- Leo Anthony Celi, PLOS Digital Health, San Francisco, California, United States of America
23
Huang J, Galal G, Etemadi M, Vaidyanathan M. Evaluation and Mitigation of Racial Bias in Clinical Machine Learning Models: A Scoping Review. JMIR Med Inform 2022; 10:e36388. [PMID: 35639450] [PMCID: PMC9198828] [DOI: 10.2196/36388]
Abstract
Background Racial bias is a key concern regarding the development, validation, and implementation of machine learning (ML) models in clinical settings. Despite the potential of bias to propagate health disparities, racial bias in clinical ML has yet to be thoroughly examined, and best practices for bias mitigation remain unclear. Objective Our objective was to perform a scoping review to characterize the methods by which the racial bias of ML has been assessed and to describe strategies that may be used to enhance algorithmic fairness in clinical ML. Methods A scoping review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Extension for Scoping Reviews. A literature search using the PubMed, Scopus, and Embase databases, as well as Google Scholar, identified 635 records, of which 12 studies were included. Results Applications of ML were varied and involved diagnosis, outcome prediction, and clinical score prediction performed on data sets including images, diagnostic studies, clinical text, and clinical variables. Of the 12 studies, 1 (8%) described a model in routine clinical use, 2 (17%) examined prospectively validated clinical models, and the remaining 9 (75%) described internally validated models. In addition, 8 (67%) studies concluded that racial bias was present, 2 (17%) concluded that it was not, and 2 (17%) assessed the implementation of bias mitigation strategies without comparison to a baseline model. Fairness metrics used to assess algorithmic racial bias were inconsistent. The most commonly observed metrics were equal opportunity difference (5/12, 42%), accuracy (4/12, 33%), and disparate impact (2/12, 17%). All 8 (67%) studies that implemented methods for mitigation of racial bias successfully increased fairness, as measured by the authors’ chosen metrics. Preprocessing methods of bias mitigation were the most commonly used across all studies that implemented them.
Conclusions The broad scope of medical ML applications and potential patient harms demand an increased emphasis on evaluation and mitigation of racial bias in clinical ML. However, the adoption of algorithmic fairness principles in medicine remains inconsistent and is limited by poor data availability and ML model reporting. We recommend that researchers and journal editors emphasize standardized reporting and data availability in medical ML studies to improve transparency and facilitate evaluation for racial bias.
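The two most frequently observed fairness metrics in this review, equal opportunity difference and disparate impact, can be computed directly from model predictions. A minimal sketch following their standard definitions, assuming binary labels and predictions and a binary group indicator (1 = privileged group); it is illustrative, not code from any of the reviewed studies:

```python
def equal_opportunity_difference(y_true, y_pred, group):
    """TPR(unprivileged) - TPR(privileged); 0 indicates equal opportunity."""
    def tpr(g):
        tp = sum(1 for t, p, gr in zip(y_true, y_pred, group)
                 if gr == g and t == 1 and p == 1)
        pos = sum(1 for t, gr in zip(y_true, group) if gr == g and t == 1)
        return tp / pos  # assumes each group has at least one positive label
    return tpr(0) - tpr(1)

def disparate_impact(y_pred, group):
    """Ratio of favorable-outcome rates: P(pred=1 | unprivileged) / P(pred=1 | privileged).

    A common rule of thumb treats values below 0.8 as evidence of disparate impact.
    """
    def rate(g):
        sel = sum(1 for p, gr in zip(y_pred, group) if gr == g and p == 1)
        n = sum(1 for gr in group if gr == g)
        return sel / n
    return rate(0) / rate(1)
```

Toolkits such as IBM's AIF360 implement these and many related metrics; the point of the sketch is only that both reduce to simple group-wise rates.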
Affiliation(s)
- Jonathan Huang, Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Galal Galal, Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Mozziyar Etemadi, Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States; Department of Biomedical Engineering, Northwestern University, Evanston, IL, United States
- Mahesh Vaidyanathan, Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States; Digital Health & Data Science Curricular Thread, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
24
Yan Y, Chen C, Liu Y, Zhang Z, Xu L, Pu K. Application of Machine Learning for the Prediction of Etiological Types of Classic Fever of Unknown Origin. Front Public Health 2022; 9:800549. [PMID: 35004599] [PMCID: PMC8739804] [DOI: 10.3389/fpubh.2021.800549]
Abstract
Background: The etiology of fever of unknown origin (FUO) is complex and remains a major challenge for clinicians. This study aims to investigate the distribution of etiologies of classic FUO and the differences in clinical indicators among patients with different etiologies, and to establish a machine learning (ML) model based on clinical data. Methods: The clinical data and final diagnoses of 527 patients with classic FUO admitted to 7 medical institutions in Chongqing from January 2012 to August 2021, all of whom met the classic FUO diagnostic criteria, were collected. Three hundred seventy-three patients with a final diagnosis were divided into 4 groups according to the 4 etiological types of classic FUO, and statistical analysis was carried out to screen for indicators that differed significantly across etiological types. On the basis of these indicators, five kinds of ML models, i.e., random forest (RF), support vector machine (SVM), Light Gradient Boosting Machine (LightGBM), artificial neural network (ANN), and naive Bayes (NB) models, were evaluated on all datasets using 5-fold cross-validation, and the performance of the models was evaluated using micro-F1 scores. Results: The 373 patients were divided into the infectious disease group (n = 277), non-infectious inflammatory disease group (n = 51), neoplastic disease group (n = 31), and other diseases group (n = 14) according to the 4 etiological types. Another 154 patients were classified as undetermined because the cause of fever was still unclear at discharge. There were significant differences in gender, age, and 18 other indicators among the four groups of patients with classic FUO of different etiological types (P < 0.05). The micro-F1 score for LightGBM was 75.8%, higher than that of the other four ML models; the LightGBM prediction model thus had the best performance.
Conclusions: Infectious diseases remain the main etiological type of classic FUO. Based on 18 statistically significant clinical indicators such as gender and age, we constructed and evaluated five ML models. The LightGBM model predicts the etiological type of classic FUO well and could serve as a useful aid to clinical decision-making.
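The micro-F1 score used above to compare the five models pools true positives, false positives, and false negatives across all four etiology classes before computing precision and recall. A small self-contained sketch of the metric (not the authors' code):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1.

    For single-label multiclass problems (as in the FUO study), micro
    precision, micro recall, and micro F1 all coincide with overall accuracy.
    """
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

This is the same quantity scikit-learn returns with `f1_score(y_true, y_pred, average="micro")`; micro averaging is a sensible choice here given the heavily imbalanced class sizes (277 vs. 14).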
Affiliation(s)
- Yongjie Yan, School of Medical Informatics, Chongqing Medical University, Chongqing, China
- Chongyuan Chen, Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing, China
- Yunyu Liu, Medical Records and Statistics Office, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Zuyue Zhang, School of Medical Informatics, Chongqing Medical University, Chongqing, China
- Lin Xu, School of Medical Informatics, Chongqing Medical University, Chongqing, China
- Kexue Pu, School of Medical Informatics, Chongqing Medical University, Chongqing, China
25
Thapa R, Garikipati A, Shokouhi S, Hurtado M, Barnes G, Hoffman J, Calvert J, Katzmann L, Mao Q, Das R. Usability of Electronic Health Records in Predicting Short-term Falls: Machine Learning Applications in Senior Care Facilities. JMIR Aging 2022; 5:e35373. [PMID: 35363146] [PMCID: PMC9015781] [DOI: 10.2196/35373]
Abstract
Background Short-term fall prediction models that use electronic health records (EHRs) may enable the implementation of dynamic care practices that specifically address changes in individualized fall risk within senior care facilities. Objective The aim of this study is to implement machine learning (ML) algorithms that use EHR data to predict a 3-month fall risk in residents from a variety of senior care facilities providing different levels of care. Methods This retrospective study obtained EHR data (2007-2021) from Juniper Communities’ proprietary database of 2785 individuals primarily residing in skilled nursing facilities, independent living facilities, and assisted living facilities across the United States. We assessed the performance of 3 ML-based fall prediction models and the Juniper Communities’ fall risk assessment. Additional analyses were conducted to examine how changes in the input features, training data sets, and prediction windows affected the performance of these models. Results The Extreme Gradient Boosting model exhibited the highest performance, with an area under the receiver operating characteristic curve of 0.846 (95% CI 0.794-0.894), specificity of 0.848, diagnostic odds ratio of 13.40, and sensitivity of 0.706, while achieving the best trade-off in balancing true positive and negative rates. The number of active medications was the most significant feature associated with fall risk, followed by a resident’s number of active diseases and several variables associated with vital signs, including diastolic blood pressure and changes in weight and respiratory rates. The combination of vital signs with traditional risk factors as input features achieved higher prediction accuracy than using either group of features alone. 
Conclusions This study shows that the Extreme Gradient Boosting technique can use a large number of features from EHR data to make short-term fall predictions with a better performance than that of conventional fall risk assessments and other ML models. The integration of routinely collected EHR data, particularly vital signs, into fall prediction models may generate more accurate fall risk surveillance than models without vital signs. Our data support the use of ML models for dynamic, cost-effective, and automated fall predictions in different types of senior care facilities.
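The performance figures reported above (sensitivity 0.706, specificity 0.848, diagnostic odds ratio 13.40) all derive from the binary confusion matrix at the chosen decision threshold. A minimal sketch of those calculations, with illustrative data rather than the study's:

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and diagnostic odds ratio from binary labels.

    DOR = (TP * TN) / (FP * FN): the odds of a positive prediction among true
    positives divided by the odds of a positive prediction among true
    negatives. Note the ratio is undefined when any off-diagonal cell is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "dor": (tp * tn) / (fp * fn),    # diagnostic odds ratio
    }
```

Reporting the diagnostic odds ratio alongside sensitivity and specificity is useful here because it summarizes discrimination at the operating threshold in a single number, complementing the threshold-free AUC.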
26
Abstract
OBJECTIVE Child undernutrition is a global public health problem with serious implications. In this study, we estimate predictive algorithms for the determinants of childhood stunting by using various machine learning (ML) algorithms. DESIGN This study draws on data from the Ethiopian Demographic and Health Survey of 2016. Five ML algorithms, including eXtreme gradient boosting (xgbTree), k-nearest neighbours (k-NN), random forest, neural network and the generalised linear model, were considered to predict the socio-demographic risk factors for undernutrition in Ethiopia. SETTING Households in Ethiopia. PARTICIPANTS A total of 9471 children below 5 years of age participated in this study. RESULTS The descriptive results show substantial regional variations in child stunting, wasting and underweight in Ethiopia. Among the five ML algorithms, the xgbTree algorithm showed better predictive ability than the generalised linear mixed algorithm. The best-performing algorithm (xgbTree) identified diverse important predictors of undernutrition across the three outcomes, including time to water source, anaemia history, child age greater than 30 months, small birth size and maternal underweight, among others. CONCLUSIONS The xgbTree algorithm was a reasonably superior ML algorithm for predicting childhood undernutrition in Ethiopia compared to the other ML algorithms considered in this study. The findings support improvement in access to water supply, food security and fertility regulation, among others, in the quest to considerably improve childhood nutrition in Ethiopia.
27
Abstract
PURPOSE OF REVIEW Despite attention to racial disparities in outcomes for heart failure (HF) and other chronic diseases, progress against these inequities has been gradual at best. The disparities of COVID-19 and police brutality have highlighted the pervasiveness of systemic racism in health outcomes. Whether racial bias impacts patient access to advanced HF therapies is unclear. RECENT FINDINGS As documented in other settings, racial bias appears to operate in HF providers' consideration of patients for advanced therapy. Multiple medical and psychosocial elements of the evaluation process are particularly vulnerable to bias. SUMMARY Reducing gaps in access to advanced therapies will require commitments at multiple levels to reduce barriers to healthcare access, standardize clinical operations, research the determinants of patient success, and increase diversity among providers and researchers. Progress is achievable but will likely require a response as disruptive, and an investment of resources as immense, as the battle against COVID-19.
Affiliation(s)
- Raymond C Givens, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, New York, USA