1
Tran L, Kandel H, Sari D, Chiu CH, Watson SL. Artificial Intelligence and Ophthalmic Clinical Registries. Am J Ophthalmol 2024; 268:263-274. [PMID: 39111520] [DOI: 10.1016/j.ajo.2024.07.039]
Abstract
PURPOSE Recent advances in artificial intelligence (AI) represent a promising response to increasing clinical demand and ever more limited health resources. Whilst powerful, AI models require vast amounts of representative training data to output meaningful predictions in the clinical environment. Clinical registries are a promising source of large-volume real-world data that could be used to train more accurate and widely applicable AI models. This review provides an overview of current applications of AI to ophthalmic clinical registry data. DESIGN AND METHODS A systematic search of EMBASE, Medline, PubMed, Scopus and Web of Science for primary research articles that applied AI to ophthalmic clinical registry data was conducted in July 2024. RESULTS Twenty-three primary research articles applying AI to ophthalmic clinical registries (n = 14) were found. Registries were primarily defined by the condition captured, and the most common conditions to which AI was applied were glaucoma (n = 3) and neovascular age-related macular degeneration (n = 3). Tabular clinical data was the most common form of input into AI algorithms, and outputs were primarily classifiers (n = 8, 40%) and risk quantifier models (n = 7, 35%). The AI algorithms applied were almost exclusively supervised conventional machine learning models (n = 39, 85%), such as decision tree classifiers and logistic regression, with only 7 applications of deep learning or natural language processing algorithms. Significant heterogeneity was found with regard to model validation methodology and measures of performance. CONCLUSIONS Limited applications of deep learning algorithms to clinical registry data have been reported.
The lack of standardized validation methodology and the heterogeneity of performance reporting suggest that the application of AI to clinical registries is still in its infancy, constrained by the poor accessibility of registry data. Standardized methodology and greater involvement of domain experts will be needed for the future development of clinically deployable AI.
Affiliation(s)
- Luke Tran, Himal Kandel, Daliya Sari, Christopher Hy Chiu, Stephanie L Watson
- From the Faculty of Medicine and Health, Save Sight Institute, The University of Sydney (L.T., H.K., D.S., C.H.C., S.L.W.), Sydney, New South Wales, Australia.
2
Wang HE, Weiner JP, Saria S, Lehmann H, Kharrazi H. Assessing racial bias in healthcare predictive models: Practical lessons from an empirical evaluation of 30-day hospital readmission models. J Biomed Inform 2024; 156:104683. [PMID: 38925281] [DOI: 10.1016/j.jbi.2024.104683]
Abstract
OBJECTIVE Despite the increased availability of methodologies to identify algorithmic bias, the operationalization of bias evaluation for healthcare predictive models is still limited. This study therefore proposes a process for bias evaluation through an empirical assessment of common hospital readmission models. The process includes selecting bias measures, interpreting them, and determining disparity impact and potential mitigations. METHODS This retrospective analysis evaluated racial bias in four common models predicting 30-day unplanned readmission (i.e., the LACE Index, the HOSPITAL Score, and the CMS readmission measure applied as is and retrained). The models were assessed using 2.4 million adult inpatient discharges in Maryland from 2016 to 2019. Fairness metrics that are model-agnostic, easy to compute, and interpretable were implemented and appraised to select the most appropriate bias measures. The impact of changing the models' risk thresholds on these measures was further assessed to guide the selection of optimal thresholds to control and mitigate bias. RESULTS Four bias measures were selected for the predictive task: zero-one-loss difference, false negative rate (FNR) parity, false positive rate (FPR) parity, and the generalized entropy index. Based on these measures, the HOSPITAL Score and the retrained CMS measure demonstrated the lowest racial bias. White patients showed a higher FNR, while Black patients showed a higher FPR and zero-one loss. As the models' risk thresholds changed, trade-offs between fairness and overall performance were observed, and the assessment showed that all models' default thresholds were reasonable for balancing accuracy and bias. CONCLUSIONS This study proposes an Applied Framework to Assess Fairness of Predictive Models (AFAFPM) and demonstrates the process using 30-day hospital readmission models as an example.
It suggests the feasibility of applying algorithmic bias assessment to determine optimized risk thresholds so that predictive models can be used more equitably and accurately. A combination of qualitative and quantitative methods and a multidisciplinary team are clearly necessary to identify, understand and respond to algorithmic bias in real-world healthcare settings. Users should also apply multiple bias measures to ensure a more comprehensive, tailored, and balanced view. The results of bias measures, however, must be interpreted with caution and in the larger operational, clinical, and policy context.
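The group-level error-rate measures this abstract names (zero-one loss, FNR parity, FPR parity) reduce to per-group confusion-matrix rates and a gap between groups. The sketch below is an illustrative reconstruction under that reading, not the authors' AFAFPM code; the labels, predictions, and group names are invented toy data.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group error rates used as fairness measures:
    zero-one loss, false negative rate (FNR), false positive rate (FPR)."""
    out = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        fn = np.sum((yt == 1) & (yp == 0))
        fp = np.sum((yt == 0) & (yp == 1))
        out[g] = {
            "zero_one_loss": np.mean(yt != yp),
            "fnr": fn / max(np.sum(yt == 1), 1),
            "fpr": fp / max(np.sum(yt == 0), 1),
        }
    return out

def parity_gap(rates, metric):
    """Largest between-group difference in a metric (0 = perfect parity)."""
    vals = [r[metric] for r in rates.values()]
    return max(vals) - min(vals)

# Toy example with two groups, A and B
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
rates = group_rates(y_true, y_pred, groups)
print(parity_gap(rates, "fnr"))  # ≈ 0.33: group A misses positives more often
```

In practice one would compute these gaps at each candidate risk threshold, which is how the threshold trade-offs described above become visible.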
Affiliation(s)
- H Echo Wang
- Department of Health Policy and Management, Johns Hopkins School of Public Health, Baltimore, MD, USA.
- Jonathan P Weiner
- Department of Health Policy and Management, Johns Hopkins School of Public Health, Baltimore, MD, USA; Center for Population Health Information Technology, Johns Hopkins School of Public Health, Baltimore, MD, USA.
- Suchi Saria
- Department of Computer Science and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Harold Lehmann
- Biomedical Informatics and Data Science, Division of General Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Hadi Kharrazi
- Department of Health Policy and Management, Johns Hopkins School of Public Health, Baltimore, MD, USA; Center for Population Health Information Technology, Johns Hopkins School of Public Health, Baltimore, MD, USA; Biomedical Informatics and Data Science, Division of General Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
3
Soares Dias Portela A, Saxena V, Rosenn E, Wang SH, Masieri S, Palmieri J, Pasinetti GM. Role of Artificial Intelligence in Multinomial Decisions and Preventative Nutrition in Alzheimer's Disease. Mol Nutr Food Res 2024; 68:e2300605. [PMID: 38175857] [DOI: 10.1002/mnfr.202300605]
Abstract
Alzheimer's disease (AD) affects 50 million people worldwide, an increase of 35 million since 2015, and is known for memory loss and cognitive decline. Given the morbidity associated with AD, it is important to explore lifestyle elements that influence the chances of developing AD, with special emphasis on nutrition. This review first discusses how dietary factors affect AD development, then the possible role of Artificial Intelligence (AI) and Machine Learning (ML) in preventative nutritional care for AD patients. The Mediterranean and DASH diets provide many nutrient benefits and assist in preventing neurodegeneration through neuroprotective roles. Deficiencies in micronutrients, protein-energy, and polyunsaturated fatty acids increase the chance of cognitive decline, memory loss, and synaptic dysfunction, among other outcomes. ML software can build algorithmic models from the data it is given and present practical solutions that are accessible and easy to use, yielding predictions for a precision medicine approach that evaluates individuals as a whole. The future of nutritional science undoubtedly lies in customizing diets for individuals to reduce dementia risk factors and maintain overall health and brain function.
Affiliation(s)
- Vrinda Saxena, Eric Rosenn, Shu-Han Wang, Sibilla Masieri, Joshua Palmieri
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Giulio Maria Pasinetti
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Geriatrics Research, Education and Clinical Center, JJ Peters VA Medical Center, Bronx, NY, 10468, USA
4
Kim YE, Serpedin A, Periyakoil P, German D, Rameau A. Sociodemographic reporting in videomics research: a review of practices in otolaryngology - head and neck surgery. Eur Arch Otorhinolaryngol 2024. [PMID: 38704768] [DOI: 10.1007/s00405-024-08659-0]
Abstract
OBJECTIVE To assess reporting practices of sociodemographic data in Upper Aerodigestive Tract (UAT) videomics research in Otolaryngology-Head and Neck Surgery (OHNS). STUDY DESIGN Narrative review. METHODS Four online research databases were searched for peer-reviewed articles on videomics and UAT endoscopy in OHNS published since January 1, 2017. Title and abstract screening, followed by full-text screening, was performed. Dataset audit criteria were determined by the MINIMAR reporting standards for patient demographic characteristics, in addition to gender and author affiliations. RESULTS Of the 57 included studies, 37% reported any sociodemographic information on their dataset. Among these studies, all reported age, most reported sex (86%), two (10%) reported race, and one (5%) reported ethnicity and socioeconomic status. No studies reported gender. Most studies (84%) included at least one female author, and more than half (53%) had a female first or senior author, with no significant difference in the rate of sociodemographic reporting between studies with and without female authors (any female author: p = 0.2664; first/senior female author: p > 0.9999). Most US-based studies reported at least one sociodemographic variable (79%), compared with those based in Europe (24%) and Asia (20%) (p = 0.0012). The rates of sociodemographic reporting by journal category were as follows: clinical OHNS, 44%; clinical non-OHNS, 40%; technical, 42%; interdisciplinary, 10%. CONCLUSIONS Sociodemographic information is widely underreported in OHNS videomics research utilizing UAT endoscopy. Routine reporting of sociodemographic information should be implemented for AI-based research to help minimize the algorithmic biases that have previously been demonstrated. LEVEL OF EVIDENCE: 4
Affiliation(s)
- Yeo Eun Kim, Aisha Serpedin, Preethi Periyakoil, Daniel German, Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, Sean Parker Institute for the Voice, 240 East 59th St, New York, NY, 10022, USA.
5
Kapoor S, Cantrell EM, Peng K, Pham TH, Bail CA, Gundersen OE, Hofman JM, Hullman J, Lones MA, Malik MM, Nanayakkara P, Poldrack RA, Raji ID, Roberts M, Salganik MJ, Serra-Garcia M, Stewart BM, Vandewiele G, Narayanan A. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. Sci Adv 2024; 10:eadk3452. [PMID: 38691601] [PMCID: PMC11092361] [DOI: 10.1126/sciadv.adk3452]
Abstract
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Affiliation(s)
- Sayash Kapoor
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Emily M. Cantrell
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- School of Public and International Affairs, Princeton University, Princeton, NJ 08544, USA
- Kenny Peng
- Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
- Thanh Hien Pham
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Christopher A. Bail
- Department of Sociology, Duke University, Durham, NC 27708, USA
- Department of Political Science, Duke University, Durham, NC 27708, USA
- Sanford School of Public Policy, Duke University, Durham, NC 27708, USA
- Odd Erik Gundersen
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Aneo AS, Trondheim, Norway
- Jessica Hullman
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Michael A. Lones
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
- Momin M. Malik
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute in Critical Quantitative, Computational, & Mixed Methodologies, Johns Hopkins University, Baltimore, MD 21218, USA
- Priyanka Nanayakkara
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Department of Communication Studies, Northwestern University, Evanston, IL 60208, USA
- Inioluwa Deborah Raji
- Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA
- Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
- Matthew J. Salganik
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Marta Serra-Garcia
- Rady School of Management, University of California, San Diego, La Jolla, CA 92093, USA
- Brandon M. Stewart
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Department of Politics, Princeton University, Princeton, NJ 08544, USA
- Gilles Vandewiele
- Department of Information Technology, Ghent University, Ghent, Belgium
- Arvind Narayanan
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
6
Rose C, Barber R, Preiksaitis C, Kim I, Mishra N, Kayser K, Brown I, Gisondi M. A Conference (Missingness in Action) to Address Missingness in Data and AI in Health Care: Qualitative Thematic Analysis. J Med Internet Res 2023; 25:e49314. [PMID: 37995113] [PMCID: PMC10704317] [DOI: 10.2196/49314]
Abstract
BACKGROUND Missingness in health care data poses significant challenges in the development and implementation of artificial intelligence (AI) and machine learning solutions. Identifying and addressing these challenges is critical to ensuring the continued growth and accuracy of these models as well as their equitable and effective use in health care settings. OBJECTIVE This study aims to explore the challenges, opportunities, and potential solutions related to missingness in health care data for AI applications through the conduct of a digital conference and thematic analysis of conference proceedings. METHODS A digital conference was held in September 2022, attracting 861 registered participants, with 164 (19%) attending the live event. The conference featured presentations and panel discussions by experts in AI, machine learning, and health care. Transcripts of the event were analyzed using the stepwise framework of Braun and Clarke to identify key themes related to missingness in health care data. RESULTS Three principal themes-data quality and bias, human input in model development, and trust and privacy-emerged from the analysis. Topics included the accuracy of predictive models, lack of inclusion of underrepresented communities, partnership with physicians and other populations, challenges with sensitive health care data, and fostering trust with patients and the health care community. CONCLUSIONS Addressing the challenges of data quality, human input, and trust is vital when devising and using machine learning algorithms in health care. Recommendations include expanding data collection efforts to reduce gaps and biases, involving medical professionals in the development and implementation of AI models, and developing clear ethical guidelines to safeguard patient privacy. Further research and ongoing discussions are needed to ensure these conclusions remain relevant as health care and AI continue to evolve.
Affiliation(s)
- Christian Rose
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Carl Preiksaitis
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Ireh Kim
- Stanford University, Palo Alto, CA, United States
- Kristen Kayser
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Italo Brown
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Michael Gisondi
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
7
Hernandez-Boussard T, Siddique SM, Bierman AS, Hightower M, Burstin H. Promoting Equity In Clinical Decision Making: Dismantling Race-Based Medicine. Health Aff (Millwood) 2023; 42:1369-1373. [PMID: 37782875] [PMCID: PMC10849087] [DOI: 10.1377/hlthaff.2023.00545]
Abstract
As the use of artificial intelligence has spread rapidly throughout the US health care system, concerns have been raised about racial and ethnic biases built into the algorithms that often guide clinical decision making. Race-based medicine, which relies on algorithms that use race as a proxy for biological differences, has led to treatment patterns that are inappropriate, unjust, and harmful to minoritized racial and ethnic groups. These patterns have contributed to persistent disparities in health and health care. To reduce these disparities, we recommend a race-aware approach to clinical decision support that considers social and environmental factors such as structural racism and social determinants of health. Recent policy changes in medical specialty societies and innovations in algorithm development represent progress on the path to dismantling race-based medicine. Success will require continued commitment and sustained efforts among stakeholders in the health care, research, and technology sectors. Increasing the diversity of clinical trial populations, broadening the focus of precision medicine, improving education about the complex factors shaping health outcomes, and developing new guidelines and policies to enable culturally responsive care are important next steps.
Affiliation(s)
- Arlene S Bierman
- Agency for Healthcare Research and Quality, Rockville, Maryland
- Maia Hightower
- University of Chicago, Chicago, Illinois
- Helen Burstin
- Council of Medical Specialty Societies, Washington, D.C.
8
Abdulazeem H, Whitelaw S, Schauberger G, Klug SJ. A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data. PLoS One 2023; 18:e0274276. [PMID: 37682909] [PMCID: PMC10491005] [DOI: 10.1371/journal.pone.0274276]
Abstract
With advances in technology and data science, machine learning (ML) is being rapidly adopted by the health care sector. However, to date there is a lack of literature addressing the health conditions targeted by ML prediction models within primary health care (PHC). To fill this gap, we conducted a systematic review following the PRISMA guidelines to identify health conditions targeted by ML in PHC. We searched the Cochrane Library, Web of Science, PubMed, Elsevier, BioRxiv, Association for Computing Machinery (ACM), and IEEE Xplore databases for studies published from January 1990 to January 2022. We included primary studies addressing ML diagnostic or prognostic predictive models that were supplied completely or partially by real-world PHC data. Study selection, data extraction, and risk of bias assessment using the prediction model risk of bias assessment tool were performed by two investigators. Health conditions were categorized according to the International Classification of Diseases (ICD-10). Extracted data were analyzed quantitatively. We identified 106 studies investigating 42 health conditions. These studies included 207 ML prediction models supplied by PHC data from 24.2 million participants in 19 countries. We found that 92.4% of the studies were retrospective and 77.3% reported diagnostic predictive ML models. A majority (76.4%) of the studies developed models without conducting external validation. Risk of bias assessment revealed that 90.8% of the studies were at high or unclear risk of bias. The most frequently reported health conditions were diabetes mellitus (19.8%) and Alzheimer's disease (11.3%). Our study summarizes the presently available ML prediction models within PHC and draws the attention of digital health policy makers, ML model developers, and health care professionals to the need for more interdisciplinary research collaboration in this area.
Affiliation(s)
- Hebatullah Abdulazeem
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Sera Whitelaw
- Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Gunther Schauberger
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Stefanie J. Klug
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
9
Busch F, Adams LC, Bressem KK. Biomedical Ethical Aspects Towards the Implementation of Artificial Intelligence in Medical Education. Med Sci Educ 2023; 33:1007-1012. [PMID: 37546190] [PMCID: PMC10403458] [DOI: 10.1007/s40670-023-01815-x]
Abstract
The increasing use of artificial intelligence (AI) in medicine is associated with new ethical challenges and responsibilities. However, special considerations and concerns should be addressed when integrating AI applications into medical education, where healthcare, AI, and education ethics collide. This commentary explores the biomedical ethical responsibilities of medical institutions in incorporating AI applications into medical education by identifying potential concerns and limitations, with the goal of implementing applicable recommendations. The recommendations presented are intended to assist in developing institutional guidelines for the ethical use of AI for medical educators and students.
Affiliation(s)
- Felix Busch
- Department of Radiology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Department of Anesthesiology, Division of Operative Intensive Care Medicine, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Lisa C. Adams
- Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA
- Keno K. Bressem
- Department of Radiology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
10
Kanda E, Suzuki A, Makino M, Tsubota H, Kanemata S, Shirakawa K, Yajima T. Machine learning models for prediction of HF and CKD development in early-stage type 2 diabetes patients. Sci Rep 2022; 12:20012. [PMID: 36411366] [PMCID: PMC9678863] [DOI: 10.1038/s41598-022-24562-2]
Abstract
Chronic kidney disease (CKD) and heart failure (HF) are the first and most frequent comorbidities associated with mortality risk in early-stage type 2 diabetes mellitus (T2DM). However, efficient screening and risk-assessment strategies for identifying T2DM patients at high risk of developing CKD and/or HF (CKD/HF) remain to be established. This study aimed to generate a novel machine learning (ML) model to predict the risk of developing CKD/HF in early-stage T2DM patients. The models were derived from a retrospective cohort of 217,054 T2DM patients without a history of cardiovascular or renal disease extracted from a Japanese claims database. Among the ML algorithms used, extreme gradient boosting exhibited the best performance for CKD/HF diagnosis and hospitalization after internal validation, and was further validated using another dataset including 16,822 patients. In the external validation, the 5-year prediction areas under the receiver operating characteristic curve for CKD/HF diagnosis and hospitalization were 0.718 and 0.837, respectively. In Kaplan-Meier analysis, patients predicted to be at high risk showed a significant increase in CKD/HF diagnosis and hospitalization compared with those at low risk. Thus, the developed model predicted the risk of developing CKD/HF in T2DM patients with reasonable accuracy in the external validation cohort. A clinical approach identifying T2DM patients at high risk of developing CKD/HF using ML models may contribute to improved prognosis by promoting early diagnosis and intervention.
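The external-validation metric reported here, area under the receiver operating characteristic curve, is equivalent to the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case (the Mann-Whitney U formulation), so it can be computed directly from held-out labels and scores. The sketch below illustrates this with invented labels and risk scores, not the study's data or its gradient-boosting model:

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs ranked correctly, counting ties as half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()  # tied scores count 0.5
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# External-validation style check: held-out labels vs. model risk scores
y_ext = [1, 0, 1, 1, 0, 0]
risk = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
print(round(auroc(y_ext, risk), 3))  # → 0.889
```

An AUROC of 0.5 corresponds to random ranking; values such as the 0.718 and 0.837 reported above indicate progressively better discrimination on the external cohort.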
Affiliation(s)
- Eiichiro Kanda
- Medical Science, Kawasaki Medical University, Okayama, Japan
- Atsushi Suzuki
- Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Masaki Makino
- Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Hiroo Tsubota
- AstraZeneca K.K., Osaka, Japan
- Satomi Kanemata
- Ono Pharmaceutical Co., Ltd., Osaka, Japan
11
12
Lu J, Sattler A, Wang S, Khaki AR, Callahan A, Fleming S, Fong R, Ehlert B, Li RC, Shieh L, Ramchandran K, Gensheimer MF, Chobot S, Pfohl S, Li S, Shum K, Parikh N, Desai P, Seevaratnam B, Hanson M, Smith M, Xu Y, Gokhale A, Lin S, Pfeffer MA, Teuteberg W, Shah NH. Considerations in the reliability and fairness audits of predictive models for advance care planning. Front Digit Health 2022; 4:943768. [PMID: 36339512] [PMCID: PMC9634737] [DOI: 10.3389/fdgth.2022.943768]
Abstract
Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap in operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration as well as a fairness audit based on summary statistics, subgroup performance and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in 3 practice settings: Primary Care, Inpatient Oncology and Hospital Medicine, using clinicians' answers to the surprise question (“Would you be surprised if [patient X] passed away in [Y years]?”) as a surrogate outcome. For performance, the models had positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69, 0.89 respectively) than the EOL model (0.20, 0.27), and better calibration (O/E 1.5, 1.7) than the EOL model (O/E 2.5, 3.0). The Epic EOL model flagged fewer patients (11%, 21% respectively) than the Stanford HM ACP model (38%, 75%). There were no differences in performance and calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with Race listed as “Other.” Ten clinicians were surveyed after a presentation summarizing the audit. 10/10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9/10 said the same for overall and subgroup calibration. The most commonly identified barriers for routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. This audit required 115 person-hours across 8–10 months.
Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting clinician-patient linkages as necessary for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and mediate between model auditors and impacted stakeholders.
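The audit quantities named in this abstract — PPV, sensitivity, and calibration expressed as an observed-to-expected (O/E) ratio — are straightforward to compute per subgroup once predictions, outcomes, and demographics are linked. A minimal sketch with hypothetical records (the group names and numbers below are invented for illustration, not taken from the audit):

```python
from collections import defaultdict

def subgroup_audit(records):
    """records: iterable of (group, y_true, y_pred, risk) tuples.
    Returns per-group PPV, sensitivity, and O/E calibration ratio
    (observed event rate divided by mean predicted risk)."""
    groups = defaultdict(list)
    for g, y, yhat, p in records:
        groups[g].append((y, yhat, p))
    out = {}
    for g, rows in groups.items():
        tp = sum(1 for y, yhat, _ in rows if y == 1 and yhat == 1)
        fp = sum(1 for y, yhat, _ in rows if y == 0 and yhat == 1)
        fn = sum(1 for y, yhat, _ in rows if y == 1 and yhat == 0)
        observed = sum(y for y, _, _ in rows) / len(rows)
        expected = sum(p for _, _, p in rows) / len(rows)
        out[g] = {
            "ppv": tp / (tp + fp) if tp + fp else None,
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "o_to_e": observed / expected if expected else None,
        }
    return out
```

An O/E ratio above 1 (such as the EOL model's 2.5–3.0 reported above) means the model under-predicts risk relative to observed events; a ratio near 1 indicates good calibration.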
Affiliation(s)
- Jonathan Lu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Correspondence: Jonathan Hsijing Lu
- Amelia Sattler
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Samantha Wang
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Ali Raza Khaki
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Alison Callahan
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Scott Fleming
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Rebecca Fong
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Benjamin Ehlert
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Ron C. Li
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Lisa Shieh
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Kavitha Ramchandran
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Michael F. Gensheimer
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, United States
- Sarah Chobot
- Inpatient Palliative Care, Stanford Health Care, Palo Alto, United States
- Stephen Pfohl
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Siyun Li
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Kenny Shum
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Nitin Parikh
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Priya Desai
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Briththa Seevaratnam
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Melanie Hanson
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Margaret Smith
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Yizhe Xu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Arjun Gokhale
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Steven Lin
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Michael A. Pfeffer
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Winifred Teuteberg
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Nigam H. Shah
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Clinical Excellence Research Center, Stanford University School of Medicine, Palo Alto, United States
13
Shahzad R, Ayub B, Siddiqui MAR. Quality of reporting of randomised controlled trials of artificial intelligence in healthcare: a systematic review. BMJ Open 2022; 12:e061519. [PMID: 36691151 PMCID: PMC9445816 DOI: 10.1136/bmjopen-2022-061519] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/30/2022] [Accepted: 08/17/2022] [Indexed: 01/26/2023] Open
Abstract
OBJECTIVES The aim of this study was to evaluate the quality of reporting of randomised controlled trials (RCTs) of artificial intelligence (AI) in healthcare against Consolidated Standards of Reporting Trials-AI (CONSORT-AI) guidelines. DESIGN Systematic review. DATA SOURCES We searched PubMed and EMBASE databases for studies reported from January 2015 to December 2021. ELIGIBILITY CRITERIA We included RCTs reported in English that used AI as the intervention. Protocols, conference abstracts, studies on robotics and studies related to medical education were excluded. DATA EXTRACTION The included studies were graded using the CONSORT-AI checklist, comprising 43 items, by two independent graders. The results were tabulated and descriptive statistics were reported. RESULTS We screened 1501 potential abstracts, of which 112 full-text articles were reviewed for eligibility. A total of 42 studies were included. The number of participants ranged from 22 to 2352. Only two of the CONSORT-AI checklist items were fully reported in all studies. Five items were not applicable in more than 85% of the studies. Nineteen per cent (8/42) of the studies did not report more than 50% (21/43) of the CONSORT-AI checklist items. CONCLUSIONS The quality of reporting of RCTs in AI is suboptimal. As reporting is variable in existing RCTs, caution should be exercised in interpreting the findings of some studies.
Affiliation(s)
- Rida Shahzad
- Department of Ophthalmology, Shahzad Eye Hospital, Karachi, Pakistan
- Bushra Ayub
- Centre for Clinical Best Practices, Aga Khan University Hospital, Karachi, Pakistan
- M A Rehman Siddiqui
- Department of Ophthalmology and Visual Sciences, Aga Khan University Hospital, Karachi, Pakistan
14
Lu JH, Callahan A, Patel BS, Morse KE, Dash D, Pfeffer MA, Shah NH. Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review. JAMA Netw Open 2022; 5:e2227779. [PMID: 35984654 PMCID: PMC9391954 DOI: 10.1001/jamanetworkopen.2022.27779] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/28/2021] [Accepted: 07/04/2022] [Indexed: 12/23/2022] Open
Abstract
Importance Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. Objectives To assess information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. Evidence Review MEDLINE was queried using the terms “machine learning model card” and “reporting machine learning” from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. Findings From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (requested by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristics curve, internal validation, and intended clinical use. Several items related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data, were reported half the time or less.
Other frequently unreported items related to fairness (summary statistics and subgroup analyses, including for race and ethnicity or sex). Conclusions and Relevance These findings suggest that consistent reporting recommendations for clinical predictive models are needed for model developers to share necessary information for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines. However, areas for improvement were identified in reporting items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.
Affiliation(s)
- Jonathan H. Lu
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Alison Callahan
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Birju S. Patel
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Keith E. Morse
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California
- Department of Clinical Informatics, Lucile Packard Children’s Hospital, Palo Alto, California
- Dev Dash
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Michael A. Pfeffer
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Technology and Digital Solutions, Stanford Medicine, Stanford, California
- Nigam H. Shah
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Technology and Digital Solutions, Stanford Medicine, Stanford, California
- Clinical Excellence Research Center, Stanford Medicine, Stanford, California
15
Golder S, O'Connor K, Wang Y, Stevens R, Gonzalez-Hernandez G. Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases. Annu Rev Biomed Data Sci 2022; 5:251-267. [PMID: 35562851 DOI: 10.1146/annurev-biodatasci-122120-025806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
A bias in health research to favor understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques to interrogate big data for identifying sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) "women," "men," or "sex"; (b) "big data," "artificial intelligence," or "NLP"; and (c) "disparities" or "differences." From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that the inclusion by sex is inconsistent and often unreported, although the inclusion of men in these studies is disproportionately less than that of women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but the process toward correction is slow. We reflect on best practices on using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, United Kingdom
- Karen O'Connor
- Department of Biostatistics, Epidemiology and Informatics (DBEI), University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yunwen Wang
- Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, California, USA
- Robin Stevens
- Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, California, USA
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology and Informatics (DBEI), University of Pennsylvania, Philadelphia, Pennsylvania, USA
16
Lossio-Ventura JA, Song W, Sainlaire M, Dykes PC, Hernandez-Boussard T. Opioid2MME: Standardizing opioid prescriptions to morphine milligram equivalents from electronic health records. Int J Med Inform 2022; 162:104739. [PMID: 35325663 PMCID: PMC9477978 DOI: 10.1016/j.ijmedinf.2022.104739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2021] [Revised: 02/26/2022] [Accepted: 03/11/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND The national increase in opioid use and misuse has become a public health crisis in the U.S. To tackle this crisis, the systematic evaluation and monitoring of opioid prescribing patterns is necessary. Thus, opioid prescriptions from electronic health records (EHRs) must be standardized to morphine milligram equivalents (MME) to facilitate monitoring and surveillance. While most studies report MMEs to describe opioid prescribing patterns, there is a lack of transparency regarding their data pre-processing and conversion processes for replication or comparison purposes. METHODS In this work, we developed Opioid2MME, a SQL-based open-source framework, to convert opioid prescriptions to MMEs using EHR prescription data. The MME conversions were validated internally through manual chart review using F-measures, compared with two existing tools, MedEx and MedXN, and the framework was tested in an external academic EHR system. RESULTS We identified 232,913 prescriptions for 49,060 unique patients in the EHRs, 2008-2019. We manually annotated a sample of prescriptions to assess the performance of the framework. The internal evaluation for medication information extraction achieved F-measures from 0.98 to 1.00 for each piece of the extracted information, outperforming MedEx and MedXN (F-scores 0.98 and 0.94, respectively). MME values in the internal EHR system obtained an F-measure of 0.97, with 3% of the data identified as outliers and 7% as missing values. The MME conversion in the external EHR system showed 78.3% agreement with the MME values obtained at the development site. CONCLUSIONS The results demonstrated that the framework is replicable and capable of converting opioid prescriptions to MMEs across different medical institutions. In summary, this work sets the groundwork for the systematic evaluation and monitoring of opioid prescribing patterns across healthcare systems.
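The arithmetic underlying MME standardization is a per-opioid conversion factor applied to the daily dose. The sketch below uses the commonly published CDC oral conversion factors and illustrates only the calculation; Opioid2MME itself is a SQL-based framework operating on EHR prescription tables, and the function and factor table here are not taken from it:

```python
# Widely published morphine-equivalent conversion factors for oral opioids
# (mg of morphine per mg of drug); illustrative subset only.
MME_FACTOR = {
    "morphine": 1.0,
    "oxycodone": 1.5,
    "hydrocodone": 1.0,
    "hydromorphone": 4.0,
    "codeine": 0.15,
    "tramadol": 0.1,
}

def daily_mme(drug, strength_mg, units_per_day):
    """Daily morphine milligram equivalents for an oral opioid:
    strength per unit x units taken per day x conversion factor."""
    return strength_mg * units_per_day * MME_FACTOR[drug.lower()]

# Oxycodone 5 mg tablets, 4 per day: 5 * 4 * 1.5 = 30.0 MME/day
print(daily_mme("oxycodone", 5, 4))  # → 30.0
```

In practice the hard part, and the focus of the framework, is reliably extracting drug, strength, and frequency from free-text prescription instructions before this multiplication can be applied.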
Affiliation(s)
- Juan Antonio Lossio-Ventura
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA; National Institute of Mental Health, National Institutes of Health, MD, USA
- Wenyu Song
- Department of Medicine, Brigham & Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Patricia C Dykes
- Department of Medicine, Brigham & Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Mass General Brigham, Boston, MA, USA
17
Zhalechian M, Van Oyen MP, Lavieri MS, De Moraes CG, Girkin CA, Fazio MA, Weinreb RN, Bowd C, Liebmann JM, Zangwill LM, Andrews CA, Stein JD. Augmenting Kalman Filter Machine Learning Models with Data from OCT to Predict Future Visual Field Loss: An Analysis Using Data from the African Descent and Glaucoma Evaluation Study and the Diagnostic Innovation in Glaucoma Study. Ophthalmol Sci 2022; 2:100097. [PMID: 36246178 PMCID: PMC9560647 DOI: 10.1016/j.xops.2021.100097] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2021] [Revised: 11/17/2021] [Accepted: 12/01/2021] [Indexed: 11/28/2022]
Abstract
Purpose To assess whether the predictive accuracy of machine learning algorithms using Kalman filtering for forecasting future values of global indices on perimetry can be enhanced by adding global retinal nerve fiber layer (RNFL) data and whether model performance is influenced by the racial composition of the training and testing sets. Design Retrospective, longitudinal cohort study. Participants Patients with open-angle glaucoma (OAG) or glaucoma suspects enrolled in the African Descent and Glaucoma Evaluation Study or Diagnostic Innovation in Glaucoma Study. Methods We developed a Kalman filter (KF) with tonometry and perimetry data (KF-TP) and another KF with tonometry, perimetry, and global RNFL data (KF-TPO), comparing these models with one another and with 2 linear regression (LR) models for predicting mean deviation (MD) and pattern standard deviation values 36 months into the future for patients with OAG and glaucoma suspects. We also compared KF model performance when trained on individuals of European and African descent and tested on patients of the same versus the other race. Main Outcome Measures Predictive accuracy (percentage of MD values forecasted within the 95% repeatability interval) differences among the models. Results Among 362 eligible patients, the mean ± standard deviation age at baseline was 71.3 ± 10.4 years; 196 patients (54.1%) were women; 202 patients (55.8%) were of European descent, and 139 (38.4%) were of African descent. Among patients with OAG (n = 296), the predictive accuracy for 36 months in the future was higher for the KF models (73.5% for KF-TP, 71.2% for KF-TPO) than for the LR models (57.5%, 58.0%). Predictive accuracy did not differ significantly between KF-TP and KF-TPO (P = 0.20). If the races of the training and testing set patients were aligned (versus nonaligned), the mean absolute prediction error of future MD improved by 0.39 dB for KF-TP and 0.48 dB for KF-TPO.
Conclusions Adding global RNFL data to existing KFs minimally improved their predictive accuracy. Although KFs attained better predictive accuracy when the races of the training and testing sets were aligned, these improvements were modest. These findings will help to guide implementation of KFs in clinical practice.
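For readers unfamiliar with the technique, a Kalman filter alternates a predict step (propagate the state estimate forward and grow its uncertainty by process noise) with an update step (blend in a new measurement, weighted by the Kalman gain). A minimal one-dimensional sketch with identity dynamics on invented mean-deviation readings; the study's KF-TP and KF-TPO models track richer multi-variable state (tonometry, perimetry, and RNFL inputs), so this is an illustration of the mechanism only:

```python
def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter with identity dynamics.
    q: process-noise variance, r: measurement-noise variance,
    x0/p0: prior state estimate and its variance.
    Returns one filtered estimate per measurement."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by process noise.
        p = p + q
        # Update: the gain trades prior confidence against measurement noise.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy mean-deviation (MD) readings, in dB, around a true value near -5.
est = kalman_1d([-4.6, -5.3, -5.1, -4.8, -5.2], x0=-4.6)
```

Forecasting then amounts to propagating the final state estimate through the dynamics model without further measurements; the study's 36-month predictions follow that pattern with a learned multi-variable dynamics model.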
Key Words
- AD, African descent
- ADAGES, African Descent and Glaucoma Evaluation Study
- Algorithm bias
- CI, confidence interval
- D, diopter
- DIGS, Diagnostic Innovation in Glaucoma Study
- ED, European descent
- Glaucoma
- IOP, intraocular pressure
- KF, Kalman filter
- KF-TP, Kalman filter with tonometry and perimetry data
- KF-TPO, Kalman filter with tonometry, perimetry, and global retinal nerve fiber layer data
- Kalman filter
- LR1, linear regression model 1
- LR2, linear regression model 2
- MAE, mean absolute error
- MD, mean deviation
- Machine learning
- OAG, open-angle glaucoma
- OCT
- PSD, pattern standard deviation
- RMSE, root mean square error
- RNFL, retinal nerve fiber layer
- SD, standard deviation
- VF, visual field
Affiliation(s)
- Mohammad Zhalechian
- Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
- Mark P. Van Oyen
- Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
- Mariel S. Lavieri
- Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
- Carlos Gustavo De Moraes
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Edward S. Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Christopher A. Girkin
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, Alabama
- Massimo A. Fazio
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, Alabama
- Robert N. Weinreb
- Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Christopher Bowd
- Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Jeffrey M. Liebmann
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Edward S. Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Linda M. Zangwill
- Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Christopher A. Andrews
- Department of Ophthalmology and Visual Sciences, University of Michigan Medical School, Ann Arbor, Michigan
- Center for Eye Policy and Innovation, University of Michigan, Ann Arbor, Michigan
- Joshua D. Stein
- Department of Ophthalmology and Visual Sciences, University of Michigan Medical School, Ann Arbor, Michigan
- Center for Eye Policy and Innovation, University of Michigan, Ann Arbor, Michigan
- Department of Health Management and Policy, University of Michigan School of Public Health, Ann Arbor, Michigan
18
Huang J, Galal G, Etemadi M, Vaidyanathan M. Evaluation and Mitigation of Racial Bias in Clinical Machine Learning Models: A Scoping Review. JMIR Med Inform 2022; 10:e36388. [PMID: 35639450 PMCID: PMC9198828 DOI: 10.2196/36388] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 02/17/2022] [Accepted: 03/27/2022] [Indexed: 01/12/2023] Open
Abstract
Background Racial bias is a key concern regarding the development, validation, and implementation of machine learning (ML) models in clinical settings. Despite the potential of bias to propagate health disparities, racial bias in clinical ML has yet to be thoroughly examined and best practices for bias mitigation remain unclear. Objective Our objective was to perform a scoping review to characterize the methods by which the racial bias of ML has been assessed and describe strategies that may be used to enhance algorithmic fairness in clinical ML. Methods A scoping review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews. A literature search using PubMed, Scopus, and Embase databases, as well as Google Scholar, identified 635 records, of which 12 studies were included. Results Applications of ML were varied and involved diagnosis, outcome prediction, and clinical score prediction performed on data sets including images, diagnostic studies, clinical text, and clinical variables. Of the 12 studies, 1 (8%) described a model in routine clinical use, 2 (17%) examined prospectively validated clinical models, and the remaining 9 (75%) described internally validated models. In addition, 8 (67%) studies concluded that racial bias was present, 2 (17%) concluded that it was not, and 2 (17%) assessed the implementation of bias mitigation strategies without comparison to a baseline model. Fairness metrics used to assess algorithmic racial bias were inconsistent. The most commonly observed metrics were equal opportunity difference (5/12, 42%), accuracy (4/12, 33%), and disparate impact (2/12, 17%). All 8 (67%) studies that implemented methods for mitigation of racial bias successfully increased fairness, as measured by the authors’ chosen metrics. Preprocessing methods of bias mitigation were most commonly used across all studies that implemented them.
Conclusions The broad scope of medical ML applications and potential patient harms demand an increased emphasis on evaluation and mitigation of racial bias in clinical ML. However, the adoption of algorithmic fairness principles in medicine remains inconsistent and is limited by poor data availability and ML model reporting. We recommend that researchers and journal editors emphasize standardized reporting and data availability in medical ML studies to improve transparency and facilitate evaluation for racial bias.
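Two of the fairness metrics tallied in this review have simple closed forms: equal opportunity difference is the gap in true-positive rates between two groups, and disparate impact is the ratio of their favorable-prediction rates (the "four-fifths rule" flags ratios below 0.8). A minimal sketch with invented binary labels and predictions:

```python
def true_positive_rate(y_true, y_pred):
    """Sensitivity within one group: TP / actual positives."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    return tp / sum(y_true)

def equal_opportunity_difference(y_true_a, y_pred_a, y_true_b, y_pred_b):
    """TPR(group a) - TPR(group b); 0 indicates equal opportunity."""
    return (true_positive_rate(y_true_a, y_pred_a)
            - true_positive_rate(y_true_b, y_pred_b))

def disparate_impact(y_pred_a, y_pred_b):
    """Ratio of favorable-prediction rates between groups;
    values below 0.8 are commonly flagged as disparate impact."""
    rate_a = sum(y_pred_a) / len(y_pred_a)
    rate_b = sum(y_pred_b) / len(y_pred_b)
    return rate_a / rate_b
```

The review's observation that metric choice was inconsistent matters because these definitions can disagree: predictions can satisfy disparate impact (equal selection rates) while still differing in true-positive rate across groups.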
Affiliation(s)
- Jonathan Huang
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Galal Galal
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Mozziyar Etemadi
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Department of Biomedical Engineering, Northwestern University, Evanston, IL, United States
- Mahesh Vaidyanathan
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Digital Health & Data Science Curricular Thread, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
19
Hernandez-Boussard T, Macklin P, Greenspan EJ, Gryshuk AL, Stahlberg E, Syeda-Mahmood T, Shmulevich I. Digital twins for predictive oncology will be a paradigm shift for precision cancer care. Nat Med 2021; 27:2065-2066. [PMID: 34824458 PMCID: PMC9097784 DOI: 10.1038/s41591-021-01558-5] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Paul Macklin
- Department of Medicine, Indiana University, Bloomington, IN, USA
- Emily J Greenspan
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
- Amy L Gryshuk
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Eric Stahlberg
- Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
20
Barboi C, Tzavelis A, Muhammad LN. Comparison of Severity of Illness Scores and Artificial Intelligence Models Predictive of Intensive Care Unit Mortality: Meta-analysis and Review of the Literature. JMIR Med Inform 2021; 10:e35293. [PMID: 35639445 PMCID: PMC9198821 DOI: 10.2196/35293] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2021] [Revised: 04/24/2022] [Accepted: 04/25/2022] [Indexed: 12/23/2022] Open
Affiliation(s)
- Cristina Barboi
- Indiana University-Purdue University Indianapolis, Regenstrief Institute, Indianapolis, IN, United States
- Andreas Tzavelis
- Medical Scientist Training Program, Feinberg School of Medicine, Chicago, IL, United States
- Department of Biomedical Engineering, Northwestern University, Chicago, IL, United States
- Lutfiyya NaQiyba Muhammad
- Department of Preventive Medicine and Biostatistics, Northwestern University, Evanston, IL, United States
21
Shelmerdine SC, Arthurs OJ, Denniston A, Sebire NJ. Review of study reporting guidelines for clinical studies using artificial intelligence in healthcare. BMJ Health Care Inform 2021; 28:bmjhci-2021-100385. [PMID: 34426417 PMCID: PMC8383863 DOI: 10.1136/bmjhci-2021-100385] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Accepted: 08/09/2021] [Indexed: 02/07/2023] Open
Abstract
High-quality research is essential in guiding evidence-based care, and should be reported in a way that is reproducible, transparent and, where appropriate, provides sufficient detail for inclusion in future meta-analyses. Reporting guidelines for various study designs have been widely used for clinical (and preclinical) studies, consisting of checklists with a minimum set of points for inclusion. With the recent rise in volume of research using artificial intelligence (AI), additional factors need to be evaluated, which do not neatly conform to traditional reporting guidelines (eg, details relating to technical algorithm development). In this review, reporting guidelines are highlighted to promote awareness of essential content required for studies evaluating AI interventions in healthcare. These include published and in-progress extensions to well-known reporting guidelines such as Standard Protocol Items: Recommendations for Interventional Trials-AI (study protocols), Consolidated Standards of Reporting Trials-AI (randomised controlled trials), Standards for Reporting of Diagnostic Accuracy Studies-AI (diagnostic accuracy studies) and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-AI (prediction model studies). Additionally, there are a number of guidelines that consider AI for health interventions more generally (eg, Checklist for Artificial Intelligence in Medical Imaging (CLAIM), minimum information (MI)-CLAIM, MI for Medical AI Reporting) or address a specific element such as the ‘learning curve’ (Developmental and Exploratory Clinical Investigation of Decision-AI). Economic evaluation of AI health interventions is not currently addressed, and may benefit from extension to an existing guideline.
In the face of a rapid influx of studies of AI health interventions, reporting guidelines help ensure that investigators and those appraising studies consider both the well-recognised elements of good study design and reporting, while also adequately addressing new challenges posed by AI-specific elements.
Affiliation(s)
- Owen J Arthurs
- Radiology, Great Ormond Street Hospital NHS Foundation Trust, London, UK
- Alastair Denniston
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
- Neil J Sebire
- Digital Research, Informatics and Virtual Environments Unit (DRIVE), London, UK