1
Tran L, Kandel H, Sari D, Chiu CH, Watson SL. Artificial Intelligence and Ophthalmic Clinical Registries. Am J Ophthalmol 2024; 268:263-274. [PMID: 39111520] [DOI: 10.1016/j.ajo.2024.07.039]
Abstract
PURPOSE Recent advances in artificial intelligence (AI) represent a promising response to increasing clinical demand and ever more limited health resources. Whilst powerful, AI models require vast amounts of representative training data to output meaningful predictions in the clinical environment. Clinical registries are a promising source of large-volume real-world data that could be used to train more accurate and widely applicable AI models. This review provides an overview of current applications of AI to ophthalmic clinical registry data. DESIGN AND METHODS A systematic search of EMBASE, Medline, PubMed, Scopus and Web of Science for primary research articles that applied AI to ophthalmic clinical registry data was conducted in July 2024. RESULTS Twenty-three primary research articles applying AI to ophthalmic clinical registries (n = 14) were found. Registries were primarily defined by the condition captured, and the most common conditions to which AI was applied were glaucoma (n = 3) and neovascular age-related macular degeneration (n = 3). Tabular clinical data was the most common form of input into AI algorithms, and outputs were primarily classifiers (n = 8, 40%) and risk quantifier models (n = 7, 35%). The AI algorithms applied were almost exclusively supervised conventional machine learning models (n = 39, 85%), such as decision tree classifiers and logistic regression, with only 7 applications of deep learning or natural language processing algorithms. Significant heterogeneity was found with regard to model validation methodology and measures of performance. CONCLUSIONS Limited applications of deep learning algorithms to clinical registry data have been reported.
The lack of standardized validation methodology and the heterogeneity of performance reporting suggest that the application of AI to clinical registries is still in its infancy, constrained by the poor accessibility of registry data. Standardized methodology and greater involvement of domain experts will be needed for the future development of clinically deployable AI.
Affiliation(s)
- Luke Tran, Himal Kandel, Daliya Sari, Christopher Hy Chiu, Stephanie L Watson
- From the Faculty of Medicine and Health, Save Sight Institute, The University of Sydney (L.T., H.K., D.S., C.H.C., S.L.W.), Sydney, New South Wales, Australia.
2
Wang HE, Weiner JP, Saria S, Lehmann H, Kharrazi H. Assessing racial bias in healthcare predictive models: Practical lessons from an empirical evaluation of 30-day hospital readmission models. J Biomed Inform 2024; 156:104683. [PMID: 38925281] [DOI: 10.1016/j.jbi.2024.104683]
Abstract
OBJECTIVE Despite the increased availability of methodologies to identify algorithmic bias, the operationalization of bias evaluation for healthcare predictive models is still limited. This study therefore proposes a process for bias evaluation through an empirical assessment of common hospital readmission models. The process includes selecting bias measures, interpreting them, and determining disparity impact and potential mitigations. METHODS This retrospective analysis evaluated racial bias in four common models predicting 30-day unplanned readmission (i.e., the LACE Index, the HOSPITAL Score, and the CMS readmission measure applied as is and retrained). The models were assessed using 2.4 million adult inpatient discharges in Maryland from 2016 to 2019. Fairness metrics that are model-agnostic, easy to compute, and interpretable were implemented and appraised to select the most appropriate bias measures. The impact of changing the models' risk thresholds on these measures was further assessed to guide the selection of optimal thresholds to control and mitigate bias. RESULTS Four bias measures were selected for the predictive task: zero-one-loss difference, false negative rate (FNR) parity, false positive rate (FPR) parity, and the generalized entropy index. Based on these measures, the HOSPITAL Score and the retrained CMS measure demonstrated the lowest racial bias. White patients showed a higher FNR, while Black patients showed a higher FPR and zero-one loss. As the models' risk thresholds changed, trade-offs between fairness and overall performance were observed, and the assessment showed that all models' default thresholds were reasonable for balancing accuracy and bias. CONCLUSIONS This study proposes an Applied Framework to Assess Fairness of Predictive Models (AFAFPM) and demonstrates the process using 30-day hospital readmission models as an example.
It suggests the feasibility of applying algorithmic bias assessment to determine optimized risk thresholds so that predictive models can be used more equitably and accurately. A combination of qualitative and quantitative methods and a multidisciplinary team are clearly necessary to identify, understand and respond to algorithmic bias in real-world healthcare settings. Users should also apply multiple bias measures to ensure a more comprehensive, tailored, and balanced view. The results of bias measures, however, must be interpreted with caution and in the larger operational, clinical, and policy context.
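The group-level error-rate measures this abstract names (zero-one loss, FNR parity, FPR parity) reduce to per-group confusion-matrix rates and a gap between groups. The sketch below is an illustrative reconstruction under that reading, not the authors' AFAFPM code; the labels, predictions, and group names are invented toy data.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group error rates used as fairness measures:
    zero-one loss, false negative rate (FNR), false positive rate (FPR)."""
    out = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        fn = np.sum((yt == 1) & (yp == 0))
        fp = np.sum((yt == 0) & (yp == 1))
        out[g] = {
            "zero_one_loss": np.mean(yt != yp),
            "fnr": fn / max(np.sum(yt == 1), 1),
            "fpr": fp / max(np.sum(yt == 0), 1),
        }
    return out

def parity_gap(rates, metric):
    """Largest between-group difference in a metric (0 = perfect parity)."""
    vals = [r[metric] for r in rates.values()]
    return max(vals) - min(vals)

# Toy example with two groups, A and B
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
rates = group_rates(y_true, y_pred, groups)
print(parity_gap(rates, "fnr"))  # ≈ 0.33: group A misses positives more often
```

In practice one would compute these gaps at each candidate risk threshold, which is how the threshold trade-offs described above become visible.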
Affiliation(s)
- H Echo Wang
- Department of Health Policy and Management, Johns Hopkins School of Public Health, Baltimore, MD, USA.
- Jonathan P Weiner
- Department of Health Policy and Management, Johns Hopkins School of Public Health, Baltimore, MD, USA; Center for Population Health Information Technology, Johns Hopkins School of Public Health, Baltimore, MD, USA.
- Suchi Saria
- Department of Computer Science and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Harold Lehmann
- Biomedical Informatics and Data Science, Division of General Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Hadi Kharrazi
- Department of Health Policy and Management, Johns Hopkins School of Public Health, Baltimore, MD, USA; Center for Population Health Information Technology, Johns Hopkins School of Public Health, Baltimore, MD, USA; Biomedical Informatics and Data Science, Division of General Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
3
Soares Dias Portela A, Saxena V, Rosenn E, Wang SH, Masieri S, Palmieri J, Pasinetti GM. Role of Artificial Intelligence in Multinomial Decisions and Preventative Nutrition in Alzheimer's Disease. Mol Nutr Food Res 2024; 68:e2300605. [PMID: 38175857] [DOI: 10.1002/mnfr.202300605]
Abstract
Alzheimer's disease (AD) affects 50 million people worldwide, an increase of 35 million since 2015, and is known for memory loss and cognitive decline. Given the morbidity associated with AD, it is important to explore lifestyle elements that influence the chances of developing AD, with special emphasis on nutrition. This review first discusses how dietary factors affect AD development, then the possible role of Artificial Intelligence (AI) and Machine Learning (ML) in preventative nutritional care for AD patients. The Mediterranean and DASH diets provide many nutrient benefits and assist in preventing neurodegeneration through neuroprotective roles. Deficiencies in micronutrients, protein-energy, and polyunsaturated fatty acids increase the chance of cognitive decline, memory loss, and synaptic dysfunction, among other outcomes. ML software can build algorithmic models from the data it is given and present practical solutions that are accessible and easy to use, yielding predictions for a precision medicine approach that evaluates individuals as a whole. The future of nutritional science undoubtedly lies in customizing diets for individuals to reduce dementia risk factors and maintain overall health and brain function.
Affiliation(s)
- Vrinda Saxena, Eric Rosenn, Shu-Han Wang, Sibilla Masieri, Joshua Palmieri
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Giulio Maria Pasinetti
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Geriatrics Research, Education and Clinical Center, JJ Peters VA Medical Center, Bronx, NY, 10468, USA
4
Kim YE, Serpedin A, Periyakoil P, German D, Rameau A. Sociodemographic reporting in videomics research: a review of practices in otolaryngology - head and neck surgery. Eur Arch Otorhinolaryngol 2024. [PMID: 38704768] [DOI: 10.1007/s00405-024-08659-0]
Abstract
OBJECTIVE To assess reporting practices of sociodemographic data in Upper Aerodigestive Tract (UAT) videomics research in Otolaryngology-Head and Neck Surgery (OHNS). STUDY DESIGN Narrative review. METHODS Four online research databases were searched for peer-reviewed articles on videomics and UAT endoscopy in OHNS published since January 1, 2017. Title and abstract screening, followed by full-text screening, was performed. Dataset audit criteria were determined by the MINIMAR reporting standards for patient demographic characteristics, in addition to gender and author affiliations. RESULTS Of the 57 included studies, 37% reported any sociodemographic information on their dataset. Among these studies, all reported age, most reported sex (86%), two (10%) reported race, and one (5%) reported ethnicity and socioeconomic status. No studies reported gender. Most studies (84%) included at least one female author, and more than half (53%) had a female first or senior author, with no significant difference in the rate of sociodemographic reporting between studies with and without female authors (any female author: p = 0.2664; first/senior female author: p > 0.9999). Most US-based studies reported at least one sociodemographic variable (79%), compared with those based in Europe (24%) and Asia (20%) (p = 0.0012). The rates of sociodemographic reporting by journal category were as follows: clinical OHNS, 44%; clinical non-OHNS, 40%; technical, 42%; interdisciplinary, 10%. CONCLUSIONS Sociodemographic information is widely underreported in OHNS videomics research utilizing UAT endoscopy. Routine reporting of sociodemographic information should be implemented for AI-based research to help minimize the algorithmic biases that have previously been demonstrated. LEVEL OF EVIDENCE: 4
Affiliation(s)
- Yeo Eun Kim, Aisha Serpedin, Preethi Periyakoil, Daniel German, Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, Sean Parker Institute for the Voice, 240 East 59th St, New York, NY, 10022, USA.
5
Kapoor S, Cantrell EM, Peng K, Pham TH, Bail CA, Gundersen OE, Hofman JM, Hullman J, Lones MA, Malik MM, Nanayakkara P, Poldrack RA, Raji ID, Roberts M, Salganik MJ, Serra-Garcia M, Stewart BM, Vandewiele G, Narayanan A. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. Sci Adv 2024; 10:eadk3452. [PMID: 38691601] [PMCID: PMC11092361] [DOI: 10.1126/sciadv.adk3452]
Abstract
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Affiliation(s)
- Sayash Kapoor
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Emily M. Cantrell
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- School of Public and International Affairs, Princeton University, Princeton, NJ 08544, USA
- Kenny Peng
- Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
- Thanh Hien Pham
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Christopher A. Bail
- Department of Sociology, Duke University, Durham, NC 27708, USA
- Department of Political Science, Duke University, Durham, NC 27708, USA
- Sanford School of Public Policy, Duke University, Durham, NC 27708, USA
- Odd Erik Gundersen
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Aneo AS, Trondheim, Norway
- Jessica Hullman
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Michael A. Lones
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
- Momin M. Malik
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute in Critical Quantitative, Computational, & Mixed Methodologies, Johns Hopkins University, Baltimore, MD 21218, USA
- Priyanka Nanayakkara
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Department of Communication Studies, Northwestern University, Evanston, IL 60208, USA
- Inioluwa Deborah Raji
- Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA
- Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
- Matthew J. Salganik
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Marta Serra-Garcia
- Rady School of Management, University of California, San Diego, La Jolla, CA 92093, USA
- Brandon M. Stewart
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Department of Politics, Princeton University, Princeton, NJ 08544, USA
- Gilles Vandewiele
- Department of Information Technology, Ghent University, Ghent, Belgium
- Arvind Narayanan
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
6
Rose C, Barber R, Preiksaitis C, Kim I, Mishra N, Kayser K, Brown I, Gisondi M. A Conference (Missingness in Action) to Address Missingness in Data and AI in Health Care: Qualitative Thematic Analysis. J Med Internet Res 2023; 25:e49314. [PMID: 37995113] [PMCID: PMC10704317] [DOI: 10.2196/49314]
Abstract
BACKGROUND Missingness in health care data poses significant challenges in the development and implementation of artificial intelligence (AI) and machine learning solutions. Identifying and addressing these challenges is critical to ensuring the continued growth and accuracy of these models as well as their equitable and effective use in health care settings. OBJECTIVE This study aims to explore the challenges, opportunities, and potential solutions related to missingness in health care data for AI applications through the conduct of a digital conference and thematic analysis of conference proceedings. METHODS A digital conference was held in September 2022, attracting 861 registered participants, with 164 (19%) attending the live event. The conference featured presentations and panel discussions by experts in AI, machine learning, and health care. Transcripts of the event were analyzed using the stepwise framework of Braun and Clarke to identify key themes related to missingness in health care data. RESULTS Three principal themes-data quality and bias, human input in model development, and trust and privacy-emerged from the analysis. Topics included the accuracy of predictive models, lack of inclusion of underrepresented communities, partnership with physicians and other populations, challenges with sensitive health care data, and fostering trust with patients and the health care community. CONCLUSIONS Addressing the challenges of data quality, human input, and trust is vital when devising and using machine learning algorithms in health care. Recommendations include expanding data collection efforts to reduce gaps and biases, involving medical professionals in the development and implementation of AI models, and developing clear ethical guidelines to safeguard patient privacy. Further research and ongoing discussions are needed to ensure these conclusions remain relevant as health care and AI continue to evolve.
Affiliation(s)
- Christian Rose
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Carl Preiksaitis
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Ireh Kim
- Stanford University, Palo Alto, CA, United States
- Kristen Kayser
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Italo Brown
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Michael Gisondi
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
7
Hernandez-Boussard T, Siddique SM, Bierman AS, Hightower M, Burstin H. Promoting Equity In Clinical Decision Making: Dismantling Race-Based Medicine. Health Aff (Millwood) 2023; 42:1369-1373. [PMID: 37782875] [PMCID: PMC10849087] [DOI: 10.1377/hlthaff.2023.00545]
Abstract
As the use of artificial intelligence has spread rapidly throughout the US health care system, concerns have been raised about racial and ethnic biases built into the algorithms that often guide clinical decision making. Race-based medicine, which relies on algorithms that use race as a proxy for biological differences, has led to treatment patterns that are inappropriate, unjust, and harmful to minoritized racial and ethnic groups. These patterns have contributed to persistent disparities in health and health care. To reduce these disparities, we recommend a race-aware approach to clinical decision support that considers social and environmental factors such as structural racism and social determinants of health. Recent policy changes in medical specialty societies and innovations in algorithm development represent progress on the path to dismantling race-based medicine. Success will require continued commitment and sustained efforts among stakeholders in the health care, research, and technology sectors. Increasing the diversity of clinical trial populations, broadening the focus of precision medicine, improving education about the complex factors shaping health outcomes, and developing new guidelines and policies to enable culturally responsive care are important next steps.
Affiliation(s)
- Arlene S Bierman
- Agency for Healthcare Research and Quality, Rockville, Maryland
- Maia Hightower
- University of Chicago, Chicago, Illinois
- Helen Burstin
- Council of Medical Specialty Societies, Washington, D.C.
8
Abdulazeem H, Whitelaw S, Schauberger G, Klug SJ. A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data. PLoS One 2023; 18:e0274276. [PMID: 37682909] [PMCID: PMC10491005] [DOI: 10.1371/journal.pone.0274276]
Abstract
With advances in technology and data science, machine learning (ML) is being rapidly adopted by the health care sector. However, to date there is a lack of literature addressing the health conditions targeted by ML prediction models within primary health care (PHC). To fill this gap, we conducted a systematic review following the PRISMA guidelines to identify health conditions targeted by ML in PHC. We searched the Cochrane Library, Web of Science, PubMed, Elsevier, BioRxiv, Association for Computing Machinery (ACM), and IEEE Xplore databases for studies published from January 1990 to January 2022. We included primary studies addressing ML diagnostic or prognostic predictive models that were supplied completely or partially by real-world PHC data. Study selection, data extraction, and risk of bias assessment using the prediction model risk of bias assessment tool were performed by two investigators. Health conditions were categorized according to the International Classification of Diseases (ICD-10). Extracted data were analyzed quantitatively. We identified 106 studies investigating 42 health conditions. These studies included 207 ML prediction models supplied by PHC data from 24.2 million participants in 19 countries. We found that 92.4% of the studies were retrospective and 77.3% reported diagnostic predictive ML models. A majority (76.4%) of the studies developed models without conducting external validation. Risk of bias assessment revealed that 90.8% of the studies were at high or unclear risk of bias. The most frequently reported health conditions were diabetes mellitus (19.8%) and Alzheimer's disease (11.3%). Our study summarizes the presently available ML prediction models within PHC and draws the attention of digital health policy makers, ML model developers, and health care professionals to the need for more interdisciplinary research collaboration in this area.
Affiliation(s)
- Hebatullah Abdulazeem
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Sera Whitelaw
- Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Gunther Schauberger
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Stefanie J. Klug
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
9
Busch F, Adams LC, Bressem KK. Biomedical Ethical Aspects Towards the Implementation of Artificial Intelligence in Medical Education. Med Sci Educ 2023; 33:1007-1012. [PMID: 37546190] [PMCID: PMC10403458] [DOI: 10.1007/s40670-023-01815-x]
Abstract
The increasing use of artificial intelligence (AI) in medicine is associated with new ethical challenges and responsibilities. However, special considerations and concerns should be addressed when integrating AI applications into medical education, where healthcare, AI, and education ethics collide. This commentary explores the biomedical ethical responsibilities of medical institutions in incorporating AI applications into medical education by identifying potential concerns and limitations, with the goal of implementing applicable recommendations. The recommendations presented are intended to assist in developing institutional guidelines for the ethical use of AI for medical educators and students.
Affiliation(s)
- Felix Busch
- Department of Radiology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Department of Anesthesiology, Division of Operative Intensive Care Medicine, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Lisa C. Adams
- Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA
- Keno K. Bressem
- Department of Radiology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
10
Kanda E, Suzuki A, Makino M, Tsubota H, Kanemata S, Shirakawa K, Yajima T. Machine learning models for prediction of HF and CKD development in early-stage type 2 diabetes patients. Sci Rep 2022; 12:20012. [PMID: 36411366] [PMCID: PMC9678863] [DOI: 10.1038/s41598-022-24562-2]
Abstract
Chronic kidney disease (CKD) and heart failure (HF) are the first and most frequent comorbidities associated with mortality risk in early-stage type 2 diabetes mellitus (T2DM). However, efficient screening and risk-assessment strategies for identifying T2DM patients at high risk of developing CKD and/or HF (CKD/HF) remain to be established. This study aimed to generate a novel machine learning (ML) model to predict the risk of developing CKD/HF in early-stage T2DM patients. The models were derived from a retrospective cohort of 217,054 T2DM patients without a history of cardiovascular or renal disease extracted from a Japanese claims database. Among the ML algorithms used, extreme gradient boosting exhibited the best performance for CKD/HF diagnosis and hospitalization after internal validation, and was further validated using another dataset including 16,822 patients. In the external validation, the 5-year prediction areas under the receiver operating characteristic curve for CKD/HF diagnosis and hospitalization were 0.718 and 0.837, respectively. In Kaplan-Meier analysis, patients predicted to be at high risk showed a significant increase in CKD/HF diagnosis and hospitalization compared with those at low risk. Thus, the developed model predicted the risk of developing CKD/HF in T2DM patients with reasonable accuracy in the external validation cohort. A clinical approach identifying T2DM patients at high risk of developing CKD/HF using ML models may contribute to improved prognosis by promoting early diagnosis and intervention.
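The external-validation metric reported here, area under the receiver operating characteristic curve, is equivalent to the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case (the Mann-Whitney U formulation), so it can be computed directly from held-out labels and scores. The sketch below illustrates this with invented labels and risk scores, not the study's data or its gradient-boosting model:

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs ranked correctly, counting ties as half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()  # tied scores count 0.5
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# External-validation style check: held-out labels vs. model risk scores
y_ext = [1, 0, 1, 1, 0, 0]
risk = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
print(round(auroc(y_ext, risk), 3))  # → 0.889
```

An AUROC of 0.5 corresponds to random ranking; values such as the 0.718 and 0.837 reported above indicate progressively better discrimination on the external cohort.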
Affiliation(s)
- Eiichiro Kanda
- Medical Science, Kawasaki Medical University, Okayama, Japan
- Atsushi Suzuki
- Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Masaki Makino
- Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Hiroo Tsubota
- AstraZeneca K.K., Osaka, Japan
- Satomi Kanemata
- Ono Pharmaceutical Co., Ltd., Osaka, Japan
11
12
Lu J, Sattler A, Wang S, Khaki AR, Callahan A, Fleming S, Fong R, Ehlert B, Li RC, Shieh L, Ramchandran K, Gensheimer MF, Chobot S, Pfohl S, Li S, Shum K, Parikh N, Desai P, Seevaratnam B, Hanson M, Smith M, Xu Y, Gokhale A, Lin S, Pfeffer MA, Teuteberg W, Shah NH. Considerations in the reliability and fairness audits of predictive models for advance care planning. Front Digit Health 2022; 4:943768. [PMID: 36339512] [PMCID: PMC9634737] [DOI: 10.3389/fdgth.2022.943768]
Abstract
Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap in operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration as well as a fairness audit based on summary statistics, subgroup performance and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in 3 practice settings: Primary Care, Inpatient Oncology and Hospital Medicine, using clinicians' answers to the surprise question (“Would you be surprised if [patient X] passed away in [Y years]?”) as a surrogate outcome. For performance, the models had positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69, 0.89 respectively) than the EOL model (0.20, 0.27), and better calibration (O/E 1.5, 1.7) than the EOL model (O/E 2.5, 3.0). The Epic EOL model flagged fewer patients (11%, 21% respectively) than the Stanford HM ACP model (38%, 75%). There were no differences in performance and calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with Race listed as “Other.” Ten clinicians were surveyed after a presentation summarizing the audit. 10/10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9/10 said the same for overall and subgroup calibration. The most commonly identified barriers for routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. This audit required 115 person-hours across 8–10 months.
Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting clinician-patient linkages as necessary for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and mediate between model auditors and impacted stakeholders.
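The audit quantities named in this abstract — PPV, sensitivity, and calibration expressed as an observed-to-expected (O/E) ratio — are straightforward to compute per subgroup once predictions, outcomes, and demographics are linked. A minimal sketch with hypothetical records (the group names and numbers below are invented for illustration, not taken from the audit):

```python
from collections import defaultdict

def subgroup_audit(records):
    """records: iterable of (group, y_true, y_pred, risk) tuples.
    Returns per-group PPV, sensitivity, and O/E calibration ratio
    (observed event rate divided by mean predicted risk)."""
    groups = defaultdict(list)
    for g, y, yhat, p in records:
        groups[g].append((y, yhat, p))
    out = {}
    for g, rows in groups.items():
        tp = sum(1 for y, yhat, _ in rows if y == 1 and yhat == 1)
        fp = sum(1 for y, yhat, _ in rows if y == 0 and yhat == 1)
        fn = sum(1 for y, yhat, _ in rows if y == 1 and yhat == 0)
        observed = sum(y for y, _, _ in rows) / len(rows)
        expected = sum(p for _, _, p in rows) / len(rows)
        out[g] = {
            "ppv": tp / (tp + fp) if tp + fp else None,
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "o_to_e": observed / expected if expected else None,
        }
    return out
```

An O/E ratio above 1 (such as the EOL model's 2.5–3.0 reported above) means the model under-predicts risk relative to observed events; a ratio near 1 indicates good calibration.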
Affiliation(s)
- Jonathan Lu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Correspondence: Jonathan Hsijing Lu
- Amelia Sattler
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Samantha Wang
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Ali Raza Khaki
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Alison Callahan
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Scott Fleming
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Rebecca Fong
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Benjamin Ehlert
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Ron C. Li
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Lisa Shieh
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Kavitha Ramchandran
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Michael F. Gensheimer
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, United States
- Sarah Chobot
- Inpatient Palliative Care, Stanford Health Care, Palo Alto, United States
- Stephen Pfohl
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Siyun Li
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Kenny Shum
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Nitin Parikh
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Priya Desai
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Briththa Seevaratnam
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Melanie Hanson
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Margaret Smith
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Yizhe Xu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Arjun Gokhale
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Steven Lin
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Michael A. Pfeffer
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Winifred Teuteberg
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Nigam H. Shah
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Clinical Excellence Research Center, Stanford University School of Medicine, Palo Alto, United States
13
Shahzad R, Ayub B, Siddiqui MAR. Quality of reporting of randomised controlled trials of artificial intelligence in healthcare: a systematic review. BMJ Open 2022; 12:e061519. [PMID: 36691151 PMCID: PMC9445816 DOI: 10.1136/bmjopen-2022-061519] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/30/2022] [Accepted: 08/17/2022] [Indexed: 01/26/2023] Open
Abstract
OBJECTIVES The aim of this study was to evaluate the quality of reporting of randomised controlled trials (RCTs) of artificial intelligence (AI) in healthcare against Consolidated Standards of Reporting Trials-AI (CONSORT-AI) guidelines. DESIGN Systematic review. DATA SOURCES We searched PubMed and EMBASE databases for studies reported from January 2015 to December 2021. ELIGIBILITY CRITERIA We included RCTs reported in English that used AI as the intervention. Protocols, conference abstracts, studies on robotics and studies related to medical education were excluded. DATA EXTRACTION The included studies were graded using the CONSORT-AI checklist, comprising 43 items, by two independent graders. The results were tabulated and descriptive statistics were reported. RESULTS We screened 1501 potential abstracts, of which 112 full-text articles were reviewed for eligibility. A total of 42 studies were included. The number of participants ranged from 22 to 2352. Only two of the CONSORT-AI checklist items were fully reported in all studies. Five items were not applicable in more than 85% of the studies. Nineteen per cent (8/42) of the studies did not report more than 50% (21/43) of the CONSORT-AI checklist items. CONCLUSIONS The quality of reporting of RCTs in AI is suboptimal. As reporting is variable in existing RCTs, caution should be exercised in interpreting the findings of some studies.
Affiliation(s)
- Rida Shahzad
- Department of Ophthalmology, Shahzad Eye Hospital, Karachi, Pakistan
- Bushra Ayub
- Centre for Clinical Best Practices, Aga Khan University Hospital, Karachi, Pakistan
- M A Rehman Siddiqui
- Department of Ophthalmology and Visual Sciences, Aga Khan University Hospital, Karachi, Pakistan
14
Lu JH, Callahan A, Patel BS, Morse KE, Dash D, Pfeffer MA, Shah NH. Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review. JAMA Netw Open 2022; 5:e2227779. [PMID: 35984654 PMCID: PMC9391954 DOI: 10.1001/jamanetworkopen.2022.27779] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/28/2021] [Accepted: 07/04/2022] [Indexed: 12/23/2022] Open
Abstract
Importance Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. Objectives To assess information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. Evidence Review MEDLINE was queried using the terms “machine learning model card” and “reporting machine learning” from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. Findings From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (requested by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristics curve, internal validation, and intended clinical use. Several items related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data, were reported half the time or less.
Other frequently unreported items related to fairness (summary statistics and subgroup analyses, including for race and ethnicity or sex). Conclusions and Relevance These findings suggest that consistent reporting recommendations for clinical predictive models are needed for model developers to share necessary information for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines. However, areas for improvement were identified in reporting items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.
Affiliation(s)
- Jonathan H. Lu
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Alison Callahan
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Birju S. Patel
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Keith E. Morse
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California
- Department of Clinical Informatics, Lucile Packard Children’s Hospital, Palo Alto, California
- Dev Dash
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Michael A. Pfeffer
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Technology and Digital Solutions, Stanford Medicine, Stanford, California
- Nigam H. Shah
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Technology and Digital Solutions, Stanford Medicine, Stanford, California
- Clinical Excellence Research Center, Stanford Medicine, Stanford, California
15
Golder S, O'Connor K, Wang Y, Stevens R, Gonzalez-Hernandez G. Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases. Annu Rev Biomed Data Sci 2022; 5:251-267. [PMID: 35562851 DOI: 10.1146/annurev-biodatasci-122120-025806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
A bias in health research to favor understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques to interrogate big data for identifying sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) "women," "men," or "sex"; (b) "big data," "artificial intelligence," or "NLP"; and (c) "disparities" or "differences." From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that the inclusion by sex is inconsistent and often unreported, although the inclusion of men in these studies is disproportionately less than that of women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but the process toward correction is slow. We reflect on best practices on using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, United Kingdom
- Karen O'Connor
- Department of Biostatistics, Epidemiology and Informatics (DBEI), University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yunwen Wang
- Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, California, USA
- Robin Stevens
- Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, California, USA
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology and Informatics (DBEI), University of Pennsylvania, Philadelphia, Pennsylvania, USA
16
Lossio-Ventura JA, Song W, Sainlaire M, Dykes PC, Hernandez-Boussard T. Opioid2MME: Standardizing opioid prescriptions to morphine milligram equivalents from electronic health records. Int J Med Inform 2022; 162:104739. [PMID: 35325663 PMCID: PMC9477978 DOI: 10.1016/j.ijmedinf.2022.104739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2021] [Revised: 02/26/2022] [Accepted: 03/11/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND The national increase in opioid use and misuse has become a public health crisis in the U.S. To tackle this crisis, the systematic evaluation and monitoring of opioid prescribing patterns is necessary. Thus, opioid prescriptions from electronic health records (EHRs) must be standardized to morphine milligram equivalents (MME) to facilitate monitoring and surveillance. While most studies report MMEs to describe opioid prescribing patterns, there is a lack of transparency regarding their data pre-processing and conversion processes for replication or comparison purposes. METHODS In this work, we developed Opioid2MME, a SQL-based open-source framework, to convert opioid prescriptions to MMEs using EHR prescription data. The MME conversions were validated internally through manual chart review using F-measures, compared with two existing tools, MedEx and MedXN, and the framework was tested in an external academic EHR system. RESULTS We identified 232,913 prescriptions for 49,060 unique patients in the EHRs, 2008-2019. We manually annotated a sample of prescriptions to assess the performance of the framework. The internal evaluation for medication information extraction achieved F-measures from 0.98 to 1.00 for each piece of the extracted information, outperforming MedEx and MedXN (F-scores 0.98 and 0.94, respectively). MME values in the internal EHR system obtained an F-measure of 0.97, with 3% of the data identified as outliers and 7% as missing values. The MME conversion in the external EHR system showed 78.3% agreement with the MME values obtained at the development site. CONCLUSIONS The results demonstrated that the framework is replicable and capable of converting opioid prescriptions to MMEs across different medical institutions. In summary, this work sets the groundwork for the systematic evaluation and monitoring of opioid prescribing patterns across healthcare systems.
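The arithmetic underlying MME standardization is a per-opioid conversion factor applied to the daily dose. The sketch below uses the commonly published CDC oral conversion factors and illustrates only the calculation; Opioid2MME itself is a SQL-based framework operating on EHR prescription tables, and the function and factor table here are not taken from it:

```python
# Widely published morphine-equivalent conversion factors for oral opioids
# (mg of morphine per mg of drug); illustrative subset only.
MME_FACTOR = {
    "morphine": 1.0,
    "oxycodone": 1.5,
    "hydrocodone": 1.0,
    "hydromorphone": 4.0,
    "codeine": 0.15,
    "tramadol": 0.1,
}

def daily_mme(drug, strength_mg, units_per_day):
    """Daily morphine milligram equivalents for an oral opioid:
    strength per unit x units taken per day x conversion factor."""
    return strength_mg * units_per_day * MME_FACTOR[drug.lower()]

# Oxycodone 5 mg tablets, 4 per day: 5 * 4 * 1.5 = 30.0 MME/day
print(daily_mme("oxycodone", 5, 4))  # → 30.0
```

In practice the hard part, and the focus of the framework, is reliably extracting drug, strength, and frequency from free-text prescription instructions before this multiplication can be applied.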
Affiliation(s)
- Juan Antonio Lossio-Ventura
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA; National Institute of Mental Health, National Institutes of Health, MD, USA
- Wenyu Song
- Department of Medicine, Brigham & Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Patricia C Dykes
- Department of Medicine, Brigham & Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Mass General Brigham, Boston, MA, USA
17
Zhalechian M, Van Oyen MP, Lavieri MS, De Moraes CG, Girkin CA, Fazio MA, Weinreb RN, Bowd C, Liebmann JM, Zangwill LM, Andrews CA, Stein JD. Augmenting Kalman Filter Machine Learning Models with Data from OCT to Predict Future Visual Field Loss: An Analysis Using Data from the African Descent and Glaucoma Evaluation Study and the Diagnostic Innovation in Glaucoma Study. Ophthalmol Sci 2022; 2:100097. [PMID: 36246178 PMCID: PMC9560647 DOI: 10.1016/j.xops.2021.100097] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2021] [Revised: 11/17/2021] [Accepted: 12/01/2021] [Indexed: 11/28/2022]
Abstract
Purpose To assess whether the predictive accuracy of machine learning algorithms using Kalman filtering for forecasting future values of global indices on perimetry can be enhanced by adding global retinal nerve fiber layer (RNFL) data and whether model performance is influenced by the racial composition of the training and testing sets. Design Retrospective, longitudinal cohort study. Participants Patients with open-angle glaucoma (OAG) or glaucoma suspects enrolled in the African Descent and Glaucoma Evaluation Study or Diagnostic Innovation in Glaucoma Study. Methods We developed a Kalman filter (KF) with tonometry and perimetry data (KF-TP) and another KF with tonometry, perimetry, and global RNFL data (KF-TPO), comparing these models with one another and with 2 linear regression (LR) models for predicting mean deviation (MD) and pattern standard deviation values 36 months into the future for patients with OAG and glaucoma suspects. We also compared KF model performance when trained on individuals of European and African descent and tested on patients of the same versus the other race. Main Outcome Measures Predictive accuracy (percentage of MD values forecasted within the 95% repeatability interval) differences among the models. Results Among 362 eligible patients, the mean ± standard deviation age at baseline was 71.3 ± 10.4 years; 196 patients (54.1%) were women; 202 patients (55.8%) were of European descent, and 139 (38.4%) were of African descent. Among patients with OAG (n = 296), the predictive accuracy for 36 months in the future was higher for the KF models (73.5% for KF-TP, 71.2% for KF-TPO) than for the LR models (57.5%, 58.0%). Predictive accuracy did not differ significantly between KF-TP and KF-TPO (P = 0.20). If the races of the training and testing set patients were aligned (versus nonaligned), the mean absolute prediction error of future MD improved by 0.39 dB for KF-TP and 0.48 dB for KF-TPO.
Conclusions Adding global RNFL data to existing KFs minimally improved their predictive accuracy. Although KFs attained better predictive accuracy when the races of the training and testing sets were aligned, these improvements were modest. These findings will help to guide implementation of KFs in clinical practice.
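For readers unfamiliar with the technique, a Kalman filter alternates a predict step (propagate the state estimate forward and grow its uncertainty by process noise) with an update step (blend in a new measurement, weighted by the Kalman gain). A minimal one-dimensional sketch with identity dynamics on invented mean-deviation readings; the study's KF-TP and KF-TPO models track richer multi-variable state (tonometry, perimetry, and RNFL inputs), so this is an illustration of the mechanism only:

```python
def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter with identity dynamics.
    q: process-noise variance, r: measurement-noise variance,
    x0/p0: prior state estimate and its variance.
    Returns one filtered estimate per measurement."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by process noise.
        p = p + q
        # Update: the gain trades prior confidence against measurement noise.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy mean-deviation (MD) readings, in dB, around a true value near -5.
est = kalman_1d([-4.6, -5.3, -5.1, -4.8, -5.2], x0=-4.6)
```

Forecasting then amounts to propagating the final state estimate through the dynamics model without further measurements; the study's 36-month predictions follow that pattern with a learned multi-variable dynamics model.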
Key Words
- AD, African descent
- ADAGES, African Descent and Glaucoma Evaluation Study
- Algorithm bias
- CI, confidence interval
- D, diopter
- DIGS, Diagnostic Innovation in Glaucoma Study
- ED, European descent
- Glaucoma
- IOP, intraocular pressure
- KF, Kalman filter
- KF-TP, Kalman filter with tonometry and perimetry data
- KF-TPO, Kalman filter with tonometry, perimetry, and global retinal nerve fiber layer data
- Kalman filter
- LR1, linear regression model 1
- LR2, linear regression model 2
- MAE, mean absolute error
- MD, mean deviation
- Machine learning
- OAG, open-angle glaucoma
- OCT
- PSD, pattern standard deviation
- RMSE, root mean square error
- RNFL, retinal nerve fiber layer
- SD, standard deviation
- VF, visual field
Affiliation(s)
- Mohammad Zhalechian
- Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
- Mark P. Van Oyen
- Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
- Mariel S. Lavieri
- Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
- Carlos Gustavo De Moraes
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Edward S. Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Christopher A. Girkin
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, Alabama
- Massimo A. Fazio
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, Alabama
- Robert N. Weinreb
- Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Christopher Bowd
- Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Jeffrey M. Liebmann
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Edward S. Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Linda M. Zangwill
- Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Christopher A. Andrews
- Department of Ophthalmology and Visual Sciences, University of Michigan Medical School, Ann Arbor, Michigan
- Center for Eye Policy and Innovation, University of Michigan, Ann Arbor, Michigan
- Joshua D. Stein
- Department of Ophthalmology and Visual Sciences, University of Michigan Medical School, Ann Arbor, Michigan
- Center for Eye Policy and Innovation, University of Michigan, Ann Arbor, Michigan
- Department of Health Management and Policy, University of Michigan School of Public Health, Ann Arbor, Michigan
18
Huang J, Galal G, Etemadi M, Vaidyanathan M. Evaluation and Mitigation of Racial Bias in Clinical Machine Learning Models: A Scoping Review. JMIR Med Inform 2022; 10:e36388. [PMID: 35639450 PMCID: PMC9198828 DOI: 10.2196/36388] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 02/17/2022] [Accepted: 03/27/2022] [Indexed: 01/12/2023] Open
Abstract
Background Racial bias is a key concern regarding the development, validation, and implementation of machine learning (ML) models in clinical settings. Despite the potential of bias to propagate health disparities, racial bias in clinical ML has yet to be thoroughly examined and best practices for bias mitigation remain unclear. Objective Our objective was to perform a scoping review to characterize the methods by which the racial bias of ML has been assessed and describe strategies that may be used to enhance algorithmic fairness in clinical ML. Methods A scoping review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews. A literature search using PubMed, Scopus, and Embase databases, as well as Google Scholar, identified 635 records, of which 12 studies were included. Results Applications of ML were varied and involved diagnosis, outcome prediction, and clinical score prediction performed on data sets including images, diagnostic studies, clinical text, and clinical variables. Of the 12 studies, 1 (8%) described a model in routine clinical use, 2 (17%) examined prospectively validated clinical models, and the remaining 9 (75%) described internally validated models. In addition, 8 (67%) studies concluded that racial bias was present, 2 (17%) concluded that it was not, and 2 (17%) assessed the implementation of bias mitigation strategies without comparison to a baseline model. Fairness metrics used to assess algorithmic racial bias were inconsistent. The most commonly observed metrics were equal opportunity difference (5/12, 42%), accuracy (4/12, 33%), and disparate impact (2/12, 17%). All 8 (67%) studies that implemented methods for mitigation of racial bias successfully increased fairness, as measured by the authors’ chosen metrics. Preprocessing methods of bias mitigation were most commonly used across all studies that implemented them.
Conclusions The broad scope of medical ML applications and potential patient harms demand an increased emphasis on evaluation and mitigation of racial bias in clinical ML. However, the adoption of algorithmic fairness principles in medicine remains inconsistent and is limited by poor data availability and ML model reporting. We recommend that researchers and journal editors emphasize standardized reporting and data availability in medical ML studies to improve transparency and facilitate evaluation for racial bias.
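Two of the fairness metrics tallied in this review have simple closed forms: equal opportunity difference is the gap in true-positive rates between two groups, and disparate impact is the ratio of their favorable-prediction rates (the "four-fifths rule" flags ratios below 0.8). A minimal sketch with invented binary labels and predictions:

```python
def true_positive_rate(y_true, y_pred):
    """Sensitivity within one group: TP / actual positives."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    return tp / sum(y_true)

def equal_opportunity_difference(y_true_a, y_pred_a, y_true_b, y_pred_b):
    """TPR(group a) - TPR(group b); 0 indicates equal opportunity."""
    return (true_positive_rate(y_true_a, y_pred_a)
            - true_positive_rate(y_true_b, y_pred_b))

def disparate_impact(y_pred_a, y_pred_b):
    """Ratio of favorable-prediction rates between groups;
    values below 0.8 are commonly flagged as disparate impact."""
    rate_a = sum(y_pred_a) / len(y_pred_a)
    rate_b = sum(y_pred_b) / len(y_pred_b)
    return rate_a / rate_b
```

The review's observation that metric choice was inconsistent matters because these definitions can disagree: predictions can satisfy disparate impact (equal selection rates) while still differing in true-positive rate across groups.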
Affiliation(s)
- Jonathan Huang
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Galal Galal
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Mozziyar Etemadi
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Department of Biomedical Engineering, Northwestern University, Evanston, IL, United States
- Mahesh Vaidyanathan
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Digital Health & Data Science Curricular Thread, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
19
Hernandez-Boussard T, Macklin P, Greenspan EJ, Gryshuk AL, Stahlberg E, Syeda-Mahmood T, Shmulevich I. Digital twins for predictive oncology will be a paradigm shift for precision cancer care. Nat Med 2021; 27:2065-2066. [PMID: 34824458 PMCID: PMC9097784 DOI: 10.1038/s41591-021-01558-5] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Paul Macklin
- Department of Medicine, Indiana University, Bloomington, IN, USA
- Emily J Greenspan
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
- Amy L Gryshuk
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Eric Stahlberg
- Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
20
Barboi C, Tzavelis A, Muhammad LN. Comparison of Severity of Illness Scores and Artificial Intelligence Models Predictive of Intensive Care Unit Mortality: Meta-analysis and Review of the Literature. JMIR Med Inform 2021; 10:e35293. [PMID: 35639445 PMCID: PMC9198821 DOI: 10.2196/35293] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2021] [Revised: 04/24/2022] [Accepted: 04/25/2022] [Indexed: 12/23/2022] Open
Affiliation(s)
- Cristina Barboi
- Indiana University-Purdue University Indianapolis, Regenstrief Institute, Indianapolis, IN, United States
- Andreas Tzavelis
- Medical Scientist Training Program, Feinberg School of Medicine, Chicago, IL, United States
- Department of Biomedical Engineering, Northwestern University, Chicago, IL, United States
- Lutfiyya NaQiyba Muhammad
- Department of Preventive Medicine and Biostatistics, Northwestern University, Evanston, IL, United States
21
Shelmerdine SC, Arthurs OJ, Denniston A, Sebire NJ. Review of study reporting guidelines for clinical studies using artificial intelligence in healthcare. BMJ Health Care Inform 2021; 28:bmjhci-2021-100385. [PMID: 34426417 PMCID: PMC8383863 DOI: 10.1136/bmjhci-2021-100385] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Accepted: 08/09/2021] [Indexed: 02/07/2023] Open
Abstract
High-quality research is essential in guiding evidence-based care, and should be reported in a way that is reproducible, transparent and, where appropriate, provides sufficient detail for inclusion in future meta-analyses. Reporting guidelines for various study designs have been widely used for clinical (and preclinical) studies, consisting of checklists with a minimum set of points for inclusion. With the recent rise in volume of research using artificial intelligence (AI), additional factors need to be evaluated, which do not neatly conform to traditional reporting guidelines (eg, details relating to technical algorithm development). In this review, reporting guidelines are highlighted to promote awareness of essential content required for studies evaluating AI interventions in healthcare. These include published and in-progress extensions to well-known reporting guidelines such as Standard Protocol Items: Recommendations for Interventional Trials-AI (study protocols), Consolidated Standards of Reporting Trials-AI (randomised controlled trials), Standards for Reporting of Diagnostic Accuracy Studies-AI (diagnostic accuracy studies) and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-AI (prediction model studies). Additionally, there are a number of guidelines that consider AI for health interventions more generally (eg, Checklist for Artificial Intelligence in Medical Imaging (CLAIM), minimum information (MI)-CLAIM, MI for Medical AI Reporting) or address a specific element such as the ‘learning curve’ (Developmental and Exploratory Clinical Investigation of Decision-AI). Economic evaluation of AI health interventions is not currently addressed, and may benefit from extension to an existing guideline.
In the face of a rapid influx of studies of AI health interventions, reporting guidelines help ensure that investigators and those appraising studies consider both the well-recognised elements of good study design and reporting, while also adequately addressing new challenges posed by AI-specific elements.
Affiliation(s)
- Owen J Arthurs
- Radiology, Great Ormond Street Hospital NHS Foundation Trust, London, UK
- Alastair Denniston
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
- Neil J Sebire
- Digital Research, Informatics and Virtual Environments Unit (DRIVE), London, UK