1
Kidwai-Khan F, Wang R, Skanderson M, Brandt CA, Fodeh S, Womack JA. A roadmap to artificial intelligence (AI): Methods for designing and building AI ready data to promote fairness. J Biomed Inform 2024; 154:104654. [PMID: 38740316 PMCID: PMC11144439 DOI: 10.1016/j.jbi.2024.104654] [Received: 09/29/2023] [Revised: 05/01/2024] [Accepted: 05/10/2024] [Indexed: 05/16/2024]
Abstract
OBJECTIVES We evaluated methods for preparing electronic health record data to reduce bias before applying artificial intelligence (AI). METHODS We created methods for transforming raw data into a data framework for applying machine learning and natural language processing techniques for predicting falls and fractures. Strategies such as including and reporting multiple races, combining mixed data sources (outpatient and inpatient records, structured codes, and unstructured notes), and addressing missingness were applied to the raw data to reduce bias. The raw data were carefully curated using validated definitions to create data variables such as age, race, gender, and healthcare utilization. Clinical, statistical, and data expertise were used to form these variables. The research team included experts with diverse professional and demographic backgrounds to bring in diverse perspectives. RESULTS For the prediction of falls, information extracted from radiology reports was converted to a matrix for applying machine learning. After processing, 5,377,673 reports were input to the machine learning algorithm, of which 45,304 were flagged as positive and 5,332,369 as negative for falls. Processed data had lower missingness and a better representation of race and diagnosis codes. For fractures, specialized algorithms extracted snippets of text around the keyword "femoral" from dual x-ray absorptiometry (DXA) scans to identify femoral neck T-scores, which are important for predicting fracture risk. The natural language processing algorithms yielded 98% accuracy and a 2% error rate. The methods to prepare data for input to artificial intelligence processes are reproducible and can be applied to other studies. CONCLUSION The life cycle of data from raw to analytic form includes data governance, cleaning, management, and analysis.
When applying artificial intelligence methods, input data must be prepared optimally to reduce algorithmic bias, as biased output is harmful. Building AI-ready data frameworks that improve efficiency can contribute to transparency and reproducibility. The roadmap for the application of AI involves applying specialized techniques to input data, some of which are suggested here. This study highlights data curation aspects to be considered when preparing data for the application of artificial intelligence to reduce bias.
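The abstract describes extracting text snippets around the keyword "femoral" from DXA reports to locate femoral neck T-scores. The following is a minimal Python sketch of that idea, not the authors' algorithm: the report wording, the regular expression `NECK_T_SCORE`, the 60-character window, and the function name `femoral_snippets` are all hypothetical.

```python
import re

# Hypothetical pattern: "femoral neck" followed, within the same sentence, by a
# T-score written as e.g. "T-score -2.6", "T-score: -2.6" or "T score of -2.6".
NECK_T_SCORE = re.compile(
    r"femoral\s+neck\b[^.;]*?T-?\s?score\s*(?:of|:|=)?\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def femoral_snippets(report: str, window: int = 60) -> list[dict]:
    """Return a context snippet and any femoral neck T-score for each 'femoral' mention."""
    results = []
    for m in re.finditer(r"femoral", report, flags=re.IGNORECASE):
        start = max(0, m.start() - window)
        end = min(len(report), m.end() + window)
        # Anchor the T-score pattern at this keyword so that overlapping snippets
        # from nearby mentions do not report the same score twice.
        hit = NECK_T_SCORE.match(report, m.start(), end)
        results.append({
            "snippet": report[start:end],
            "t_score": float(hit.group(1)) if hit else None,
        })
    return results


report = ("DXA: Lumbar spine T-score -1.2. Left femoral neck T-score -2.6. "
          "Right femoral neck T-score: -2.4. Total hip T-score -1.9.")
print([r["t_score"] for r in femoral_snippets(report)])  # [-2.6, -2.4]
```

Real DXA reports vary in layout (tables, or bone mineral density values with decimal points between the site name and the score, which this pattern would miss), so any such rule would need validation against annotated reports.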
Affiliation(s)
- Farah Kidwai-Khan
- Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA.
- Rixin Wang
- Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA
- Cynthia A Brandt
- Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA
- Samah Fodeh
- Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA
- Julie A Womack
- VA Connecticut Healthcare System, West Haven, CT, USA; Yale School of Nursing, New Haven, CT, USA
2
Qiao S, Li X, Olatosi B, Young SD. Utilizing Big Data analytics and electronic health record data in HIV prevention, treatment, and care research: a literature review. AIDS Care 2024; 36:583-603. [PMID: 34260325 DOI: 10.1080/09540121.2021.1948499] [Received: 04/13/2021] [Accepted: 06/22/2021] [Indexed: 01/07/2023]
Abstract
Propelled by the transformative power of modern information and communication technologies, digitalization of data, and the increasing affordability of high-performance computing, Big Data science has brought forth revolutionary advancement in many areas of business, industry, health, and medicine. The HIV research and care service community is no exception to the benefits from the availability and utilization of Big Data analytics. Electronic health record (EHR) data (e.g., administrative and billing data, electronic medical records, or other digital records of information pertinent to individual or population health) are an essential source of health and disease outcome data because of the large amount of real-world, comprehensive, and often longitudinal data, which provide a good opportunity for leveraging advanced Big Data analytics in addressing challenges in HIV prevention, treatment, and care. This review focuses on studies that apply Big Data analytics to EHR data with aims to synthesize the HIV-related issues that EHR data studies can tackle, identify challenges in the utilization of EHR data in HIV research and practice, and discuss future needs and directions that can realize the promising potential role of Big Data in ending the HIV epidemic.
Affiliation(s)
- Shan Qiao
- South Carolina SmartState Center for Healthcare Quality (CHQ), Columbia, SC, USA
- University of South Carolina Big Data Health Science Center, Columbia, SC, USA
- Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Xiaoming Li
- South Carolina SmartState Center for Healthcare Quality (CHQ), Columbia, SC, USA
- University of South Carolina Big Data Health Science Center, Columbia, SC, USA
- Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Bankole Olatosi
- South Carolina SmartState Center for Healthcare Quality (CHQ), Columbia, SC, USA
- University of South Carolina Big Data Health Science Center, Columbia, SC, USA
- Department of Health Services Policy and Management, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Sean D Young
- Department of Emergency Medicine, Department of Informatics, Institute for Prediction Technology, University of California, Irvine, CA, USA
3
Casey A, Davidson E, Grover C, Tobin R, Grivas A, Zhang H, Schrempf P, O’Neil AQ, Lee L, Walsh M, Pellie F, Ferguson K, Cvoro V, Wu H, Whalley H, Mair G, Whiteley W, Alex B. Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports. Front Digit Health 2023; 5:1184919. [PMID: 37840686 PMCID: PMC10569314 DOI: 10.3389/fdgth.2023.1184919] [Received: 03/12/2023] [Accepted: 09/06/2023] [Indexed: 10/17/2023]
Abstract
Background Natural language processing (NLP) has the potential to automate the reading of radiology reports, but there is a need to demonstrate that NLP methods are adaptable and reliable for use in real-world clinical applications. Methods We used F1 score, precision, and recall to compare NLP tools on a cohort from a study on delirium using images and radiology reports from NHS Fife and on a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (namely, EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and reported on their performance for three cerebrovascular phenotypes, namely, ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques devised during the development of EdIE-R, in conjunction with an expert researcher who read the underlying images. Results EdIE-R obtained the highest F1 score in both cohorts for ischaemic stroke, ≥93%, followed by ALARM+, ≥87%. The F1 score of ESPRESSO was ≥74%, whilst that of Sem-EHR was ≥66%, although ESPRESSO had the highest precision in both cohorts, 90% and 98%. For SVD, EdIE-R scored an F1 of ≥98% and ALARM+ ≥90%; ESPRESSO scored lowest with ≥77%, and Sem-EHR scored ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy at 89% in NHS Fife and 73% in Generation Scotland. When comparing NLP tool output with brain image reads using F1 scores, ALARM+ scored 80%, outperforming EdIE-R at 66% in ischaemic stroke. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR at 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%. Conclusions The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, although the variation is most apparent for ischaemic stroke.
If NLP tools are to be used in clinical settings, this cannot be performed "out of the box." It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt tools to the target task.
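F1 is the harmonic mean of precision and recall, which is how ESPRESSO can have the highest precision yet a lower F1 score than EdIE-R: its recall must have been lower. The sketch below computes all three measures from confusion-matrix counts; the function name is illustrative and the counts are invented, not taken from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall, so it is pulled toward the lower one.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: high precision (0.90) but lower recall (0.75).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f}")  # precision=0.90 recall=0.75 F1=0.818
```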
Affiliation(s)
- Arlene Casey
- Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Emma Davidson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Claire Grover
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Richard Tobin
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Andreas Grivas
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Huayu Zhang
- Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Patrick Schrempf
- Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
- School of Computer Science, University of St Andrews, St Andrews, United Kingdom
- Alison Q. O’Neil
- Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
- School of Engineering, University of Edinburgh, Edinburgh, United Kingdom
- Liam Lee
- Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Michael Walsh
- Intensive Care Department, University Hospitals Bristol and Weston, Bristol, United Kingdom
- Freya Pellie
- National Horizons Centre, Teesside University, Darlington, United Kingdom
- School of Health and Life Sciences, Teesside University, Middlesbrough, United Kingdom
- Karen Ferguson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Vera Cvoro
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Department of Geriatric Medicine, NHS Fife, Fife, United Kingdom
- Honghan Wu
- Institute of Health Informatics, University College London, London, United Kingdom
- Alan Turing Institute, London, United Kingdom
- Heather Whalley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Generation Scotland, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Grant Mair
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- Beatrice Alex
- Edinburgh Futures Institute, University of Edinburgh, Edinburgh, United Kingdom
- School of Literatures, Languages and Cultures, University of Edinburgh, Edinburgh, United Kingdom
4
Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023; 5:1195017. [PMID: 37388252 PMCID: PMC10303934 DOI: 10.3389/fdgth.2023.1195017] [Received: 03/27/2023] [Accepted: 06/05/2023] [Indexed: 07/01/2023]
Abstract
Objectives The objective of this study is to explore Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also aim to evaluate how the languages and institutional specificities of Swiss teaching hospitals are likely to affect the quality of the classification in French and German. Methods In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. Results The best strategies yield average F1-scores of 90% and 86% respectively for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks. Conclusions These results are competitive with the manual labeling as measured by Matthews correlation coefficient and Cohen's kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize to new, unseen data, and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
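Matthews correlation coefficient (MCC) and Cohen's kappa both measure agreement beyond what the class balance alone would produce. For a two-class task such as Progressive vs Non-progressive, both can be computed from a 2x2 table of model labels against expert labels, as in this minimal sketch; the function name and counts are hypothetical, and the four-class RECIST task would need the multi-class generalisations of both measures.

```python
import math


def mcc_and_kappa(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Matthews correlation coefficient and Cohen's kappa for a 2x2 agreement table.

    tp/fp/fn/tn compare model labels (predicted) with expert labels (reference).
    """
    n = tp + fp + fn + tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    observed = (tp + tn) / n
    # Chance agreement: product of the two raters' marginal rates, summed over both classes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    return mcc, kappa


# Hypothetical table: 40 agreed positives, 45 agreed negatives, 15 disagreements.
mcc, kappa = mcc_and_kappa(tp=40, fp=10, fn=5, tn=45)
print(f"MCC={mcc:.3f} kappa={kappa:.3f}")  # MCC=0.704 kappa=0.700
```

When the disagreements are balanced (fp equal to fn), the two measures coincide; they diverge as the marginal distributions of the two raters differ.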
Affiliation(s)
- Luc Mottin
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Jean-Philippe Goldman
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Christoph Jäggli
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Rita Achermann
- Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Julien Gobeill
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Knafou
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Ehrsam
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexandre Wicky
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Camille L. Gérard
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Tanja Schwenk
- Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
- Mélinda Charrier
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Petros Tsantoulis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexander Leichtle
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Michael K. Kiessling
- Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
- Olivier Michielin
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Sylvain Pradervand
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Vasiliki Foufi
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Patrick Ruch
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
5
Womack JA, Murphy TE, Leo-Summers L, Bates J, Jarad S, Gill TM, Hsieh E, Rodriguez-Barradas MC, Tien PC, Yin MT, Brandt CA, Justice AC. Assessing the contributions of modifiable risk factors to serious falls and fragility fractures among older persons living with HIV. J Am Geriatr Soc 2023; 71:1891-1901. [PMID: 36912153 PMCID: PMC10258163 DOI: 10.1111/jgs.18304] [Received: 10/26/2022] [Revised: 01/14/2023] [Accepted: 01/25/2023] [Indexed: 03/14/2023]
Abstract
BACKGROUND Although 50 years represents middle age among uninfected individuals, studies have shown that persons living with HIV (PWH) begin to demonstrate elevated risk for serious falls and fragility fractures in the sixth decade; the proportions of these outcomes attributable to modifiable factors are unknown. METHODS We analyzed 21,041 older PWH on antiretroviral therapy (ART) from the Veterans Aging Cohort Study from 01/01/2010 through 09/30/2015. Serious falls were identified by E-codes and a machine-learning algorithm applied to radiology reports. Fragility fractures (hip, vertebral, and upper arm) were identified using ICD-9 codes. Predictors for both models included a serious fall within the past 12 months, body mass index, physiologic frailty (VACS Index 2.0), illicit substance and alcohol use disorders, and measures of multimorbidity and polypharmacy. We separately fit multivariable logistic models to each outcome using generalized estimating equations. From these models, the longitudinal extensions of average attributable fraction (LE-AAF) for modifiable risk factors were estimated. RESULTS Key risk factors for both outcomes included physiologic frailty (VACS Index 2.0) (serious falls [15%; 95% CI 14%-15%]; fractures [13%; 95% CI 12%-14%]), a serious fall in the past year (serious falls [7%; 95% CI 7%-7%]; fractures [5%; 95% CI 4%-5%]), polypharmacy (serious falls [5%; 95% CI 4%-5%]; fractures [5%; 95% CI 4%-5%]), an opioid prescription in the past month (serious falls [7%; 95% CI 6%-7%]; fractures [9%; 95% CI 8%-9%]), and diagnosis of alcohol use disorder (serious falls [4%; 95% CI 4%-5%]; fractures [8%; 95% CI 7%-8%]). CONCLUSIONS This study confirms the contributions of risk factors important in the general population to both serious falls and fragility fractures among older PWH. Successful prevention programs for these outcomes should build on existing prevention efforts while including risk factors specific to PWH.
Affiliation(s)
- Julie A. Womack
- VA Connecticut Healthcare System, West Haven, CT
- Yale School of Nursing, West Haven, CT
- Jonathan Bates
- VA Connecticut Healthcare System, West Haven, CT
- Yale School of Medicine, New Haven, CT
- Evelyn Hsieh
- VA Connecticut Healthcare System, West Haven, CT
- Yale School of Medicine, New Haven, CT
- Maria C. Rodriguez-Barradas
- Infectious Diseases Section, Michael E DeBakey VA Medical Center, and Department of Medicine, Baylor College of Medicine, Houston, TX
- Phyllis C. Tien
- University of California, San Francisco, and Department of Veterans Affairs, San Francisco, CA
- Cynthia A. Brandt
- VA Connecticut Healthcare System, West Haven, CT
- Yale School of Medicine, New Haven, CT
- Amy C. Justice
- VA Connecticut Healthcare System, West Haven, CT
- Yale School of Medicine, New Haven, CT
6
Kidwai-Khan F, Wang R, Skanderson M, Brandt CA, Fodeh S, Womack JA. A Roadmap to Artificial Intelligence (AI): Methods for Designing and Building AI ready Data for Women's Health Studies. medRxiv: The Preprint Server for Health Sciences 2023:2023.05.25.23290399. [PMID: 37398113 PMCID: PMC10312839 DOI: 10.1101/2023.05.25.23290399] [Indexed: 07/04/2023]
Abstract
Objectives To evaluate methods for building data frameworks for applying AI to large-scale datasets in women's health studies. Methods We created methods for transforming raw data into a data framework for applying machine learning (ML) and natural language processing (NLP) techniques for predicting falls and fractures. Results Prediction of falls was higher in women than in men. Information extracted from radiology reports was converted to a matrix for applying machine learning. For fractures, we applied specialized algorithms to extract snippets containing meaningful terms usable for predicting fracture risk from dual x-ray absorptiometry (DXA) scans. Discussion The life cycle of data from raw to analytic form includes data governance, cleaning, management, and analysis. For applying AI, data must be prepared optimally to reduce algorithmic bias. Conclusion Algorithmic bias is harmful for research using AI methods. Building AI-ready data frameworks that improve efficiency can be especially valuable for women's health.
Affiliation(s)
- Farah Kidwai-Khan
- Yale School of Medicine, New Haven, Connecticut, USA
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Rixin Wang
- Yale School of Medicine, New Haven, Connecticut, USA
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Cynthia A. Brandt
- Yale School of Medicine, New Haven, Connecticut, USA
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Samah Fodeh
- Yale School of Medicine, New Haven, Connecticut, USA
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Julie A. Womack
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Yale School of Nursing, New Haven, Connecticut, USA
7
Womack JA, Murphy TE, Leo-Summers L, Bates J, Jarad S, Smith AC, Gill TM, Hsieh E, Rodriguez-Barradas MC, Tien PC, Yin MT, Brandt CA, Justice AC. Predictive Risk Model for Serious Falls Among Older Persons Living With HIV. J Acquir Immune Defic Syndr 2022; 91:168-174. [PMID: 36094483 PMCID: PMC9470988 DOI: 10.1097/qai.0000000000003030] [Received: 01/27/2022] [Accepted: 04/26/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Older (aged over 50 years) persons living with HIV (PWH) are at elevated risk for falls. We explored how well our algorithm for predicting falls in a general population of middle-aged Veterans (age 45-65 years) worked among older PWH who use antiretroviral therapy (ART) and whether model fit improved with inclusion of specific ART classes. METHODS This analysis included 304,951 six-month person-intervals over a 15-year period (2001-2015) contributed by 26,373 older PWH from the Veterans Aging Cohort Study who were taking ART. Serious falls (those falls warranting a visit to a health care provider) were identified by external cause of injury codes and a machine-learning algorithm applied to radiology reports. Potential predictors included a fall within the past 12 months, demographics, body mass index, Veterans Aging Cohort Study Index 2.0 score, substance use, and measures of multimorbidity and polypharmacy. We assessed discrimination and calibration after applying the original coefficients (from the model derived in middle-aged Veterans) to older PWH, and then reassessed them after refitting the model using multivariable logistic regression with generalized estimating equations. We also explored whether model performance improved with indicators of ART classes. RESULTS With application of the original coefficients, discrimination was good (C-statistic 0.725; 95% CI: 0.719 to 0.730) but calibration was poor. After refitting the model, both discrimination (C-statistic 0.732; 95% CI: 0.727 to 0.734) and calibration were good. Including ART classes did not improve model performance. CONCLUSIONS After refitting their coefficients, the same variables predicted risk of serious falls among older PWH nearly as well as they had among middle-aged Veterans.
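For a binary outcome, the C-statistic is the concordance probability: the chance that a randomly chosen person-interval with a serious fall received a higher predicted risk than one without, with tied predictions counted as half. The sketch below is a minimal pairwise illustration with made-up risks; the function name is hypothetical, it ignores the clustering of intervals within persons, and for hundreds of thousands of intervals a rank-based computation would be preferable to this O(n²) loop.

```python
from itertools import product


def c_statistic(risks: list[float], outcomes: list[int]) -> float:
    """Concordance (C) statistic for a binary outcome, with ties counted as 0.5."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0  # concordant pair: the event had the higher predicted risk
        elif e == n:
            score += 0.5  # tied predictions count as half
    return score / (len(events) * len(non_events))


# Hypothetical predicted risks for four person-intervals and whether a serious fall occurred.
print(c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 0.75
```

Discrimination is only half of the picture: a model can rank intervals well (high C-statistic) while its predicted probabilities are systematically too high or too low, which is why calibration was assessed separately.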
Affiliation(s)
- Julie A Womack
- Veterans Affairs Connecticut Healthcare System, West Haven, CT
- Yale School of Nursing, West Haven, CT
- Jonathan Bates
- Veterans Affairs Connecticut Healthcare System, West Haven, CT
- Yale School of Medicine, New Haven, CT
- Evelyn Hsieh
- Veterans Affairs Connecticut Healthcare System, West Haven, CT
- Yale School of Medicine, New Haven, CT
- Maria C Rodriguez-Barradas
- Michael E DeBakey VA Medical Center, Infectious Diseases Section and Department of Medicine, Baylor College of Medicine, Houston, TX
- Phyllis C Tien
- University of California, San Francisco, CA
- Department of Veterans Affairs, San Francisco, CA
- Cynthia A Brandt
- Veterans Affairs Connecticut Healthcare System, West Haven, CT
- Yale University Schools of Medicine and Public Health, New Haven, CT
- Amy C Justice
- Veterans Affairs Connecticut Healthcare System, West Haven, CT
- Yale University Schools of Medicine and Public Health, New Haven, CT
8
Dipnall JF, Lu J, Gabbe BJ, Cosic F, Edwards E, Page R, Du L. Comparison of state-of-the-art machine and deep learning algorithms to classify proximal humeral fractures using radiology text. Eur J Radiol 2022; 153:110366. [DOI: 10.1016/j.ejrad.2022.110366] [Received: 02/02/2022] [Revised: 04/08/2022] [Accepted: 05/16/2022] [Indexed: 12/01/2022]
9
Davidson EM, Poon MTC, Casey A, Grivas A, Duma D, Dong H, Suárez-Paniagua V, Grover C, Tobin R, Whalley H, Wu H, Alex B, Whiteley W. The reporting quality of natural language processing studies: systematic review of studies of radiology reports. BMC Med Imaging 2021; 21:142. [PMID: 34600486 PMCID: PMC8487512 DOI: 10.1186/s12880-021-00671-8] [Received: 04/01/2021] [Accepted: 09/20/2021] [Indexed: 01/04/2023]
Abstract
BACKGROUND Automated language analysis of radiology reports using natural language processing (NLP) can provide valuable information on patients' health and disease. With its rapid development, NLP studies should have transparent methodology to allow comparison of approaches and reproducibility. This systematic review aims to summarise the characteristics and reporting quality of studies applying NLP to radiology reports. METHODS We searched Google Scholar for studies published in English that applied NLP to radiology reports of any imaging modality between January 2015 and October 2019. At least two reviewers independently performed screening and completed data extraction. We specified 15 criteria relating to data source, datasets, ground truth, outcomes, and reproducibility for quality assessment. The primary NLP performance measures were precision, recall and F1 score. RESULTS Of the 4,836 records retrieved, we included 164 studies that used NLP on radiology reports. The commonest clinical applications of NLP were disease information or classification (28%) and diagnostic surveillance (27.4%). Most studies used English radiology reports (86%). Reports from mixed imaging modalities were used in 28% of the studies. Oncology (24%) was the most frequent disease area. Most studies had dataset size > 200 (85.4%) but the proportion of studies that described their annotated, training, validation, and test set were 67.1%, 63.4%, 45.7%, and 67.7% respectively. About half of the studies reported precision (48.8%) and recall (53.7%). Few studies reported external validation performed (10.8%), data availability (8.5%) and code availability (9.1%). There was no pattern of performance associated with the overall reporting quality. CONCLUSIONS There is a range of potential clinical applications for NLP of radiology reports in health services and research. However, we found suboptimal reporting quality that precludes comparison, reproducibility, and replication. 
Our results support the need for development of reporting standards specific to clinical NLP studies.
Affiliation(s)
- Emma M Davidson
- Centre for Clinical Brain Sciences, University of Edinburgh, Chancellor's Building, Little France, Edinburgh, EH16 4TJ, Scotland, UK.
- Michael T C Poon
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK
- Brain Tumour Centre of Excellence, Cancer Research UK Edinburgh Centre, University of Edinburgh, Edinburgh, Scotland, UK
- Arlene Casey
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland, UK
- Andreas Grivas
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland, UK
- Daniel Duma
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland, UK
- Hang Dong
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK
- Health Data Research UK, London, UK
- Víctor Suárez-Paniagua
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK
- Health Data Research UK, London, UK
- Claire Grover
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK
- Richard Tobin
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK
- Heather Whalley
- Centre for Clinical Brain Sciences, University of Edinburgh, Chancellor's Building, Little France, Edinburgh, EH16 4TJ, Scotland, UK
- Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Honghan Wu
- Health Data Research UK, London, UK
- Institute of Health Informatics, University College London, London, UK
- Beatrice Alex
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland, UK
- Edinburgh Futures Institute, University of Edinburgh, Edinburgh, Scotland, UK
- William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Chancellor's Building, Little France, Edinburgh, EH16 4TJ, Scotland, UK
- Health Data Research UK, London, UK
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
10
Womack JA, Murphy TE, Ramsey C, Bathulapalli H, Leo-Summers L, Smith AC, Bates J, Jarad S, Gill TM, Hsieh E, Rodriguez-Barradas MC, Tien PC, Yin MT, Brandt C, Justice AC. Brief Report: Are Serious Falls Associated With Subsequent Fragility Fractures Among Veterans Living With HIV? J Acquir Immune Defic Syndr 2021; 88:192-196. [PMID: 34506360 PMCID: PMC8513792 DOI: 10.1097/qai.0000000000002752] [Received: 02/10/2021] [Accepted: 06/09/2021] [Indexed: 11/26/2022]
Abstract
BACKGROUND The extensive research on falls and fragility fractures among persons living with HIV (PWH) has not explored the association between serious falls and subsequent fragility fracture. We explored this association. SETTING Veterans Aging Cohort Study. METHODS This analysis included 304,951 6-month person-intervals over a 15-year period (2001-2015) contributed by 26,373 PWH who were 50+ years of age (mean age 55 years) and taking antiretroviral therapy (ART). Serious falls (those falls significant enough to result in a visit to a health care provider) were identified by external cause of injury codes and a machine learning algorithm applied to radiology reports. Fragility fractures, which included hip, vertebral, and upper arm fractures, were identified using ICD-9 codes and were modeled with multivariable logistic regression with generalized estimating equations. RESULTS After adjustment, serious falls in the previous year were associated with increased risk of fragility fracture [odds ratio (OR) 2.10; 95% confidence interval (CI): 1.83 to 2.41]. The use of integrase inhibitors was the only ART risk factor (OR 1.17; 95% CI: 1.03 to 1.33). Other risk factors included the diagnosis of alcohol use disorder (OR 1.49; 95% CI: 1.31 to 1.70) and having a prescription for an opioid in the previous 6 months (OR 1.40; 95% CI: 1.27 to 1.53). CONCLUSIONS Serious falls within the past year are strongly associated with fragility fractures among PWH on ART (largely a middle-aged population), much as they are among older adults in the general population.
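The unit of analysis above is the 6-month person-interval. The abstract does not describe how the intervals were constructed, so the following is only a minimal Python sketch under stated assumptions: each participant is followed from a hypothetical `start` date to an `end` date, and follow-up is cut into consecutive 6-month windows with a truncated final window. The helper names `add_months` and `person_intervals` are hypothetical, not the authors' code.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift `d` forward by whole months, clamping the day to the target month's length."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year = d.year + years
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def person_intervals(start: date, end: date, months: int = 6) -> list[tuple[date, date]]:
    """Split follow-up [start, end) into consecutive `months`-month intervals.

    Boundaries are computed from `start` rather than chained from the previous
    boundary, so day-of-month clamping (e.g. Aug 31 -> Feb 28) does not drift.
    The final interval is truncated at `end`.
    """
    intervals = []
    lower, k = start, 1
    while lower < end:
        upper = min(add_months(start, k * months), end)
        intervals.append((lower, upper))
        lower, k = upper, k + 1
    return intervals


# Hypothetical participant followed from 1 Jan 2001 to 15 Mar 2002.
for lo, hi in person_intervals(date(2001, 1, 1), date(2002, 3, 15)):
    print(lo, "to", hi)
```

Summing the number of intervals over all participants gives a count analogous to the 304,951 person-intervals reported; the study's actual eligibility and censoring rules are not described in the abstract and are not modeled here.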
Affiliation(s)
- Julie A Womack
- VA Connecticut Healthcare System and Yale School of Nursing, West Haven, CT
- Christine Ramsey
- Yale School of Medicine, New Haven, CT
- VA Connecticut Healthcare System, West Haven, CT
- Harini Bathulapalli
- Yale School of Medicine, New Haven, CT
- VA Connecticut Healthcare System, West Haven, CT
- Jonathan Bates
- Yale School of Medicine, New Haven, CT
- VA Connecticut Healthcare System, West Haven, CT
- Evelyn Hsieh
- Yale School of Medicine, New Haven, CT
- VA Connecticut Healthcare System, West Haven, CT
- Maria C Rodriguez-Barradas
- Michael E DeBakey VA Medical Center, Infectious Diseases Section, Department of Medicine, Baylor College of Medicine, Houston, TX
- Phyllis C Tien
- Department of Veterans Affairs, University of California, San Francisco, San Francisco, CA
- Michael T Yin
- Columbia University Medical Center, New York, NY
- Cynthia Brandt
- VA Connecticut Healthcare System, West Haven, CT
- Yale University Schools of Medicine and Public Health, New Haven, CT
- Amy C Justice
- VA Connecticut Healthcare System, West Haven, CT
- Yale University Schools of Medicine and Public Health, New Haven, CT
11
Jing X. The Unified Medical Language System at 30 Years and How It Is Used and Published: Systematic Review and Content Analysis. JMIR Med Inform 2021; 9:e20675. [PMID: 34236337 PMCID: PMC8433943 DOI: 10.2196/20675] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/25/2020] [Revised: 11/25/2020] [Accepted: 07/02/2021] [Indexed: 01/22/2023]
Abstract
BACKGROUND The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2021 marks its 30th anniversary. The UMLS brings together many broadly used vocabularies and standards in the biomedical field to facilitate interoperability among different computer systems and applications. OBJECTIVE Despite its longevity, there is no comprehensive publication analysis of the use of the UMLS. Thus, this review and analysis was conducted to provide an overview of the UMLS and a comprehensive understanding of how it has been used in English-language peer-reviewed publications over the last 30 years. METHODS PubMed, ACM Digital Library, and the Nursing & Allied Health Database were used to search for studies. The primary search strategy was as follows: UMLS was used as a Medical Subject Headings term or a keyword or appeared in the title or abstract. Only English-language publications were considered. The publications were screened first, then coded and categorized iteratively, following grounded theory. The review process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. RESULTS A total of 943 publications were included in the final analysis. In addition, 32 publications were assigned to 2 categories each; hence the total number of category assignments (publications counted before duplicates are removed) is 975. 
After analysis and categorization of the publications, UMLS was found to be used in the following emerging themes or areas (the number of publications and their respective percentages are given in parentheses): natural language processing (230/975, 23.6%), information retrieval (125/975, 12.8%), terminology study (90/975, 9.2%), ontology and modeling (80/975, 8.2%), medical subdomains (76/975, 7.8%), other language studies (53/975, 5.4%), artificial intelligence tools and applications (46/975, 4.7%), patient care (35/975, 3.6%), data mining and knowledge discovery (25/975, 2.6%), medical education (20/975, 2.1%), degree-related theses (13/975, 1.3%), digital library (5/975, 0.5%), and the UMLS itself (150/975, 15.4%), as well as the UMLS for other purposes (27/975, 2.8%). CONCLUSIONS The UMLS has been used successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, the building of artificial intelligence tools, data mining and knowledge discovery, foundational work in methodology, and middle layers that may lead to advanced products. Natural language processing, the UMLS itself, and information retrieval are the 3 most common themes that emerged among the included publications. The results, although largely related to academia, demonstrate that UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.
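The percentages above use 975 category assignments (943 included publications plus 32 counted a second time) as the denominator, not 943. The short Python check below uses only the counts and percentages reported in the abstract; it confirms that the 14 theme counts sum to 975 and reproduces each reported percentage. The names `DENOMINATOR`, `THEMES`, and `share` are illustrative.

```python
DENOMINATOR = 943 + 32  # included publications + second assignments of the 32 double-coded ones

# theme -> (publication count, percentage reported in the abstract)
THEMES = {
    "natural language processing": (230, 23.6),
    "information retrieval": (125, 12.8),
    "terminology study": (90, 9.2),
    "ontology and modeling": (80, 8.2),
    "medical subdomains": (76, 7.8),
    "other language studies": (53, 5.4),
    "artificial intelligence tools and applications": (46, 4.7),
    "patient care": (35, 3.6),
    "data mining and knowledge discovery": (25, 2.6),
    "medical education": (20, 2.1),
    "degree-related theses": (13, 1.3),
    "digital library": (5, 0.5),
    "the UMLS itself": (150, 15.4),
    "the UMLS for other purposes": (27, 2.8),
}


def share(count: int, total: int = DENOMINATOR) -> float:
    """Percentage of `total`, rounded to one decimal place as in the abstract."""
    return round(100 * count / total, 1)


# The theme counts add up to the category-assignment total, not to 943.
assert sum(count for count, _ in THEMES.values()) == DENOMINATOR
for theme, (count, reported) in THEMES.items():
    print(f"{theme}: {count}/{DENOMINATOR} = {share(count)}% (reported {reported}%)")
```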
Affiliation(s)
- Xia Jing
- Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, United States
12
Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, Grover C, Suárez-Paniagua V, Tobin R, Whiteley W, Wu H, Alex B. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak 2021; 21:179. [PMID: 34082729 PMCID: PMC8176715 DOI: 10.1186/s12911-021-01533-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 02/09/2021] [Accepted: 05/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP applied to radiology is important, but recent reviews on this topic are limited. This study systematically assesses and quantifies recent literature on NLP applied to radiology reports. METHODS We conduct an automated literature search yielding 4836 results, using automated filtering, metadata enrichment steps, and citation search combined with manual review. Our analysis is based on 21 variables, including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS We present a comprehensive analysis of the 164 publications retrieved; the number of publications in 2019 is almost triple that in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases over the period, but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data are scarce, and there is little evidence of adoption into clinical practice. Although 17% of studies report F1 scores greater than 0.85, it is hard to compare these approaches because most of them use different datasets. Only 14 studies made their data available and 15 their code, and 10 externally validated their results. CONCLUSIONS Automated understanding of the clinical narratives in radiology reports has the potential to enhance the healthcare process, and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code, enabling validation of methods on data from different institutions, and to reduce heterogeneity in the reporting of study properties, allowing inter-study comparisons. 
Our results are significant for researchers in the field, providing a systematic synthesis of existing work to build on and helping them identify gaps and opportunities for collaboration and avoid duplication.
Affiliation(s)
- Arlene Casey
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Emma Davidson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Michael Poon
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Hang Dong
- Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
- Health Data Research UK, London, UK
- Daniel Duma
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Andreas Grivas
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Claire Grover
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Víctor Suárez-Paniagua
- Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
- Health Data Research UK, London, UK
- Richard Tobin
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Honghan Wu
- Health Data Research UK, London, UK
- Institute of Health Informatics, University College London, London, UK
- Beatrice Alex
- School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Edinburgh Futures Institute, University of Edinburgh, Edinburgh, Scotland
13
Simon ST, Mandair D, Tiwari P, Rosenberg MA. Prediction of Drug-Induced Long QT Syndrome Using Machine Learning Applied to Harmonized Electronic Health Record Data. J Cardiovasc Pharmacol Ther 2021; 26:335-340. [PMID: 33682475 DOI: 10.1177/1074248421995348] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/08/2023]
Abstract
BACKGROUND Drug-induced QT prolongation is a potentially preventable cause of morbidity and mortality; however, no clinical tools are in widespread use to predict which individuals are at greatest risk. Machine learning (ML) algorithms may provide a method for identifying these individuals and could be automated to alert providers directly in real time. OBJECTIVE This study applies ML techniques to electronic health record (EHR) data to identify an integrated risk-prediction model that can be deployed to predict risk of drug-induced QT prolongation. METHODS We examined harmonized data from the UCHealth EHR and identified inpatients who had received a medication known to prolong the QT interval. Using a binary outcome (development of a QTc interval >500 ms within 24 hours of medication initiation versus no ECG with a QTc interval >500 ms), we compared multiple machine learning methods by classification accuracy and performed calibration and rescaling of the final model. RESULTS We identified 35,639 inpatients who received a known QT-prolonging medication and had an ECG performed within 24 hours of administration. Of those, 4,558 patients developed a QTc > 500 ms and 31,081 patients did not. A deep neural network with random oversampling of controls was found to provide superior classification accuracy (F1 score 0.404; AUC 0.71) for the development of a long QT interval compared with other methods. The optimal cutpoint for prediction was determined and was reasonably accurate (sensitivity 71%; specificity 73%). CONCLUSIONS We found that deep neural networks applied to EHR data provide reasonable prediction of which individuals are most susceptible to drug-induced QT prolongation. Future studies are needed to validate this model in novel EHRs and within the physician order entry system to assess the ability to improve patient safety.
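The abstract reports an "optimal cutpoint" with 71% sensitivity and 73% specificity but does not say which criterion defined "optimal". One common choice is Youden's J (sensitivity + specificity - 1), so the Python sketch below uses it purely as an assumption. The labels and risk scores are hypothetical toy values, and `sens_spec` and `youden_cutpoint` are illustrative names, not part of the study.

```python
def sens_spec(y_true, scores, cutpoint):
    """Sensitivity and specificity when a score >= cutpoint is called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutpoint)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutpoint)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutpoint)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutpoint)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutpoint(y_true, scores):
    """Candidate cutpoint maximizing Youden's J = sensitivity + specificity - 1.

    Every observed score is tried; on ties the lower cutpoint is kept.
    """
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(y_true, scores, cut)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical labels (1 = QTc > 500 ms) and model-predicted risks.
y = [0, 0, 0, 1, 1, 1]
risk = [0.10, 0.30, 0.60, 0.40, 0.70, 0.90]
cut, j = youden_cutpoint(y, risk)
print(cut, sens_spec(y, risk, cut))
```

With roughly 13% of patients in the positive class (4,558 of 35,639), a criterion that weights sensitivity and specificity equally, such as Youden's J, behaves differently from one that targets F1 or precision; which trade-off the authors used is not stated.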
Affiliation(s)
- Steven T Simon
- Division of Cardiology, University of Colorado School of Medicine, Aurora, CO, USA
- Divneet Mandair
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Premanand Tiwari
- Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Michael A Rosenberg
- Division of Cardiology, University of Colorado School of Medicine, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
14
Oliveira CR, Niccolai P, Ortiz AM, Sheth SS, Shapiro ED, Niccolai LM, Brandt CA. Natural Language Processing for Surveillance of Cervical and Anal Cancer and Precancer: Algorithm Development and Split-Validation Study. JMIR Med Inform 2020; 8:e20826. [PMID: 32469840 PMCID: PMC7671846 DOI: 10.2196/20826] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/29/2020] [Revised: 09/18/2020] [Accepted: 10/04/2020] [Indexed: 12/13/2022]
Abstract
Background Accurate identification of new diagnoses of human papillomavirus–associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research. Objective This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus. Methods A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm’s classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm’s performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure. 
Results The natural language processing algorithm’s performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87). Conclusions This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology.
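Accuracy, precision, recall, and F-measure, as used above, are standard confusion-matrix quantities computed against the manual-review gold standard. The minimal Python sketch below shows the definitions; the counts are hypothetical and are not the study's data, and the confidence intervals reported in the abstract are not computed here.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Document-level counts against a manual-review gold standard."""
    tp: int  # algorithm positive, reviewers positive
    fp: int  # algorithm positive, reviewers negative
    fn: int  # algorithm negative, reviewers positive
    tn: int  # algorithm negative, reviewers negative

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f_measure(self) -> float:
        """Harmonic mean of precision and recall (F1)."""
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Hypothetical counts for one report type.
c = Confusion(tp=9, fp=1, fn=3, tn=87)
print(c.accuracy(), c.precision(), c.recall(), round(c.f_measure(), 3))
```

Because F1 is a harmonic mean, it is pulled toward the weaker of precision and recall (0.818 here, versus an arithmetic mean of 0.825); with few abnormal reports, a couple of missed cases can lower recall substantially even when accuracy stays high.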
Affiliation(s)
- Carlos R Oliveira
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Patrick Niccolai
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Anette Michelle Ortiz
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Sangini S Sheth
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale University School of Medicine, New Haven, CT, United States
- Eugene D Shapiro
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, United States
- Linda M Niccolai
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, United States
- Cynthia A Brandt
- Departments of Emergency Medicine, Biostatistics, and Health Informatics, Yale Schools of Medicine and Public Health, New Haven, CT, United States
- Veterans Affairs Connecticut Healthcare System, West Haven, CT, United States
15
Womack JA, Murphy TE, Bathulapalli H, Smith A, Bates J, Jarad S, Redeker NS, Luther SL, Gill TM, Brandt CA, Justice AC. Serious Falls in Middle-Aged Veterans: Development and Validation of a Predictive Risk Model. J Am Geriatr Soc 2020; 68:2847-2854. [PMID: 32860222 DOI: 10.1111/jgs.16773] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/08/2020] [Revised: 07/13/2020] [Accepted: 07/14/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND/OBJECTIVES Because of high rates of multimorbidity, polypharmacy, and hazardous alcohol and opioid use, middle-aged Veterans are at risk for serious falls (those prompting a visit with a healthcare provider), posing significant risk to their forthcoming geriatric health and quality of life. We developed and validated a predictive model of the 6-month risk of serious falls among middle-aged Veterans. DESIGN Cohort study. SETTING Veterans Health Administration (VA). PARTICIPANTS Veterans, aged 45 to 65 years, who presented for care within the VA between 2012 and 2015 (N = 275,940). EXPOSURES The exposures of primary interest were substance use (including alcohol and prescription opioid use), multimorbidity, and polypharmacy. Hazardous alcohol use was defined as an Alcohol Use Disorders Identification Test - Consumption (AUDIT-C) score of 3 or greater for women and 4 or greater for men. We used International Classification of Diseases, Ninth Revision (ICD-9), codes to identify alcohol and illicit substance use disorders and identified prescription opioid use from pharmacy fill-refill data. We included counts of chronic medications and of physical and mental health comorbidities. MEASUREMENTS We identified serious falls using external cause of injury codes and a machine-learning algorithm that identified serious falls in radiology reports. We used multivariable logistic regression with generalized estimating equations to calculate risk. We used an integrated predictiveness curve to identify intervention thresholds. RESULTS Most of our sample (54%) was aged 60 years or younger. Duration of follow-up was up to 4 years. Veterans who fell were more likely to be female (11% vs 7%) and White (72% vs 68%). They experienced 43,641 serious falls during follow-up. We identified 16 key predictors of serious falls and five interaction terms. 
Model performance was enhanced by addition of opioid use, as evidenced by overall category-free net reclassification improvement of 0.32 (P < .001). Discrimination (C-statistic = 0.76) and calibration were excellent for both development and validation data sets. CONCLUSION We developed and internally validated a model to predict 6-month risk of serious falls among middle-aged Veterans with excellent discrimination and calibration.
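The hazardous-alcohol exposure above is a sex-specific threshold on the AUDIT-C score (3 or greater for women, 4 or greater for men). A minimal Python sketch of that rule follows; the `"female"`/`"male"` encoding and the function name are assumptions for illustration, and the 0 to 12 range check reflects the three AUDIT-C items scored 0 to 4 each.

```python
AUDIT_C_THRESHOLDS = {"female": 3, "male": 4}  # cut-offs stated in the abstract


def hazardous_alcohol_use(audit_c_score: int, sex: str) -> bool:
    """True if the AUDIT-C score meets the sex-specific hazardous-use cut-off."""
    if not 0 <= audit_c_score <= 12:
        raise ValueError("AUDIT-C scores range from 0 to 12")
    return audit_c_score >= AUDIT_C_THRESHOLDS[sex]


print(hazardous_alcohol_use(3, "female"), hazardous_alcohol_use(3, "male"))  # True False
```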
Affiliation(s)
- Julie A Womack
- VA Connecticut Healthcare System, West Haven, Connecticut
- Yale School of Nursing, Orange, Connecticut
- Terrence E Murphy
- Geriatrics Section, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
- Harini Bathulapalli
- VA Connecticut Healthcare System, West Haven, Connecticut
- Yale Center for Analytic Services, Yale School of Medicine, New Haven, Connecticut
- Jonathan Bates
- VA Connecticut Healthcare System, West Haven, Connecticut
- Yale Center for Medical Informatics, Yale School of Medicine, New Haven, Connecticut
- Samah Jarad
- Yale Center for Medical Informatics, Yale School of Medicine, New Haven, Connecticut
- Stephen L Luther
- James A. Haley Veterans Hospital, Research Service, Tampa, Florida
- University of South Florida, College of Public Health, Tampa, Florida
- Thomas M Gill
- Geriatrics Section, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
- Cynthia A Brandt
- VA Connecticut Healthcare System, West Haven, Connecticut
- Yale Center for Medical Informatics, Yale School of Medicine, New Haven, Connecticut
- Amy C Justice
- VA Connecticut Healthcare System, West Haven, Connecticut
- Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
16
Polypharmacy, Hazardous Alcohol and Illicit Substance Use, and Serious Falls Among PLWH and Uninfected Comparators. J Acquir Immune Defic Syndr 2020; 82:305-313. [PMID: 31339866 DOI: 10.1097/qai.0000000000002130] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 01/06/2023]
Abstract
BACKGROUND Medication classes, polypharmacy, and hazardous alcohol and illicit substance use may exhibit stronger associations with serious falls among persons living with HIV (PLWH) than among uninfected comparators. We investigated whether these associations differed by HIV status. SETTING Veterans Aging Cohort Study. METHODS We used a nested case-control design. Cases (N = 13,530) were those who fell. Falls were identified by external cause of injury codes and a machine-learning algorithm applied to radiology reports. Cases were matched to controls (N = 67,060) by age, race, sex, HIV status, duration of observation, and baseline date. Risk factors included medication classes, count of unique non-antiretroviral therapy (non-ART) medications, and hazardous alcohol and illicit substance use. We used unconditional logistic regression to evaluate associations. RESULTS Among PLWH, benzodiazepines [odds ratio (OR) 1.24; 95% confidence interval (CI) 1.08 to 1.40] and muscle relaxants (OR 1.29; 95% CI: 1.08 to 1.46) were associated with serious falls, but these associations were not observed among uninfected comparators (P > 0.05). In both groups, key risk factors included non-ART medications (per 5 medications) (OR 1.20, 95% CI: 1.17 to 1.23), illicit substance use/abuse (OR 1.44; 95% CI: 1.34 to 1.55), hazardous alcohol use (OR 1.30; 95% CI: 1.23 to 1.37), and an opioid prescription (OR 1.35; 95% CI: 1.29 to 1.41). CONCLUSION Benzodiazepines and muscle relaxants were associated with serious falls among PLWH. Non-ART medication count, hazardous alcohol and illicit substance use, and opioid prescriptions were associated with serious falls in both groups. Prevention of serious falls should focus on reducing specific classes and the absolute number of medications and both alcohol and illicit substance use.
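The medication-count odds ratio above is reported per 5 medications. If, as an assumption the abstract does not confirm, the count enters the logistic model linearly on the log-odds scale, then log(OR per k units) = k·β, and the OR per single medication is 1.20^(1/5), roughly 1.037. The Python sketch below performs that rescaling; `rescale_odds_ratio` is a hypothetical helper name.

```python
import math


def rescale_odds_ratio(odds_ratio: float, per_units: float, to_units: float = 1.0) -> float:
    """Convert an OR reported per `per_units` of a predictor to one per `to_units`.

    Assumes the predictor enters the logistic model linearly, so
    log(OR per k units) = k * beta and beta = log(OR) / per_units.
    """
    beta = math.log(odds_ratio) / per_units
    return math.exp(beta * to_units)


# OR 1.20 (95% CI 1.17 to 1.23) per 5 non-ART medications, re-expressed per medication.
for value in (1.20, 1.17, 1.23):
    print(round(rescale_odds_ratio(value, per_units=5), 3))
```

Under the same linearity assumption the confidence limits rescale in the same way, because the interval for k·β is k times the interval for β; if the model used categories or splines for medication count, this conversion would not apply.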
17
Spasic I, Nenadic G. Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform 2020; 8:e17984. [PMID: 32229465 PMCID: PMC7157505 DOI: 10.2196/17984] [Citation(s) in RCA: 111] [Impact Index Per Article: 27.8] [Received: 01/28/2020] [Revised: 02/24/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022]
Abstract
Background Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. Objective The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. Methods Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics. Results The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. 
Supervised learning was used successfully where clinical codes recorded alongside free-text notes in electronic health records served as class labels. Similarly, distant supervision was used to leverage an existing knowledge base to annotate raw text automatically. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of the data. Besides their small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance. Conclusions We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as ways of saving annotation effort. Future research in this field would benefit from alternatives that do not require data annotation, such as data augmentation, transfer learning, or unsupervised learning.
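Active learning, as described above, iteratively picks a subset of unlabeled documents for manual annotation. One common selection strategy is uncertainty sampling; the review does not say which strategies the included studies used, so the Python sketch below is an illustration only. It shows just the selection step: `predicted` is a hypothetical mapping from document id to the current model's predicted probability of the positive class. In each round, the selected notes would be annotated, added to the training set, and the model retrained before the next selection.

```python
def uncertainty_sample(predicted: dict, k: int) -> list:
    """Return the ids of the k unlabeled documents the model is least sure about.

    `predicted` maps document id -> predicted probability of the positive class.
    Uncertainty is measured as closeness to 0.5; ties are broken by id so the
    selection is reproducible.
    """
    return sorted(predicted, key=lambda doc_id: (abs(predicted[doc_id] - 0.5), doc_id))[:k]


# Hypothetical model outputs for five unlabeled notes.
pool = {"note1": 0.95, "note2": 0.52, "note3": 0.10, "note4": 0.45, "note5": 0.70}
print(uncertainty_sample(pool, k=2))  # ['note2', 'note4']
```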
Affiliation(s)
- Irena Spasic
- School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
18
Song X, Waitman LR, Hu Y, Yu ASL, Robins D, Liu M. Robust clinical marker identification for diabetic kidney disease with ensemble feature selection. J Am Med Inform Assoc 2019; 26:242-253. [PMID: 30602020 PMCID: PMC7792755 DOI: 10.1093/jamia/ocy165] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 06/17/2018] [Revised: 11/05/2018] [Accepted: 11/21/2018] [Indexed: 11/15/2022]
Abstract
Objective Diabetic kidney disease (DKD) is one of the most frequent complications of diabetes and is associated with substantial morbidity and mortality. To accelerate DKD risk factor discovery, we present an ensemble feature selection approach to identify a robust set of discriminant factors using electronic medical records (EMRs). Material and Methods We identified a retrospective cohort of 15 645 adult patients with type 2 diabetes, excluding those with pre-existing kidney disease, and utilized all available clinical data types in modeling. We compared 3 machine-learning-based embedded feature selection methods in conjunction with 6 feature ensemble techniques for selecting top-ranked features in terms of robustness to data perturbations and predictability for DKD onset. Results The gradient boosting machine (GBM) with the weighted mean rank feature ensemble technique achieved the best performance, with an AUC of 0.82 [95%-CI, 0.81-0.83] on internal validation and 0.71 [95%-CI, 0.68-0.73] on external temporal validation. The ensemble model identified a set of 440 features from 84 872 unique clinical features that are both predictive of DKD onset and robust against data perturbations, including 191 labs, 51 visit details (mainly vital signs), 39 medications, 34 orders, 30 diagnoses, and 95 other clinical features. Discussion Many of the top-ranked features have not been included in state-of-the-art DKD prediction models, but their relationships with kidney function have been suggested in the existing literature. Conclusion Our ensemble feature selection framework provides an option for identifying a robust and parsimonious feature set from EMR data in an unbiased manner, which effectively aids knowledge discovery for DKD risk factors.
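The best-performing configuration above combined feature rankings with a "weighted mean rank" ensemble. The abstract does not define the weights or how features missing from a ranking are handled, so the Python sketch below is a generic rank-aggregation illustration under two stated assumptions: weights are supplied per ranking (uniform by default), and a feature absent from a ranking is assigned rank `len(ranking) + 1` there. The feature names and rankings are hypothetical, not results from the study.

```python
from typing import Dict, List, Optional, Tuple


def weighted_mean_rank(
    rankings: List[List[str]], weights: Optional[List[float]] = None
) -> List[Tuple[str, float]]:
    """Aggregate several best-first feature rankings by weighted mean rank.

    A feature absent from a ranking is given rank len(ranking) + 1 there
    (an assumption; other conventions are possible). Lower is better.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need one weight per ranking")
    total = sum(weights)
    features = {f for ranking in rankings for f in ranking}
    scores: Dict[str, float] = {}
    for feature in features:
        weighted_sum = 0.0
        for ranking, w in zip(rankings, weights):
            rank = ranking.index(feature) + 1 if feature in ranking else len(ranking) + 1
            weighted_sum += w * rank
        scores[feature] = weighted_sum / total
    # Sort by mean rank, then by name so ties are ordered deterministically.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Hypothetical top-3 rankings from three resamples of the training data.
runs = [["egfr", "hba1c", "bmi"], ["hba1c", "egfr", "bmi"], ["egfr", "bmi", "hba1c"]]
for feature, score in weighted_mean_rank(runs):
    print(feature, round(score, 2))
```

Averaging ranks across data perturbations rewards features that stay near the top consistently, which is the robustness property the abstract emphasizes; the top-k cut-off applied afterwards (440 features in the study) is a separate choice.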
Affiliation(s)
- Xing Song
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Lemuel R Waitman
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Yong Hu
- Big Data Decision Institute, Jinan University, Guangzhou, PRC
- Alan S L Yu
- Division of Nephrology and Hypertension and the Kidney Institute, University of Kansas Medical Center, Kansas City, Kansas, USA
- David Robins
- Diabetes Institute, University of Kansas Medical Center, Kansas City, Kansas, USA
- Mei Liu
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA
19
Feller DJ, Zucker J, Don't Walk OB, Srikishan B, Martinez R, Evans H, Yin MT, Gordon P, Elhadad N. Towards the Inference of Social and Behavioral Determinants of Sexual Health: Development of a Gold-Standard Corpus with Semi-Supervised Learning. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:422-429. [PMID: 30815082 PMCID: PMC6371339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that are increasingly recognized for their impact on health outcomes. We describe ongoing research to extract SBDH related to sexual health from clinical documentation. Our work addresses several challenges. First, there is no standard set of SBDH for sexual health; we describe our curation of 38 such SBDH. Second, it is unknown how SBDH related to sexual health are expressed in clinical notes; we detail the characteristics of an annotated corpus. Third, documentation of SBDH is rare; we describe the use of semi-supervised learning to accelerate the annotation process by identifying notes likely to document SBDH. Fourth, we describe preliminary results on inferring an array of SBDH from clinical documentation using supervised learning.
20
Zhao Y, Wong ZSY, Tsui KL. A Framework of Rebalancing Imbalanced Healthcare Data for Rare Events' Classification: A Case of Look-Alike Sound-Alike Mix-Up Incident Detection. J Healthc Eng 2018; 2018:6275435. [PMID: 29951182 PMCID: PMC5987310 DOI: 10.1155/2018/6275435] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/11/2017] [Revised: 02/02/2018] [Accepted: 02/22/2018] [Indexed: 11/17/2022]
Abstract
Identifying rare but significant healthcare events in massive unstructured datasets has become a common task in healthcare data analytics. However, the imbalanced class distribution of many practical datasets greatly hampers the detection of rare events, because most classification methods implicitly assume that classes occur equally often and are designed to maximize overall classification accuracy. In this study, we develop a framework for learning from imbalanced healthcare data by incorporating different rebalancing strategies. The evaluation results showed that the framework significantly improves the detection of medical incidents caused by look-alike sound-alike (LASA) mix-ups. Specifically, logistic regression combined with the synthetic minority oversampling technique (SMOTE) produced the best detection results, a 45.3% relative increase in recall (recall = 75.7%) compared with logistic regression alone (recall = 52.1%).
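SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The pure-Python sketch below illustrates that oversampling step and the relative-recall arithmetic behind the reported 45.3% figure; it does not reproduce the authors' logistic regression pipeline, and the function names are hypothetical. In practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be used, and oversampling would typically be applied to the training split only so that synthetic points do not leak into evaluation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate synthetic minority-class samples with SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    This brute-force neighbour search is O(n^2) and meant for illustration.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def relative_increase(new: float, old: float) -> float:
    """Relative change (new - old) / old, e.g. for comparing two recall values."""
    if old == 0:
        raise ValueError("baseline must be non-zero")
    return (new - old) / old


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(minority, n_synthetic=4, k=2, seed=1))
    print(f"{relative_increase(75.7, 52.1):.1%}")  # → 45.3%
```

The 45.3% gain is a relative change, (75.7 - 52.1) / 52.1; the absolute improvement in recall is 23.6 percentage points.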
Affiliation(s)
- Yang Zhao
- Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong
- Zoie Shui-Yee Wong
- Graduate School of Public Health, St. Luke's International University, Tokyo, Japan
- Kwok Leung Tsui
- Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong
21
Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT, Welt J, Foote J, Moseley ET, Grant DW, Tyler PD, Celi LA. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. PLoS One 2018; 13:e0192360. [PMID: 29447188 PMCID: PMC5813927 DOI: 10.1371/journal.pone.0192360] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 06/16/2017] [Accepted: 01/21/2018] [Indexed: 01/22/2023]
Abstract
In secondary analysis of electronic health records, a crucial task is correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for accurately classifying medical conditions exists only in clinical narratives, so natural language processing (NLP) techniques are needed to extract and evaluate these narratives. The most commonly used approach to this problem extracts a number of clinician-defined medical concepts from the text and then uses machine learning to determine whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNNs) for text classification can augment existing techniques by leveraging this representation to learn which phrases in a text are relevant to a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used NLP models on ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, improving the F1-score by up to 26 percentage points and the area under the ROC curve (AUC) by up to 7 percentage points. We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification and should be investigated further. Moreover, the deep learning approach presented in this paper can assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
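The CNN approach described above convolves filters over word embeddings, keeps each filter's strongest response (max-over-time pooling), and feeds the pooled values to a classifier. The following pure-Python forward pass is a minimal sketch of that architecture for a single binary phenotype, with random, untrained weights, one filter width, and a hypothetical example note; it is not the authors' model. The `salient_phrases` method shows one simple way to surface salient phrases (the n-gram each filter's max-pooling selected, ranked by its contribution to the output score), which may differ from the saliency methods evaluated in the paper.

```python
import math
import random
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyTextCNN:
    """One-layer text CNN: embeddings -> conv -> ReLU -> max-pool -> sigmoid.

    Weights are random and untrained; this only illustrates the forward pass.
    """

    def __init__(self, vocab: List[str], dim: int = 4, n_filters: int = 3,
                 width: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width = width
        self.embed: Dict[str, List[float]] = {
            word: [rng.gauss(0.0, 1.0) for _ in range(dim)] for word in vocab
        }
        self.unknown = [0.0] * dim  # out-of-vocabulary tokens map to zeros
        # Each filter is a (width x dim) weight matrix spanning `width` words.
        self.filters = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
            for _ in range(n_filters)
        ]
        self.filter_bias = [0.0] * n_filters
        self.out_weights = [rng.gauss(0.0, 1.0) for _ in range(n_filters)]
        self.out_bias = 0.0

    def _pool(self, tokens: List[str]) -> Tuple[List[float], List[int]]:
        """Return each filter's max activation and the window start producing it."""
        if len(tokens) < self.width:
            raise ValueError("text is shorter than the filter width")
        vectors = [self.embed.get(t, self.unknown) for t in tokens]
        pooled, positions = [], []
        for weights, bias in zip(self.filters, self.filter_bias):
            activations = []
            for start in range(len(vectors) - self.width + 1):
                window = vectors[start:start + self.width]
                score = bias + sum(
                    w * x for row, vec in zip(weights, window) for w, x in zip(row, vec)
                )
                activations.append(max(0.0, score))  # ReLU
            best = max(range(len(activations)), key=activations.__getitem__)
            pooled.append(activations[best])  # max-over-time pooling
            positions.append(best)
        return pooled, positions

    def predict(self, tokens: List[str]) -> float:
        """Probability that the note documents the (single) phenotype."""
        pooled, _ = self._pool(tokens)
        z = self.out_bias + sum(w * p for w, p in zip(self.out_weights, pooled))
        return sigmoid(z)

    def salient_phrases(self, tokens: List[str], top: int = 2) -> List[Tuple[float, str]]:
        """Rank the n-grams chosen by max-pooling by their contribution to the logit."""
        pooled, positions = self._pool(tokens)
        scored = [
            (w * p, " ".join(tokens[s:s + self.width]))
            for w, p, s in zip(self.out_weights, pooled, positions)
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top]


if __name__ == "__main__":
    note = "patient reports heavy alcohol use and chronic pain".split()
    model = TinyTextCNN(vocab=sorted(set(note)))
    print(model.predict(note))
    print(model.salient_phrases(note))
```

A trained model would learn the embeddings, filters, and output weights by gradient descent on labelled discharge summaries; text CNNs commonly combine several filter widths so that phrases of different lengths can be detected.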
Affiliation(s)
- Sebastian Gehrmann
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Harvard SEAS, Harvard University, Cambridge, MA, United States of America
- Franck Dernoncourt
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Adobe Research, San Jose, CA, United States of America
- Yeran Li
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Harvard T.H. Chan School of Public Health, Cambridge, MA, United States of America
- Eric T. Carlson
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Philips Research North America, Cambridge, MA, United States of America
- Joy T. Wu
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Harvard T.H. Chan School of Public Health, Cambridge, MA, United States of America
- Jonathan Welt
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Wellman Center for Photomedicine, Massachusetts General Hospital, Boston, MA, United States of America
- John Foote
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Tufts University School of Medicine, Cambridge, MA, United States of America
- Edward T. Moseley
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- College of Science and Mathematics, University of Massachusetts, Boston, MA, United States of America
- David W. Grant
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Washington University School of Medicine, St. Louis, MO, United States of America
- Patrick D. Tyler
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Department of Internal Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States of America
- Leo A. Celi
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Massachusetts Institute of Technology, Cambridge, MA, United States of America
22
Kennell TI, Willig JH, Cimino JJ. Clinical Informatics Researcher's Desiderata for the Data Content of the Next Generation Electronic Health Record. Appl Clin Inform 2017; 8:1159-1172. [PMID: 29270955 DOI: 10.4338/aci-2017-06-r-0101] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/29/2023]
Abstract
OBJECTIVE Clinical informatics researchers depend on the availability of high-quality data from the electronic health record (EHR) to design and implement new methods and systems for clinical practice and research. However, these data are frequently unavailable or present in a format that requires substantial revision. This article reports the results of a review of informatics literature published from 2010 to 2016 that addresses these issues by identifying categories of data content that might be included or revised in the EHR. MATERIALS AND METHODS We used an iterative review process on 1,215 biomedical informatics research articles. We placed them into generic categories, reviewed and refined the categories, and then assigned additional articles, for a total of three iterations. RESULTS Our process identified eight categories of data content issues: Adverse Events, Clinician Cognitive Processes, Data Standards Creation and Data Communication, Genomics, Medication List Data Capture, Patient Preferences, Patient-reported Data, and Phenotyping. DISCUSSION These categories summarize discussions in the biomedical informatics literature that concern data content issues restricting clinical informatics research. These barriers to research result from data that are either absent from the EHR or inadequate (e.g., in narrative text form) for downstream applications. In light of these categories, we discuss changes to EHR data storage that should be considered in the redesign of EHRs to promote continued innovation in clinical informatics. CONCLUSION Based on published literature on clinical informaticians' reuse of EHR data, we characterize eight types of data content that, if included in the next generation of EHRs, would find immediate application in advanced informatics tools and techniques.
Affiliation(s)
- Timothy I Kennell
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James H Willig
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
23
Greene M, Justice AC, Covinsky KE. Assessment of geriatric syndromes and physical function in people living with HIV. Virulence 2016; 8:586-598. [PMID: 27715455 DOI: 10.1080/21505594.2016.1245269] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 01/10/2023]
Abstract
As the number of older adults living with HIV continues to increase, understanding how to incorporate geriatric assessments into HIV care will be critical. Assessments of geriatric syndromes and physical function can be useful tools for HIV clinicians and researchers to help identify the most vulnerable older adults and to better understand the aging process in people living with HIV (PLWH). This review focuses on the assessment of falls, frailty, and physical function, first in the general population of older adults and then specifically in older adults living with HIV.
Affiliation(s)
- Meredith Greene
- Division of Geriatrics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Amy C Justice
- Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA; Yale University Schools of Medicine and Public Health, New Haven, CT, USA
- Kenneth E Covinsky
- Division of Geriatrics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA; Section of Geriatrics and Palliative Medicine, San Francisco Veterans Affairs Medical Center, San Francisco, CA, USA