1
Yonamine S, Ma CJ, Alabi RO, Kaidonis G, Chan L, Borkar D, Stein JD, Arnold BF, Sun CQ. Comparison of Diagnosis Codes to Clinical Notes in Classifying Patients with Diabetic Retinopathy. Ophthalmology Science 2024; 4:100564. [PMID: 39253554 PMCID: PMC11382306 DOI: 10.1016/j.xops.2024.100564] [Received: 01/08/2024] [Revised: 05/31/2024] [Accepted: 06/10/2024] [Indexed: 09/11/2024]
Abstract
Purpose: Electronic health records (EHRs) contain a vast amount of clinical data. Improved automated classification approaches have the potential to accurately and efficiently identify patient cohorts for research. We evaluated whether a rule-based natural language processing (NLP) algorithm using clinical notes performed better for classifying proliferative diabetic retinopathy (PDR) and nonproliferative diabetic retinopathy (NPDR) severity compared with International Classification of Diseases, ninth edition (ICD-9) or 10th edition (ICD-10) codes.
Design: Cross-sectional study.
Subjects: Deidentified EHR data from an academic medical center identified 2366 patients aged ≥18 years with diabetes mellitus, diabetic retinopathy (DR), and available clinical notes.
Methods: From these 2366 patients, 306 random patients (100 training set, 206 test set) underwent chart review by ophthalmologists to establish the gold standard. International Classification of Diseases codes were extracted from the EHR. The notes algorithm identified positive mention of PDR and NPDR severity from clinical notes. Proliferative diabetic retinopathy and NPDR severity classification by ICD codes and the notes algorithm were compared with the gold standard. The entire DR cohort (N = 2366) was then classified as having presence (or absence) of PDR using ICD codes and the notes algorithm.
Main Outcome Measures: Sensitivity, specificity, positive predictive value (PPV), negative predictive value, and F1 score for the notes algorithm compared with ICD codes using a gold standard of chart review.
Results: For PDR classification of the test set patients, the notes algorithm performed better than ICD codes for all metrics. Specifically, the notes algorithm had significantly higher sensitivity (90.5% [95% confidence interval 85.7, 94.9] vs. 68.4% [60.4, 75.3]) but similar PPV (98.0% [95.4, 100] vs. 94.7% [90.3, 98.3]). The F1 score was 0.941 [0.910, 0.966] for the notes algorithm compared with 0.794 [0.734, 0.842] for ICD codes. For PDR classification, ICD-10 codes performed better than ICD-9 codes (F1 score 0.836 [0.771, 0.878] vs. 0.596 [0.222, 0.692]). For NPDR severity classification, the notes algorithm performed similarly to ICD codes, but performance was limited by small sample size.
Conclusions: The notes algorithm outperformed ICD codes for PDR classification. The findings demonstrate the significant potential of applying a rule-based NLP algorithm to clinical notes to increase the efficiency and accuracy of cohort selection for research.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
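The outcome measures named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the `Confusion` class, the function names, and the example counts are hypothetical. Only the sensitivity, PPV, and F1 values passed to `f1_score` come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing a classifier with a chart-review gold standard."""
    tp: int  # classifier positive, gold standard positive
    fp: int  # classifier positive, gold standard negative
    fn: int  # classifier negative, gold standard positive
    tn: int  # classifier negative, gold standard negative


def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def metrics(c: Confusion) -> dict:
    """Return the five measures listed under Main Outcome Measures."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1_score(ppv, sensitivity),
    }


# Hypothetical counts, for illustration only (not taken from the study).
example = metrics(Confusion(tp=90, fp=5, fn=10, tn=95))

# Reported point estimates: the F1 scores follow from PPV and sensitivity.
notes_f1 = f1_score(ppv=0.980, sensitivity=0.905)
icd_f1 = f1_score(ppv=0.947, sensitivity=0.684)
```

Rounded to three decimals, `notes_f1` and `icd_f1` agree with the reported F1 scores of 0.941 and 0.794, which is a quick consistency check on the abstract's figures.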
Affiliation(s)
- Sean Yonamine
  - Department of Ophthalmology, University of California, San Francisco, California
  - Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland
- Chu Jian Ma
  - Department of Ophthalmology, University of California, San Francisco, California
- Rolake O Alabi
  - Department of Ophthalmology, University of California, San Francisco, California
- Georgia Kaidonis
  - Department of Ophthalmology, University of California, San Francisco, California
- Lawrence Chan
  - Department of Ophthalmology, University of California, San Francisco, California
- Durga Borkar
  - Department of Ophthalmology, Duke University, Durham, North Carolina
- Joshua D Stein
  - Department of Ophthalmology and Visual Sciences, University of Michigan, Ann Arbor, Michigan
- Benjamin F Arnold
  - Department of Ophthalmology, University of California, San Francisco, California
  - F.I. Proctor Foundation, University of California, San Francisco, California
  - Institute for Global Health Sciences, University of California, San Francisco, California
- Catherine Q Sun
  - Department of Ophthalmology, University of California, San Francisco, California
  - F.I. Proctor Foundation, University of California, San Francisco, California
2
Sinha S, Williams SC, Hanrahan JG, Muirhead WR, Booker J, Khalil S, Kitchen N, Newall N, Obholzer R, Saeed SR, Marcus HJ, Grover P. Mapping the Clinical Pathway for Patients Undergoing Vestibular Schwannoma Resection. World Neurosurg 2024:S1878-8750(24)01297-X. [PMID: 39074584 DOI: 10.1016/j.wneu.2024.07.157] [Received: 02/21/2024] [Revised: 07/20/2024] [Accepted: 07/22/2024] [Indexed: 07/31/2024]
Abstract
BACKGROUND: The introduction of the electronic health record (EHR) has improved the collection and storage of patient information, enhancing clinical communication and academic research. However, EHRs are limited by data quality and the time-consuming task of manual data extraction. This study aimed to use process mapping to identify critical data entry points within the clinical pathway for patients with vestibular schwannoma (VS) that are ideal for structured data entry and automated data collection to improve patient care and research.
METHODS: A 2-stage methodology was used at a neurosurgical unit. Process maps were developed using semi-structured interviews with stakeholders in the management of VS resection. Process maps were then retrospectively validated against EHRs for patients admitted between August 2019 and December 2021, establishing critical data entry points.
RESULTS: In the process map development, 20 stakeholders were interviewed. Process maps were validated against EHRs of 36 patients admitted for VS resection. Operative notes, surgical inpatient reviews (including ward rounds), and discharge summaries were available for all patients, representing critical data entry points. Areas for documentation improvement were in the preoperative clinics (30/36; 83.3%), preoperative skull base multidisciplinary team (32/36; 88.9%), postoperative follow-up clinics (32/36; 88.9%), and postoperative skull base multidisciplinary team meeting (29/36; 80.6%).
CONCLUSIONS: To our knowledge, this is the first use of a 2-stage methodology for process mapping the clinical pathway for patients undergoing VS resection. We identified critical data entry points that can be targeted for structured data entry and for automated data collection tools, positively impacting patient care and research.
Affiliation(s)
- Siddharth Sinha
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom; Francis Crick Institute, London, United Kingdom
- Simon C Williams
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- John Gerrard Hanrahan
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- William R Muirhead
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom; Francis Crick Institute, London, United Kingdom
- James Booker
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Sherif Khalil
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Royal National Throat Nose and Ear Hospital, London, United Kingdom
- Neil Kitchen
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Nicola Newall
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Rupert Obholzer
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Royal National Throat Nose and Ear Hospital, London, United Kingdom
- Shakeel R Saeed
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Royal National Throat Nose and Ear Hospital, London, United Kingdom
- Hani J Marcus
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Patrick Grover
  - Division of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
3
Sun D, Basi J, Kreinbrook J, Mhaskar R, Leonelli F. Reliability of Electronic Health Records in Recording Veterans' Tobacco Use Status. Mil Med 2024; 189:e509-e514. [PMID: 37506175 DOI: 10.1093/milmed/usad290] [Received: 03/26/2023] [Revised: 07/06/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023]
Abstract
INTRODUCTION: The prevalence of tobacco use in the Veteran population and among Veterans Health Administration patients remains high, resulting in significant health and economic consequences. This problem has generated many tobacco research studies and clinical interventions, which often rely upon tobacco use status data previously recorded in electronic health records (EHR). Therefore, the consistency and reliability of these data are critical. The Veterans Health Administration uses an extensive EHR system where tobacco use status can be documented either as free text (FT) or as health factors (HF). The current literature assessing the reliability of HF and FT data is limited. This analysis evaluated the agreement between HF and FT tobacco use status data.
MATERIALS AND METHODS: This retrospective study included Veterans who underwent coronary revascularization and had tobacco use statuses recorded as both HF and FT. These statuses were categorized as "Current," "Former," or "Never." The closest recorded status to the index date (date of revascularization procedure) for each subject in both datasets was chosen, and Cohen's kappa statistic was calculated to measure the agreement between HF and FT. Implausible tobacco use status changes within each dataset were quantified to assess trustworthiness. Agreement between HF and FT data was first measured for all subjects (n = 1,095), which included those who had implausible status changes in either dataset, and then measured again for subjects (n = 770) without any implausible status changes in either dataset. This study was exempt from institutional review board review.
RESULTS: Overall, 14.3% and 17.7% of all subjects had implausible tobacco use status changes in HF and FT data, respectively. For all subjects (n = 1,095), including those with implausible data, there was "moderate" agreement between HF and FT data (kappa = 0.49; 95% CI, 0.44-0.53). For subjects without implausible data (n = 770), the strength of agreement between HF and FT data was "good" (kappa = 0.64; 95% CI, 0.59-0.69).
CONCLUSIONS: Agreement between HF and FT data that document the tobacco use statuses of Veterans varied because of implausible data. HF data had fewer implausible tobacco use statuses, but FT data were recorded more frequently. Although HF and FT data can be reasonably relied upon to determine the tobacco use statuses of Veterans, researchers and clinicians must be aware of implausible data and consider methods to overcome this limitation. Future studies should investigate ways of improving the consistency of EHR documentation by health care providers and benchmark HF and FT data against a gold standard like biochemical verification to determine accuracy.
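Cohen's kappa, the agreement statistic used in this abstract, compares observed agreement with the agreement expected by chance. The sketch below is a plain-Python illustration, not the study's analysis code. The function name and the six paired statuses are hypothetical; only the three category labels come from the abstract.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both rating lists must be non-empty and the same length")
    n = len(ratings_a)
    # Proportion of subjects on which the two sources agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal frequency of each category.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both sources use a single identical category; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical paired statuses for six subjects (health factor vs. free text).
health_factor = ["Current", "Current", "Former", "Never", "Former", "Never"]
free_text = ["Current", "Former", "Former", "Never", "Former", "Current"]

kappa = cohens_kappa(health_factor, free_text)
```

In this toy example the two sources agree on 4 of 6 subjects, chance agreement is 1/3, and kappa works out to 0.5.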
Affiliation(s)
- Daniel Sun
  - Tampa VA Clinical Research and Education Center, James A. Haley Veterans' Hospital, Tampa, FL 33612, USA
  - College of Public Health, University of South Florida, Tampa, FL 33612, USA
- Joseph Basi
  - Tampa VA Clinical Research and Education Center, James A. Haley Veterans' Hospital, Tampa, FL 33612, USA
  - Georgetown University School of Medicine, Georgetown University, Washington, DC 20057, USA
- Judah Kreinbrook
  - Tampa VA Clinical Research and Education Center, James A. Haley Veterans' Hospital, Tampa, FL 33612, USA
  - Duke University School of Medicine, Duke University, Durham, NC 27710, USA
- Rahul Mhaskar
  - Office of Research, Morsani College of Medicine, University of South Florida, Tampa, FL 33602, USA
- Fabio Leonelli
  - Tampa VA Clinical Research and Education Center, James A. Haley Veterans' Hospital, Tampa, FL 33612, USA
  - Cardiology Department, James A. Haley Veterans' Hospital, University of South Florida, Tampa, FL 33612, USA
4
Valencia Morales DJ, Bansal V, Heavner SF, Castro JC, Sharma M, Tekin A, Bogojevic M, Zec S, Sharma N, Cartin-Ceba R, Nanchal RS, Sanghavi DK, La Nou AT, Khan SA, Belden KA, Chen JT, Melamed RR, Sayed IA, Reilkoff RA, Herasevich V, Domecq Garces JP, Walkey AJ, Boman K, Kumar VK, Kashyap R. Validation of automated data abstraction for SCCM discovery VIRUS COVID-19 registry: practical EHR export pathways (VIRUS-PEEP). Front Med (Lausanne) 2023; 10:1089087. [PMID: 37859860 PMCID: PMC10583598 DOI: 10.3389/fmed.2023.1089087] [Received: 04/20/2023] [Accepted: 09/14/2023] [Indexed: 10/21/2023]
Abstract
Background: The gold standard for gathering data from electronic health records (EHR) has been manual data extraction; however, this requires vast resources and personnel. Automation of this process reduces resource burdens and expands research opportunities.
Objective: This study aimed to determine the feasibility and reliability of automated data extraction in a large registry of adult COVID-19 patients.
Materials and methods: This observational study included data from sites participating in the SCCM Discovery VIRUS COVID-19 registry. Important demographic, comorbidity, and outcome variables were chosen for manual and automated extraction for the feasibility dataset. We quantified the degree of agreement with Cohen's kappa statistics for categorical variables. The sensitivity and specificity were also assessed. Correlations for continuous variables were assessed with Pearson's correlation coefficient and Bland-Altman plots. The strength of agreement was defined as almost perfect (0.81-1.00), substantial (0.61-0.80), and moderate (0.41-0.60) based on kappa statistics. Pearson correlations were classified as trivial (0.00-0.30), low (0.30-0.50), moderate (0.50-0.70), high (0.70-0.90), and extremely high (0.90-1.00).
Measurements and main results: The cohort included 652 patients from 11 sites. The agreement between manual and automated extraction for categorical variables was almost perfect in 13 (72.2%) variables (Race, Ethnicity, Sex, Coronary Artery Disease, Hypertension, Congestive Heart Failure, Asthma, Diabetes Mellitus, ICU admission rate, IMV rate, HFNC rate, ICU and Hospital Discharge Status), and substantial in five (27.8%) (COPD, CKD, Dyslipidemia/Hyperlipidemia, NIMV, and ECMO rate). The correlations were extremely high in three (42.9%) variables (age, weight, and hospital LOS) and high in four (57.1%) of the continuous variables (Height, Days to ICU admission, ICU LOS, and IMV days). The average sensitivity and specificity for the categorical data were 90.7% and 96.9%, respectively.
Conclusion and relevance: Our study confirms the feasibility and validity of an automated process to gather data from the EHR.
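The agreement bands defined in this abstract translate directly into two small lookup functions. The sketch below is illustrative only: the function names are hypothetical, the cut-offs are the ones quoted in the abstract, and two details are assumptions made here because the abstract does not settle them. Shared Pearson boundaries (for example 0.30) are assigned to the higher band, and the magnitude of the correlation is used.

```python
def kappa_band(kappa: float) -> str:
    """Map a Cohen's kappa value to the strength-of-agreement labels quoted above."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    # The abstract does not name any band below 0.41.
    return "below moderate (band not defined in the abstract)"


def pearson_band(r: float) -> str:
    """Map a Pearson correlation to the labels quoted above.

    Assumptions: boundary values go to the higher band, and the sign of r is ignored.
    """
    magnitude = abs(r)
    if magnitude >= 0.90:
        return "extremely high"
    if magnitude >= 0.70:
        return "high"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "low"
    return "trivial"


# Hypothetical per-variable statistics, for illustration only.
labels = {
    "sex": kappa_band(0.95),
    "copd": kappa_band(0.70),
    "age": pearson_band(0.98),
    "icu_los": pearson_band(0.75),
}
```

A lookup of this kind makes it easy to tabulate how many variables fall into each band, as the results paragraph does.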
Affiliation(s)
- Diana J. Valencia Morales
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Vikas Bansal
  - Division of Nephrology and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
- Smith F. Heavner
  - CURE Drug Repurposing Collaboratory, Critical Path Institute, Tucson, AZ, United States
- Janna C. Castro
  - Department of Information Technology, Mayo Clinic, Scottsdale, AZ, United States
- Mayank Sharma
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Aysun Tekin
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Marija Bogojevic
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Simon Zec
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Nikhil Sharma
  - Division of Nephrology and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
- Rodrigo Cartin-Ceba
  - Division of Critical Care Medicine, Department of Pulmonary Medicine, Mayo Clinic, Scottsdale, AZ, United States
- Rahul S. Nanchal
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Devang K. Sanghavi
  - Department of Critical Care Medicine, Mayo Clinic Florida, Jacksonville, FL, United States
- Abigail T. La Nou
  - Department of Critical Care Medicine, Mayo Clinic Health System, Eau Claire, WI, United States
- Syed A. Khan
  - Department of Critical Care Medicine, Mayo Clinic Health System, Mankato, MN, United States
- Katherine A. Belden
  - Division of Infectious Diseases, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA, United States
- Jen-Ting Chen
  - Division of Critical Care Medicine, Department of Internal Medicine, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, NY, United States
- Roman R. Melamed
  - Department of Critical Care Medicine, Abbott Northwestern Hospital, Allina Health, Minneapolis, MN, United States
- Imran A. Sayed
  - Department of Pediatrics, Children’s Hospital of Colorado, University of Colorado Anschutz Medical Campus, Colorado Springs, CO, United States
- Ronald A. Reilkoff
  - Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Internal Medicine, University of Minnesota Medical School, Edina, MN, United States
- Vitaly Herasevich
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Juan Pablo Domecq Garces
  - Division of Nephrology and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
- Allan J. Walkey
  - Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, Evans Center of Implementation and Improvement Sciences, Boston University School of Medicine, Boston, MA, United States
- Karen Boman
  - Society of Critical Care Medicine, Mount Prospect, IL, United States
- Vishakha K. Kumar
  - Society of Critical Care Medicine, Mount Prospect, IL, United States
- Rahul Kashyap
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
5
Oronsky B, Burbano E, Stirn M, Brechlin J, Abrouk N, Caroen S, Coyle A, Williams J, Cabrales P, Reid TR. Data Management 101 for drug developers: A peek behind the curtain. Clin Transl Sci 2023; 16:1497-1509. [PMID: 37382299 PMCID: PMC10499417 DOI: 10.1111/cts.13582] [Received: 04/10/2023] [Revised: 06/11/2023] [Accepted: 06/20/2023] [Indexed: 06/30/2023]
Abstract
In drug development a frequently used phrase is "data-driven". Just as high-test gas fuels a car, so drug development "runs on" high-quality data; hence, good data management practices, which involve case report form design, data entry, data capture, data validation, medical coding, database closure, and database locking, are critically important. This review covers the essentials of clinical data management (CDM) for the United States. It is intended to demystify CDM, which means nothing more esoteric than the collection, organization, maintenance, and analysis of data for clinical trials. The review is written with those who are new to drug development in mind and assumes only a passing familiarity with the terms and concepts that are introduced. However, its relevance may also extend to experienced professionals that feel the need to brush up on the basics. For added color and context, the review includes real-world examples with RRx-001, a new molecular entity in phase III and with fast-track status in head and neck cancer, and AdAPT-001, an oncolytic adenovirus armed with a transforming growth factor-beta (TGF-β) trap in a phase I/II clinical trial with which the authors, as employees of the biopharmaceutical company, EpicentRx, are closely involved. An alphabetized glossary of key terms and acronyms used throughout this review is also included for easy reference.
Affiliation(s)
- Nacer Abrouk
  - Clinical Trial Innovations, Mountain View, California, USA
6
Federated electronic data capture (fEDC): Architecture and prototype. J Biomed Inform 2023; 138:104280. [PMID: 36623781 DOI: 10.1016/j.jbi.2023.104280] [Received: 10/28/2022] [Revised: 12/23/2022] [Accepted: 01/03/2023] [Indexed: 01/09/2023]
Abstract
In clinical research as well as patient care, structured documentation of findings is an important task. In many cases, this is achieved by means of electronic case report forms (eCRF) using corresponding information technology systems. To avoid double data entry, eCRF systems can be integrated with electronic health records (EHR). However, when researchers from different institutions collaborate in collecting data, they often use a single joint eCRF system on the Internet. In this case, integration with EHR systems is not possible in most cases due to information security and data protection restrictions. To overcome this shortcoming, we propose a novel architecture for a federated electronic data capture system (fEDC). Four key requirements were identified for fEDC: Definitions of forms have to be available in a reliable and controlled fashion, integration with electronic health record systems must be possible, patient data should be under full local control until they are explicitly transferred for joint analysis, and the system must support data sharing principles accepted by the scientific community for both data model and data captured. With our approach, sites participating in a joint study can run their own instance of an fEDC system that complies with local standards (such as being behind a network firewall) while also being able to benefit from using identical form definitions by sharing metadata in the Operational Data Model (ODM) format published by the Clinical Data Interchange Standards Consortium (CDISC) throughout the collaboration. The fEDC architecture was validated with a working open-source prototype at five German university hospitals. The fEDC architecture provides a novel approach with the potential to significantly improve collaborative data capture: Efforts for data entry are reduced and at the same time, data quality is increased since barriers for integrating with local electronic health record systems are lowered. 
Further, metadata are shared and patient privacy is ensured at a high level.
7
Ebbers T, Takes RP, Honings J, Smeele LE, Kool RB, van den Broek GB. Development and validation of automated electronic health record data reuse for a multidisciplinary quality dashboard. Digit Health 2023; 9:20552076231191007. [PMID: 37529541 PMCID: PMC10388626 DOI: 10.1177/20552076231191007] [Received: 01/02/2023] [Accepted: 07/13/2023] [Indexed: 08/03/2023]
Abstract
Objective: To describe the development and validation of automated electronic health record data reuse for a multidisciplinary quality dashboard.
Materials and methods: Comparative study analyzing a manually extracted and an automatically extracted dataset with 262 patients treated for head and neck cancer (HNC) in a tertiary oncology center in the Netherlands in 2020. The primary outcome measures were the percentage of agreement on data elements required for calculating quality indicators and the difference between indicator results calculated from manually collected data and those calculated from automatically extracted data.
Results: The results of this study demonstrate high agreement between manually and automatically collected variables, reaching up to 99.0% agreement. However, some variables demonstrate lower levels of agreement, with one variable showing only a 20.0% agreement rate. The indicator results obtained through manual collection and automatic extraction show high agreement in most cases, with discrepancy rates ranging from 0.3% to 3.5%. One indicator is identified as a negative outlier, with a discrepancy rate of nearly 25%.
Conclusions: This study shows that it is possible to use routinely collected structured data to reliably measure the quality of care in real time, which could render manual data collection for quality measurement obsolete. To achieve reliable data reuse, it is important that relevant data are recorded as structured data during the care process. Furthermore, the results imply that data validation is a precondition for developing a reliable dashboard.
Affiliation(s)
- Tom Ebbers
  - Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Robert P Takes
  - Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Jimmie Honings
  - Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Ludi E Smeele
  - Department of Head and Neck Oncology and Surgery, Antoni van Leeuwenhoek, Amsterdam, The Netherlands
- Rudolf B Kool
  - Radboud Institute for Health Sciences, IQ Healthcare, Radboud University Medical Centre, Nijmegen, The Netherlands
- Guido B van den Broek
  - Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
8
Chauhan RS, Pradhan A, Munshi A, Mohanti BK. Efficient and Reliable Data Extraction in Radiation Oncology using Python Programming Language: A Pilot Study. J Med Phys 2023; 48:13-18. [PMID: 37342597 PMCID: PMC10277304 DOI: 10.4103/jmp.jmp_12_23] [Received: 01/31/2023] [Revised: 02/27/2023] [Accepted: 03/02/2023] [Indexed: 06/23/2023]
Abstract
Background and Purpose: In recent years, data science approaches have entered health-care systems such as radiology, pathology, and radiation oncology. In our pilot study, we developed an automated data mining approach to extract data from a treatment planning system (TPS) with high speed, maximum accuracy, and little human interaction. We compared the amount of time required for manual data extraction versus the automated data mining technique.
Materials and Methods: A Python programming script was created to extract specified parameters and features pertaining to patients and treatment (a total of 25 features) from TPS. We successfully implemented automation in data mining, utilizing the application programming interface environment provided by the external beam radiation therapy equipment provider for the whole group of patients who were accepted for treatment.
Results: This in-house Python-based script extracted selected features for 427 patients in 0.28 ± 0.03 min with 100% accuracy at an astonishing rate of 0.04 s/plan. Comparatively, manual extraction of 25 parameters took an average of 4.5 ± 0.33 min/plan, along with associated transcriptional and transpositional errors and missing data information. This new approach turned out to be 6850 times faster than the conventional approach. Manual feature extraction time increased by a factor of nearly 2.5 if we doubled the number of features extracted, whereas for the Python script, it increased by a factor of just 1.15.
Conclusion: We conclude that our in-house developed Python script can extract plan data from TPS at a far higher speed (>6000 times) and with the best possible accuracy compared to manual data extraction.
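The timing figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the point estimates reported above (427 plans, 0.28 min in total for the script, 4.5 min per plan manually); the function and variable names are hypothetical, and the reported ± ranges are ignored.

```python
def seconds_per_plan(total_minutes: float, n_plans: int) -> float:
    """Average extraction time per plan, in seconds."""
    return total_minutes * 60 / n_plans


def speedup(manual_minutes_per_plan: float, script_seconds_per_plan: float) -> float:
    """How many times faster the script is than manual extraction, per plan."""
    return manual_minutes_per_plan * 60 / script_seconds_per_plan


# Point estimates from the abstract.
script_rate = seconds_per_plan(total_minutes=0.28, n_plans=427)
ratio = speedup(manual_minutes_per_plan=4.5, script_seconds_per_plan=script_rate)
```

With these inputs the script rate is about 0.039 s/plan, which rounds to the reported 0.04 s/plan, and the ratio is about 6860, in line with the reported figure of roughly 6850 times faster.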
Affiliation(s)
- Rohit Singh Chauhan
  - Department of Physics, GLA University, Mathura, Uttar Pradesh, India
  - Department of Radiation Oncology, Manipal Hospitals, Dwarka, New Delhi, India
- Anirudh Pradhan
  - Centre for Cosmology, Astrophysics and Space Science, GLA University, Mathura, Uttar Pradesh, India
- Anusheel Munshi
  - Department of Radiation Oncology, Manipal Hospitals, Dwarka, New Delhi, India
- Bidhu Kalyan Mohanti
  - KIMS Cancer Centre, Kalinga Institute of Medical Sciences, KIIT University, Bhubaneswar, Odisha, India
9
Klappe ES, Cornet R, Dongelmans DA, de Keizer NF. Inaccurate recording of routinely collected data items influences identification of COVID-19 patients. Int J Med Inform 2022; 165:104808. [PMID: 35767912 PMCID: PMC9186787 DOI: 10.1016/j.ijmedinf.2022.104808] [Received: 11/09/2021] [Revised: 04/11/2022] [Accepted: 06/03/2022] [Indexed: 11/20/2022]
Abstract
Background During the coronavirus disease 2019 (COVID-19) pandemic it became apparent that it is difficult to extract standardized Electronic Health Record (EHR) data for secondary purposes like public health decision-making. Accurate recording of, for example, standardized diagnosis codes and test results is required to identify all COVID-19 patients. This study aimed to investigate whether specific combinations of routinely collected data items for COVID-19 can be used to identify an accurate set of intensive care unit (ICU)-admitted COVID-19 patients. Methods The following routinely collected EHR data items to identify COVID-19 patients were evaluated: positive reverse transcription polymerase chain reaction (RT-PCR) test results, problem list codes for COVID-19 registered by healthcare professionals, and COVID-19 infection labels. COVID-19 codes registered by clinical coders retrospectively after discharge were also evaluated. A gold standard dataset was created by evaluating two datasets of suspected and confirmed COVID-19 patients admitted to the ICU at a Dutch university hospital between February 2020 and December 2020, of which one set was manually maintained by intensivists and one set was extracted from the EHR by a research data management department. Patients were labeled 'COVID-19' if their EHR record showed a COVID-19 diagnosis during or right before an ICU admission. Patients were labeled 'non-COVID-19' if the record indicated no COVID-19, exclusion or only suspicion during or right before an ICU admission, or if COVID-19 was diagnosed and cured during non-ICU episodes of the hospitalization in which an ICU admission took place. Performance was determined for 37 queries including real-time and retrospective data items. We used the F1 score, which is the harmonic mean of precision and recall.
The gold standard dataset was split into one subset including admissions between February and April and one subset including admissions between May and December to determine accuracy differences. Results The total dataset consisted of 402 patients: 196 'COVID-19' and 206 'non-COVID-19' patients. F1 scores of search queries including EHR data items that can be extracted in real time ranged between 0.68 and 0.97, and for search queries including the data item that was retrospectively registered by clinical coders F1 scores ranged between 0.73 and 0.99. F1 scores showed no clear pattern in variability between the two time periods. Conclusions Our study showed that one cannot rely on individual routinely collected data items such as coded COVID-19 on problem lists to identify all COVID-19 patients. If information is not required in real time, medical coding from clinical coders is most reliable. Researchers should be transparent about the methods they use to extract data. To maximize the ability to identify all COVID-19 cases, alerts for inconsistent data and policies for standardized data capture could enable reliable data reuse.
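The F1 score mentioned in the Methods can be illustrated with a minimal sketch. The function name and the counts below are hypothetical and chosen only for illustration; they are not taken from the study, apart from the total of 196 'COVID-19' patients used to make the example plausible.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall.

    tp: query hits that are COVID-19 in the gold standard (true positives)
    fp: query hits that are non-COVID-19 in the gold standard (false positives)
    fn: gold-standard COVID-19 patients the query missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical query: finds 180 of 196 COVID-19 patients and
# wrongly flags 10 non-COVID-19 patients (illustrative counts only).
example_f1 = f1_score(tp=180, fp=10, fn=16)
print(f"F1 = {example_f1:.3f}")
```

Because F1 combines precision and recall, a query that flags many patients scores poorly if most flags are wrong, and a very strict query scores poorly if it misses many true cases.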
Affiliation(s)
- Eva S Klappe
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Ronald Cornet
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Dave A Dongelmans
- Amsterdam UMC, University of Amsterdam, Department of Intensive Care Medicine, Amsterdam, Netherlands
- Nicolette F de Keizer
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam, Netherlands
|
10
|
Research on primary Sjögren's syndrome in 2004-2021: a Web of Science-based cross-sectional bibliometric analysis. Rheumatol Int 2022; 42:2221-2229. [PMID: 35536378 DOI: 10.1007/s00296-022-05138-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/14/2022] [Accepted: 04/18/2022] [Indexed: 10/18/2022]
Abstract
The extent, range, and nature of available research in the field of primary Sjögren's syndrome (pSS) have not been fully understood. This study aimed to map the literature available on pSS and identify global hotspots and trends in the research. Papers on pSS published between 2004 and 2021 were retrieved from the Web of Science Core Collection. The quantity and citations of publications, and the research hotspots and trends in the field of pSS, were analyzed and presented visually using Microsoft Excel and CiteSpace software. A total of 3606 papers, mainly from 526 institutions in 83 countries/regions, were included for analysis. The number of publications showed an overall upward trend in the field of pSS from 2004 to 2021. The USA ranked first in the number of publications (n = 661), followed by China (n = 491), Italy (n = 405), France (n = 351), and Japan (n = 292). Moreover, seven of the top ten countries by the number of publications on pSS were from Europe. The University of Groningen (n = 661), Xavier Mariette (n = 95), and Clinical and Experimental Rheumatology (n = 184) were the most prolific affiliation, author, and journal, respectively. Vitali C (n = 2009) and Arthritis and Rheumatism (n = 3918) held the record for the most cited papers by an author and journal, respectively. At present, the hot keywords in the field of pSS include disease activity, ultrasonography, management, consensus, and data-driven. Lymphoid organization, clinical phenotypes outcome, salivary gland ultrasonography, and Toll-like receptor are the emerging research trends in pSS. Research on pSS is flourishing. Current research on pSS mainly focuses on disease activity, ultrasonography, and management, while the emerging research trends are lymphoid organization, clinical phenotypes outcome, salivary gland ultrasonography, and Toll-like receptor.
|
11
|
Čolaković A, Avdagić-Golub E, Begović M, Memić B, Hasković-Džubur A. Application of machine learning in the fight against the COVID-19 pandemic: A review. ACTA FACULTATIS MEDICAE NAISSENSIS 2022. [DOI: 10.5937/afmnai39-38354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023] Open
Abstract
Introduction: Machine learning (ML) plays a significant role in the fight against the COVID-19 pandemic (COVID-19 is the disease caused by the SARS-CoV-2 virus). ML techniques enable the rapid detection of patterns and trends in large datasets. Therefore, ML provides efficient methods to generate knowledge from structured and unstructured data. This potential is particularly significant when the pandemic affects all aspects of human life. It is necessary to collect a large amount of data to identify methods for preventing the spread of infection, detecting it early, reducing its consequences, and finding appropriate medicine. Modern information and communication technologies (ICT) such as the Internet of Things (IoT) allow the collection of large amounts of data from various sources. Thus, we can create predictive ML-based models for assessments, predictions, and decisions. Methods: This is a review article based on previous studies and scientifically proven knowledge. In this paper, bibliometric data from authoritative databases of research publications (Web of Science, Scopus, PubMed) are combined for bibliometric analyses in the context of ML applications for COVID-19. Aim: This paper reviews some ML-based applications used for mitigating COVID-19. We aimed to identify and review ML potentials and solutions for mitigating the COVID-19 pandemic as well as to present some of the most commonly used ML techniques, algorithms, and datasets applied in the context of COVID-19. We also provide some insights into specific emerging ideas and open issues to facilitate future research. Conclusion: ML is an effective tool for diagnosis and early detection of symptoms, predicting the spread of a pandemic, developing medicines and vaccines, etc.
|