1
Epizitone A, Moyane SP, Agbehadji IE. A Data-Driven Paradigm for a Resilient and Sustainable Integrated Health Information Systems for Health Care Applications. J Multidiscip Healthc 2023; 16:4015-4025. [PMID: 38107085 PMCID: PMC10725635 DOI: 10.2147/jmdh.s433299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 11/02/2023] [Indexed: 12/19/2023] Open
Abstract
Introduction Many transformations and uncertainties, such as the fourth industrial revolution and pandemics, have propelled healthcare acceptance and deployment of health information systems (HIS). External and internal determinants aligning with the global course influence their deployments. At the epicenter is digitalization, which generates endless data that has permeated healthcare. The continuous proliferation of complex and dynamic healthcare data is the digitalization frontier in healthcare that necessitates attention. Objective This study explores the existing body of information on HIS for healthcare through the data lens to present a data-driven paradigm for healthcare augmentation paramount to attaining a sustainable and resilient HIS. Method A Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-compliant in-depth literature review was conducted systematically to synthesize and analyze the literature content to ascertain the value disposition of HIS data in healthcare delivery. Results This study details the aspects of a data-driven paradigm for robust and sustainable HIS for health care applications. Data source, data action and decisions, data sciences techniques, serialization of data sciences techniques in the HIS, and data insight implementation and application are the data-driven features expounded. These are essential data-driven paradigm building blocks that need iteration to succeed. Discussions Existing literature considers insurgent data in healthcare challenging, disruptive, and potentially revolutionary. This view echoes the current healthcare quandary of good and bad data availability. Thus, data-driven insights are essential for building a resilient and sustainable HIS. People, technology, and tasks dominated prior HIS frameworks, with few data-centric facets. Improving healthcare and the HIS requires identifying and integrating crucial data elements.
Conclusion The paper presented a data-driven paradigm for a resilient and sustainable HIS. The findings show that data-driven track and components are essential to improve healthcare using data analytics insights. It provides an integrated footing for data analytics to support and effectively assist health care delivery.
Affiliation(s)
- Ayogeboh Epizitone
  - ICT and Society Research Group, Department of Information and Corporate Management, Durban University of Technology, Durban, South Africa
- Smangele Pretty Moyane
  - Department of Information and Corporate Management, Durban University of Technology, Durban, South Africa
- Israel Edem Agbehadji
  - Centre for Transformative Agricultural and Food Systems, School of Agricultural, Earth and Environmental Sciences, University of KwaZulu-Natal, Pietermaritzburg, South Africa
2
Dong T, Sunderland N, Nightingale A, Fudulu DP, Chan J, Zhai B, Freitas A, Caputo M, Dimagli A, Mires S, Wyatt M, Benedetto U, Angelini GD. Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database. Bioengineering (Basel) 2023; 10:1307. [PMID: 38002431 PMCID: PMC10669818 DOI: 10.3390/bioengineering10111307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/03/2023] [Accepted: 11/09/2023] [Indexed: 11/26/2023] Open
Abstract
BACKGROUND Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis. OBJECTIVES To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use. METHODS 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated. RESULTS Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R² values, demonstrating an ideal alignment between the NLP system and clinicians.
A good level (ICC = 0.75-0.9, p < 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance. CONCLUSIONS The NLP-based technique yielded good results when it came to extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.
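To make the kind of conversion described in this abstract concrete, the following is a minimal sketch, not the authors' system. It assumes a rule-based approach with regular expressions; the measurement labels, the units, the `extract` function, and the sample report text are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for three continuous measures named in the abstract.
# Real echo reports vary by site and vendor, so these labels and units are assumptions.
NUMBER = r"(\d+(?:\.\d+)?)"
CONTINUOUS = {
    "LVOT VTI": re.compile(r"LVOT\s+VTI\s*[:=]?\s*" + NUMBER + r"\s*cm", re.IGNORECASE),
    "AV VTI": re.compile(r"AV\s+VTI\s*[:=]?\s*" + NUMBER + r"\s*cm", re.IGNORECASE),
    "TR Vmax": re.compile(r"TR\s+Vmax\s*[:=]?\s*" + NUMBER + r"\s*m/s", re.IGNORECASE),
}

# Hypothetical pattern for one discrete outcome (regurgitation severity per valve).
SEVERITY = re.compile(
    r"\b(trivial|mild|moderate|severe)\s+(mitral|aortic|tricuspid)\s+regurgitation",
    re.IGNORECASE,
)


def extract(report: str) -> dict:
    """Turn semi-structured report text into a flat, structured record."""
    record = {}
    for name, pattern in CONTINUOUS.items():
        match = pattern.search(report)
        # Missing measurements are stored as None rather than guessed.
        record[name] = float(match.group(1)) if match else None
    record["regurgitation"] = {
        valve.lower(): severity.lower()
        for severity, valve in SEVERITY.findall(report)
    }
    return record


sample = "LVOT VTI: 21.5 cm. AV VTI 45 cm. TR Vmax = 2.8 m/s. Mild mitral regurgitation."
result = extract(sample)
```

A structured record like `result` can then be compared field by field against clinician annotations, which is the kind of agreement check the abstract reports.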
Affiliation(s)
- Tim Dong
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Nicholas Sunderland
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Angus Nightingale
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Daniel P. Fudulu
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Jeremy Chan
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Ben Zhai
  - School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
- Alberto Freitas
  - Faculty of Medicine, University of Porto, 4100 Porto, Portugal
- Massimo Caputo
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Arnaldo Dimagli
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Stuart Mires
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Mike Wyatt
  - University Hospitals Bristol and Weston, Marlborough St, Bristol BS1 3NU, UK
- Umberto Benedetto
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Gianni D. Angelini
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
3
Zhou S, Wang N, Wang L, Sun J, Blaes A, Liu H, Zhang R. A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records. Comput Struct Biotechnol J 2023; 22:32-40. [PMID: 37680211 PMCID: PMC10480628 DOI: 10.1016/j.csbj.2023.08.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/15/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023] Open
Abstract
Objective Transformer-based language models are prevailing in the clinical domain due to their excellent performance on clinical NLP tasks. The generalizability of those models is usually ignored during the model development process. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, along with classic machine learning models, i.e., conditional random field (CRF), bi-directional long short-term memory CRF (BiLSTM-CRF), across different clinical institutes through a breast cancer phenotype extraction task. Materials and methods Two clinical corpora of breast cancer patients were collected from the electronic health records from the University of Minnesota (UMN) and Mayo Clinic (MC), and annotated following the same guideline. We developed three types of NLP models (i.e., CRF, BiLSTM-CRF and CancerBERT) to extract cancer phenotypes from clinical texts. We evaluated the generalizability of models on different test sets with different learning strategies (model transfer vs locally trained). The entity coverage score was assessed with their association with the model performances. Results We manually annotated 200 and 161 clinical documents at UMN and MC. The corpora of the two institutes were found to have higher similarity between the target entities than the overall corpora. The CancerBERT models obtained the best performances among the independent test sets from two clinical institutes and the permutation test set. The CancerBERT model developed in one institute and further fine-tuned in another institute achieved reasonable performance compared to the model developed on local data (micro-F1: 0.925 vs 0.932). Conclusions The results indicate the CancerBERT model has superior learning ability and generalizability among the three types of clinical NLP models for our named entity recognition task. It has the advantage to recognize complex entities, e.g., entities with different labels.
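The micro-F1 figures quoted above pool true positives, false positives, and false negatives over all entity types before computing precision and recall. The sketch below shows that calculation under an assumption the abstract does not state: that an entity counts as correct only on an exact match of document, span, and label. The function name and the toy annotations are hypothetical.

```python
def micro_f1(gold: set, predicted: set) -> float:
    """Micro-averaged F1 over entity tuples of (doc_id, start, end, label).

    Assumes exact-match scoring: a prediction is a true positive only if the
    same tuple appears in the gold annotations.
    """
    true_pos = len(gold & predicted)
    false_pos = len(predicted - gold)
    false_neg = len(gold - predicted)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations (hypothetical): four gold entities, four predictions, three exact matches.
gold = {
    ("doc1", 0, 5, "TUMOR_SIZE"),
    ("doc1", 10, 18, "HORMONE_RECEPTOR"),
    ("doc2", 3, 9, "GRADE"),
    ("doc2", 20, 27, "STAGE"),
}
predicted = {
    ("doc1", 0, 5, "TUMOR_SIZE"),
    ("doc1", 10, 18, "HORMONE_RECEPTOR"),
    ("doc2", 3, 9, "GRADE"),
    ("doc2", 20, 27, "GRADE"),  # right span, wrong label: counts as an error
}
score = micro_f1(gold, predicted)
```

Under exact-match scoring, the mislabeled span is counted both as a false positive and as a false negative, which is why label confusion on complex entities lowers the score twice.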
Affiliation(s)
- Sicheng Zhou
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Nan Wang
  - School of Statistics, University of Minnesota, Minneapolis, MN, USA
- Liwei Wang
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN, USA
- Ju Sun
  - Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, USA
- Anne Blaes
  - Department of Medicine, University of Minnesota, Minneapolis, MN, USA
- Hongfang Liu
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN, USA
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA
4
Li C, Xu W, Cohen T, Michalowski M, Pakhomov S. TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments. AMIA Jt Summits Transl Sci Proc 2023; 2023:360-369. [PMID: 37350929 PMCID: PMC10283131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
The evidence is growing that machine and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment such as dementia and cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to make significant advances in this area. However, due to variability in approaches and data selection strategies used by various researchers, results obtained by different groups have been difficult to compare directly. In this paper, we present TRESTLE (Toolkit for Reproducible Execution of Speech Text and Language Experiments), an open source platform that focuses on two datasets from the TalkBank repository with dementia detection as an illustrative domain. Successfully deployed in the hackallenge (Hackathon/Challenge) of the International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a precise digital blueprint of the data pre-processing and selection strategies that can be reused via TRESTLE by other researchers seeking comparable results with their peers and current state-of-the-art (SOTA) approaches.
Affiliation(s)
- Changye Li
  - Institute of Health Informatics, University of Minnesota, Minneapolis, MN
- Weizhe Xu
  - Biomedical and Health Informatics, University of Washington, Seattle, Washington
- Trevor Cohen
  - Biomedical and Health Informatics, University of Washington, Seattle, Washington
5
Keloth VK, Banda JM, Gurley M, Heider PM, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves RM, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei WQ, Williams AE, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. J Biomed Inform 2023; 142:104343. [PMID: 36935011 PMCID: PMC10428170 DOI: 10.1016/j.jbi.2023.104343] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/25/2022] [Revised: 01/21/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]
Abstract
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.
Affiliation(s)
- Vipina K Keloth
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Michael Gurley
  - Lurie Cancer Center, Northwestern University, Chicago, Illinois, USA
- Paul M Heider
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Georgina Kennedy
  - Ingham Institute for Applied Medical Research, Sydney, Australia
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Feifan Liu
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Timothy Miller
  - Computational Health Informatics Program, Boston Children's Hospital, and Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Olga V Patterson
  - VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Verily Life Sciences, Mountain View, CA, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Kalpana Raja
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Ruth M Reeves
  - TN Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Masoud Rouhizadeh
  - Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD, USA
- Jianlin Shi
  - VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
- Xiaoyan Wang
  - Sema4 Mount Sinai Genomics Incorporation, Stamford, CT, USA
- Yanshan Wang
  - Department of Health Information Management, Department of Biomedical Informatics, and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Rui Zhang
  - Institute for Health Informatics, and Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
- Clair Blacketer
  - Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Patrick Ryan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
6
Yew ANJ, Schraagen M, Otte WM, van Diessen E. Transforming epilepsy research: A systematic review on natural language processing applications. Epilepsia 2023; 64:292-305. [PMID: 36462150 PMCID: PMC10108221 DOI: 10.1111/epi.17474] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/11/2022] [Revised: 11/23/2022] [Accepted: 12/01/2022] [Indexed: 12/05/2022]
Abstract
Despite improved ancillary investigations in epilepsy care, patients' narratives remain indispensable for diagnosis and treatment monitoring. This wealth of information is typically stored in electronic health records and accumulated in medical journals in an unstructured manner, thereby restricting complete utilization in clinical decision-making. To this end, clinical researchers increasingly apply natural language processing (NLP), a branch of artificial intelligence, as it removes ambiguity, derives context, and assigns standardized meaning to free-narrative clinical texts. This systematic review presents an overview of the current NLP applications in epilepsy and discusses the opportunities and drawbacks of NLP alongside its future implications. We searched the PubMed and Embase databases with a "natural language processing" and "epilepsy" query (March 4, 2022) and included original research articles describing the application of NLP techniques for textual analysis in epilepsy. Twenty-six studies were included. Fifty-eight percent of these studies used NLP to classify clinical records into predefined categories, improving patient identification and treatment decisions. Other applications of NLP included structured clinical information retrieval from electronic health records, scientific papers, and online posts of patients. Challenges and opportunities of NLP applications for enhancing epilepsy care and research are discussed. The field could further benefit from NLP by replicating successes in other health care domains, such as NLP-aided quality evaluation for clinical decision-making, outcome prediction, and clinical record summarization.
Affiliation(s)
- Arister N J Yew
  - University College Utrecht, Utrecht University, Utrecht, The Netherlands
- Marijn Schraagen
  - Department of Information and Computing Sciences, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Willem M Otte
  - Department of Child Neurology, Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Eric van Diessen
  - Department of Child Neurology, Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
7
Yusufov M, Pirl WF, Braun I, Tulsky JA, Lindvall C. Natural Language Processing for Computer-Assisted Chart Review to Assess Documentation of Substance use and Psychopathology in Heart Failure Patients Awaiting Cardiac Resynchronization Therapy. J Pain Symptom Manage 2022; 64:400-409. [PMID: 35716959 DOI: 10.1016/j.jpainsymman.2022.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Revised: 06/09/2022] [Accepted: 06/09/2022] [Indexed: 11/19/2022]
Abstract
CONTEXT Advanced heart failure (HF) patients often experience distressing psychological symptoms, frequently meeting diagnostic criteria for psychological disorders, including anxiety, depression, and substance use disorder. Patients with device-based HF therapies have added risk for psychological disorders, with consequences for their physiological functioning, including adverse cardiac outcomes. OBJECTIVES This study used natural language processing (NLP) for computer-assisted chart review to assess documentation of mental health and substance use in HF patients awaiting cardiac resynchronization therapy (CRT), a device-based HF therapy. METHODS We applied NLP to clinical notes from electronic health records (EHR) of 965 consecutive patients, with 9821 total clinical notes, at two academic medical centers between 2004 and 2015. We developed and validated a keyword library capturing terms related to mental health and substance use, while balancing specificity and sensitivity. RESULTS Mean age was 71.6 years (SD = 11.8); 78% of patients were male and 87% were non-Hispanic White. Of the 544 patients (56.4%) with documentation of mental health history, 9.7% had their mental health assessed and 6.6% had a plan documented. Of the 773 patients (80.1%) with documentation of substance use history, 10 (1.0%) had an assessment, and 3 (0.3%) had a plan. CONCLUSION Despite clinical recommendations and standards of care, clinicians are under-documenting assessments and plans prior to CRT. Future research should develop an algorithm to prompt clinicians to document this content. Such quality improvement efforts may ensure adherence to standards of care and clinical guidelines.
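A keyword-library chart review of the kind described in METHODS can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's validated library: the terms, the domain names, and the `compile_library` and `flag_patients` helpers are hypothetical, and the sketch does no negation or context handling.

```python
import re

# Hypothetical keyword library; the study's validated term list is not given in the abstract.
KEYWORD_LIBRARY = {
    "mental_health": ["depression", "anxiety", "ptsd"],
    "substance_use": ["alcohol", "tobacco", "opioid"],
}


def compile_library(library: dict) -> dict:
    """Build one case-insensitive, whole-word pattern per domain."""
    return {
        domain: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for domain, terms in library.items()
    }


def flag_patients(notes, patterns: dict) -> dict:
    """Return, per domain, the set of patient IDs with at least one matching note.

    `notes` is an iterable of (patient_id, note_text) pairs. A match only means
    the topic is documented somewhere in the chart, including denials such as
    "denies alcohol use"; it does not mean the condition is present.
    """
    flagged = {domain: set() for domain in patterns}
    for patient_id, text in notes:
        for domain, pattern in patterns.items():
            if pattern.search(text):
                flagged[domain].add(patient_id)
    return flagged


notes = [
    ("p1", "History of depression, on sertraline."),
    ("p1", "Denies alcohol use."),
    ("p2", "No acute complaints today."),
]
flagged = flag_patients(notes, compile_library(KEYWORD_LIBRARY))
# Share of patients with any mental health documentation in this toy cohort.
mental_health_rate = len(flagged["mental_health"]) / len({pid for pid, _ in notes})
```

Whole-word matching trades some sensitivity for specificity, which is one simple way to approach the balance the abstract mentions; a production library would also need misspellings, abbreviations, and manual validation against chart review.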
Affiliation(s)
- Miryam Yusufov
  - Department of Psychosocial Oncology and Palliative Care (M.Y., W.F.P., I.B., J.A.T., C.L.), Dana-Farber Cancer Institute, Boston, Massachusetts, USA; Harvard Medical School (M.Y., W.F.P., I.B., J.A.T., C.L.), Boston, Massachusetts, USA
- William F Pirl
  - Department of Psychosocial Oncology and Palliative Care (M.Y., W.F.P., I.B., J.A.T., C.L.), Dana-Farber Cancer Institute, Boston, Massachusetts, USA; Harvard Medical School (M.Y., W.F.P., I.B., J.A.T., C.L.), Boston, Massachusetts, USA
- Ilana Braun
  - Department of Psychosocial Oncology and Palliative Care (M.Y., W.F.P., I.B., J.A.T., C.L.), Dana-Farber Cancer Institute, Boston, Massachusetts, USA; Harvard Medical School (M.Y., W.F.P., I.B., J.A.T., C.L.), Boston, Massachusetts, USA
- James A Tulsky
  - Department of Psychosocial Oncology and Palliative Care (M.Y., W.F.P., I.B., J.A.T., C.L.), Dana-Farber Cancer Institute, Boston, Massachusetts, USA; Harvard Medical School (M.Y., W.F.P., I.B., J.A.T., C.L.), Boston, Massachusetts, USA
- Charlotta Lindvall
  - Department of Psychosocial Oncology and Palliative Care (M.Y., W.F.P., I.B., J.A.T., C.L.), Dana-Farber Cancer Institute, Boston, Massachusetts, USA; Harvard Medical School (M.Y., W.F.P., I.B., J.A.T., C.L.), Boston, Massachusetts, USA
8
Hossain MZ, Daskalaki E, Brüstle A, Desborough J, Lueck CJ, Suominen H. The role of machine learning in developing non-magnetic resonance imaging based biomarkers for multiple sclerosis: a systematic review. BMC Med Inform Decis Mak 2022; 22:242. [PMID: 36109726 PMCID: PMC9476596 DOI: 10.1186/s12911-022-01985-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/15/2021] [Accepted: 09/02/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Multiple sclerosis (MS) is a neurological condition whose symptoms, severity, and progression over time vary enormously among individuals. Ideally, each person living with MS should be provided with an accurate prognosis at the time of diagnosis, precision in initial and subsequent treatment decisions, and improved timeliness in detecting the need to reassess treatment regimens. To manage these three components, discovering an accurate, objective measure of overall disease severity is essential. Machine learning (ML) algorithms can contribute to finding such a clinically useful biomarker of MS through their ability to search and analyze datasets about potential biomarkers at scale. Our aim was to conduct a systematic review to determine how, and in what way, ML has been applied to the study of MS biomarkers on data from sources other than magnetic resonance imaging. METHODS Systematic searches through eight databases were conducted for literature published in 2014-2020 on MS and specified ML algorithms. RESULTS Of the 1,052 returned papers, 66 met the inclusion criteria. All included papers addressed developing classifiers for MS identification or measuring its progression, typically using hold-out evaluation on subsets of fewer than 200 participants with MS. These classifiers focused on biomarkers of MS, ranging from those derived from omics to phenotypical data (34.5% clinical, 33.3% biological, 23.0% physiological, and 9.2% drug response). Algorithmic choices were dependent on both the amount of data available for supervised ML (91.5%; 49.2% classification and 42.3% regression) and the requirement to be able to justify the resulting decision-making principles in healthcare settings. Therefore, algorithms based on decision trees and support vector machines were commonly used, and the maximum average performance of 89.9% AUC was found in random forests compared with other ML algorithms.
CONCLUSIONS ML is applicable to determining how candidate biomarkers perform in the assessment of disease severity. However, applying ML research to develop decision aids to help clinicians optimize treatment strategies and analyze treatment responses in individual patients calls for creating appropriate data resources and shared experimental protocols. They should target proceeding from segregated classification of signals or natural language to both holistic analyses across data modalities and clinically-meaningful differentiation of disease.
Affiliation(s)
- Md Zakir Hossain
  - School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, ACT Australia
- Elena Daskalaki
  - School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, ACT Australia
- Anne Brüstle
  - The John Curtin School of Medical Research, College of Health and Medicine, Australian National University, Canberra, ACT Australia
- Jane Desborough
  - Department of Health Services Research and Policy, Research School of Population Health, College of Health and Medicine, Australian National University, Canberra, ACT Australia
- Christian J. Lueck
  - Department of Neurology, Canberra Hospital, Canberra, ACT Australia
  - ANU Medical School, College of Health and Medicine, Australian National University, Canberra, ACT Australia
- Hanna Suominen
  - School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, ACT Australia
  - Department of Computing, University of Turku, Turku, Finland
9
Fu S, Wen A, Pagali S, Zong N, St Sauver J, Sohn S, Fan J, Liu H. The Implication of Latent Information Quality to the Reproducibility of Secondary Use of Electronic Health Records. Stud Health Technol Inform 2022; 290:173-177. [PMID: 35672994 PMCID: PMC9754076 DOI: 10.3233/shti220055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Reproducibility is an important quality criterion for the secondary use of electronic health records (EHRs). However, multiple barriers to reproducibility are embedded in the heterogeneous EHR environment. These barriers include complex processes for collecting and organizing EHR data and dynamic multi-level interactions occurring during information use (e.g., inter-personal, inter-system, and cross-institutional). To ensure reproducible use of EHRs, we investigated four information quality (IQ) dimensions and examined the implications for reproducibility based on a real-world EHR study. Four types of IQ measurements suggested that barriers to reproducibility occurred for all stages of secondary use of EHR data. We discussed our recommendations and emphasized the importance of promoting transparent, high-throughput, and accessible data infrastructures and implementation best practices (e.g., data quality assessment, reporting standard).
Collapse
Affiliation(s)
- Sunyang Fu
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- University of Minnesota – Twin Cities, Minneapolis, Minnesota, USA
| | - Andrew Wen
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
| | - Sandeep Pagali
- Department of Medicine, Division of Hospital Internal Medicine, Mayo Clinic, Rochester, MN, USA
| | - Nansu Zong
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
| | - Jennifer St Sauver
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
| | - Sunghwan Sohn
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
| | - Jungwei Fan
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
| | - Hongfang Liu
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- University of Minnesota – Twin Cities, Minneapolis, Minnesota, USA
| |
|
10
|
Li Z. Construction Of Marketing Curriculum System Based on Blending Learning "3+2" Joint Training of Higher Vocational and Undergraduate Education Using NLP For Marketing Document Management and Information Retravel. ACM T ASIAN LOW-RESO 2022. [DOI: 10.1145/3524113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
A blended learning system provides educational tools to students in accordance with their expressed educational interests. It mixes online training and assignments, giving students more control over their learning and other developmental facets, and it has a profound impact on current higher education. This paper analyzes the "3+2" segmented training mode of higher vocational education and undergraduate education. It discusses the problems in the curriculum system of the marketing major in higher vocational education under this training mode, and the transformative potential of blended learning in the context of the challenges facing higher education, from the perspective of natural language processing (NLP) assistance in digital library management. In NLP, human language is divided into segments so that the grammatical structure and the actual meaning of the words can be analyzed and understood. The paper puts forward corresponding measures to promote the integration of segmented training courses of higher vocational education and undergraduate education through smart technologies, and to improve the quality of marketing talent jointly trained under both. Based on the observational results and the research of some scholars, the paper offers opinions on curriculum construction and designs an integrated curriculum system diagram to serve as a reference for the "3+2" joint training mode. Combining NLP assistance in digital management with the construction of a marketing curriculum system based on blended learning is reported to attain an effective and accurate outcome.
Affiliation(s)
- Zhiqin Li
- Anhui Technical College of Industry and Economy, Hefei, China
| |
|
11
|
Eyre H, Chapman AB, Peterson KS, Shi J, Alba PR, Jones MM, Box TL, DuVall SL, Patterson OV. Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2021:438-447. [PMID: 35308962 PMCID: PMC8861690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on the spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs, such as context analysis and mapping to standard terminologies. By utilizing spaCy's clear and easy-to-use conventions, medspaCy enables development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.
Affiliation(s)
- Hannah Eyre
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT, USA
| | - Alec B Chapman
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT, USA
| | - Kelly S Peterson
- University of Utah, Salt Lake City, UT, USA
- Veterans Health Administration Office of Analytics and Performance Integration
| | | | - Patrick R Alba
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT, USA
| | - Makoto M Jones
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT, USA
| | - Tamára L Box
- Veterans Health Administration Office of Analytics and Performance Integration
| | - Scott L DuVall
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT, USA
| | - Olga V Patterson
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT, USA
| |
|
12
|
Turewicz M, Frericks-Zipper A, Stepath M, Schork K, Ramesh S, Marcus K, Eisenacher M. BIONDA: a free database for a fast information on published biomarkers. BIOINFORMATICS ADVANCES 2021; 1:vbab015. [PMID: 36700097 PMCID: PMC9710600 DOI: 10.1093/bioadv/vbab015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/26/2021] [Revised: 07/11/2021] [Indexed: 01/28/2023]
Abstract
Summary: Because of the steadily increasing and already manually unmanageable total number of biomarker-related articles in biomedical research, there is a need for intelligent systems that extract all relevant information from biomedical texts and provide it as structured information to researchers in a user-friendly way. To address this, BIONDA was implemented as a free text mining-based online database for molecular biomarkers including genes, proteins and miRNAs and for all kinds of diseases. The contained structured information on published biomarkers is extracted automatically from Europe PMC publication abstracts and high-quality sources like UniProt and Disease Ontology. This allows frequent content updates. Availability and implementation: BIONDA is freely accessible via a user-friendly web application at http://bionda.mpc.ruhr-uni-bochum.de. The current BIONDA code is available at GitHub via https://github.com/mpc-bioinformatics/bionda. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Michael Turewicz
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany; Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Anika Frericks-Zipper
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany; Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Markus Stepath
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany; Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Karin Schork
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany; Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Spoorti Ramesh
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany; Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Katrin Marcus
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany; Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Martin Eisenacher
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany; Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| |
|