1
Gaviria-Valencia S, Murphy SP, Kaggal VC, McBane RD II, Rooke TW, Chaudhry R, Alzate-Aguirre M, Arruda-Olson AM. Near Real-time Natural Language Processing for the Extraction of Abdominal Aortic Aneurysm Diagnoses From Radiology Reports: Algorithm Development and Validation Study. JMIR Med Inform 2023; 11:e40964. [PMID: 36826984 PMCID: PMC10007015 DOI: 10.2196/40964] [Received: 07/11/2022] [Revised: 12/29/2022] [Accepted: 01/19/2023] [Indexed: 01/21/2023]
Abstract
BACKGROUND: Management of abdominal aortic aneurysms (AAAs) requires serial imaging surveillance to evaluate the aneurysm dimension. Natural language processing (NLP) has been previously developed to retrospectively identify patients with AAA from electronic health records (EHRs). However, there are no reported studies that use NLP to identify patients with AAA in near real time from radiology reports.
OBJECTIVE: This study aims to develop and validate a rule-based NLP algorithm for near real-time automatic extraction of AAA diagnosis from radiology reports for case identification.
METHODS: The AAA-NLP algorithm was developed and deployed to an EHR big data infrastructure for near real-time processing of radiology reports from May 1, 2019, to September 2020. NLP extracted named entities for AAA case identification and classified subjects as cases and controls. The reference standard to assess algorithm performance was a manual review of processed radiology reports by trained physicians following standardized criteria. Reviewers were blinded to the diagnosis of each subject. The AAA-NLP algorithm was refined in 3 successive iterations. For each iteration, the AAA-NLP algorithm was modified based on performance compared to the reference standard.
RESULTS: A total of 360 reports were reviewed, of which 120 radiology reports were randomly selected for each iteration. At each iteration, the AAA-NLP algorithm performance improved. The algorithm identified AAA cases in near real time with high positive predictive value (0.98), sensitivity (0.95), specificity (0.98), F1 score (0.97), and accuracy (0.97).
CONCLUSIONS: Implementation of NLP for accurate identification of AAA cases from radiology reports with high performance in near real time is feasible. This NLP technique will support automated input for patient care and clinical decision support tools for the management of patients with AAA.
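The performance measures named in the Results (positive predictive value, sensitivity, specificity, F1 score, and accuracy) all derive from a 2×2 comparison of algorithm output against the reference standard. A minimal Python sketch follows. The function name and the counts are hypothetical placeholders for illustration; they are not the study's data, and the abstract does not report the underlying confusion table.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Return PPV, sensitivity, specificity, F1 score, and accuracy.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives, and false negatives against the reference standard.
    """
    ppv = tp / (tp + fp)                      # positive predictive value (precision)
    sensitivity = tp / (tp + fn)              # recall
    specificity = tn / (tn + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for 120 reports; illustrative only, not study data.
example = classification_metrics(tp=50, fp=5, tn=60, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Recomputing these values after each refinement of the rules is one way to track the kind of iteration-over-iteration improvement the abstract describes.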
Affiliation(s)
- Simon Gaviria-Valencia
- Divisions of Preventive Cardiology and Cardiovascular Ultrasound, Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
- Sean P Murphy
- Advanced Analytics Services Unit (Natural Language Processing), Department of Information Technology, Mayo Clinic, Rochester, MN, United States
- Vinod C Kaggal
- Enterprise Technology Services (Natural Language Processing), Department of Information Technology, Mayo Clinic, Rochester, MN, United States
- Robert D McBane II
- Gonda Vascular Center, Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
- Thom W Rooke
- Gonda Vascular Center, Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
- Rajeev Chaudhry
- Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
- Mateo Alzate-Aguirre
- Divisions of Preventive Cardiology and Cardiovascular Ultrasound, Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
- Adelaide M Arruda-Olson
- Divisions of Preventive Cardiology and Cardiovascular Ultrasound, Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
2
Dewaswala N, Chen D, Bhopalwala H, Kaggal VC, Murphy SP, Bos JM, Geske JB, Gersh BJ, Ommen SR, Araoz PA, Ackerman MJ, Arruda-Olson AM. Natural language processing for identification of hypertrophic cardiomyopathy patients from cardiac magnetic resonance reports. BMC Med Inform Decis Mak 2022; 22:272. [PMID: 36258218 PMCID: PMC9580188 DOI: 10.1186/s12911-022-02017-y] [Received: 11/24/2020] [Accepted: 10/10/2022] [Indexed: 11/30/2022]
Abstract
Background: Cardiac magnetic resonance (CMR) imaging is important for diagnosis and risk stratification of hypertrophic cardiomyopathy (HCM) patients. However, collection of information from large numbers of CMR reports by manual review is time-consuming, error-prone and costly. Natural language processing (NLP) is an artificial intelligence method for automated extraction of information from narrative text including text in CMR reports in electronic health records (EHR). Our objective was to assess whether NLP can accurately extract diagnosis of HCM from CMR reports.
Methods: An NLP system with two tiers was developed for information extraction from narrative text in CMR reports; the first tier extracted information regarding HCM diagnosis while the second extracted categorical and numeric concepts for HCM classification. We randomly allocated 200 HCM patients with CMR reports from 2004 to 2018 into training (100 patients with 185 CMR reports) and testing sets (100 patients with 206 reports).
Results: NLP algorithms demonstrated very high performance compared to manual annotation. The algorithm to extract HCM diagnosis had accuracy of 0.99. The accuracy for categorical concepts included HCM morphologic subtype 0.99, systolic anterior motion of the mitral valve 0.96, mitral regurgitation 0.93, left ventricular (LV) obstruction 0.94, location of obstruction 0.92, apical pouch 0.98, LV delayed enhancement 0.93, left atrial enlargement 0.99 and right atrial enlargement 0.98. Accuracy for numeric concepts included maximal LV wall thickness 0.96, LV mass 0.99, LV mass index 0.98, LV ejection fraction 0.98 and right ventricular ejection fraction 0.99.
Conclusions: NLP identified and classified HCM from CMR narrative text reports with very high performance.
Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-02017-y.
Affiliation(s)
- Nakeya Dewaswala
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- David Chen
- Department of Cardiovascular Surgery, Cleveland Clinic, Cleveland, OH, USA
- Huzefa Bhopalwala
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Vinod C Kaggal
- Enterprise Technology Services, Shared Service Offices, Mayo Clinic, Rochester, MN, USA
- Sean P Murphy
- Advanced Analytics Services, Mayo Clinic Rochester, Rochester, MN, USA
- J Martijn Bos
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Jeffrey B Geske
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Bernard J Gersh
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Steve R Ommen
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Philip A Araoz
- Department of Radiology, Mayo Clinic Rochester, Rochester, MN, USA
- Michael J Ackerman
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Department of Pediatric and Adolescent Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic Rochester, Rochester, MN, USA
3
Partogi M, Gaviria-Valencia S, Alzate Aguirre M, Pick NJ, Bhopalwala HM, Barry BA, Kaggal VC, Scott CG, Kessler ME, Moore MM, Mitchell JD, Chaudhry R, Bonacci RP, Arruda-Olson AM. Sociotechnical Intervention for Improved Delivery of Preventive Cardiovascular Care to Rural Communities: Participatory Design Approach. J Med Internet Res 2022; 24:e27333. [PMID: 35994324 PMCID: PMC9446142 DOI: 10.2196/27333] [Received: 03/19/2021] [Revised: 12/30/2021] [Accepted: 06/27/2022] [Indexed: 11/15/2022]
Abstract
Background: Clinical practice guidelines recommend antiplatelet and statin therapies as well as blood pressure control and tobacco cessation for secondary prevention in patients with established atherosclerotic cardiovascular diseases (ASCVDs). However, these strategies for risk modification are underused, especially in rural communities. Moreover, resources to support the delivery of preventive care to rural patients are fewer than those for their urban counterparts. Transformative interventions for the delivery of tailored preventive cardiovascular care to rural patients are needed.
Objective: A multidisciplinary team developed a rural-specific, team-based model of care intervention assisted by clinical decision support (CDS) technology using participatory design in a sociotechnical conceptual framework. The model of care intervention included redesigned workflows and a novel CDS technology for the coordination and delivery of guideline recommendations by primary care teams in a rural clinic.
Methods: The design of the model of care intervention comprised 3 phases: problem identification, experimentation, and testing. Input from team members (n=35) required 150 hours, including observations of clinical encounters, provider workshops, and interviews with patients and health care professionals. The intervention was prototyped, iteratively refined, and tested with user feedback. In a 3-month pilot trial, 369 patients with ASCVDs were randomized into the control or intervention arm.
Results: New workflows and a novel CDS tool were created to identify patients with ASCVDs who had gaps in preventive care and assign the right care team member for delivery of tailored recommendations. During the pilot, the intervention prototype was iteratively refined and tested. The pilot demonstrated feasibility for successful implementation of the sociotechnical intervention as the proportion of patients who had encounters with advanced practice providers (nurse practitioners and physician assistants), pharmacists, or tobacco cessation coaches for the delivery of guideline recommendations in the intervention arm was greater than that in the control arm.
Conclusions: Participatory design and a sociotechnical conceptual framework enabled the development of a rural-specific, team-based model of care intervention assisted by CDS technology for the transformation of preventive health care delivery for ASCVDs.
4
Chaudhry AP, Hankey RA, Kaggal VC, Bhopalwala H, Liedl DA, Wennberg PW, Rooke TW, Scott CG, Disdier Moulder MP, Hendricks AK, Casanegra AI, McBane RD, Shellum JL, Kullo IJ, Nishimura RA, Chaudhry R, Arruda-Olson AM. Usability of a Digital Registry to Promote Secondary Prevention for Peripheral Artery Disease Patients. Mayo Clin Proc Innov Qual Outcomes 2021; 5:94-102. [PMID: 33718788 PMCID: PMC7930799 DOI: 10.1016/j.mayocpiqo.2020.09.012] [Indexed: 11/18/2022]
Abstract
Objective: To evaluate usability of a quality improvement tool that promotes guideline-based care for patients with peripheral arterial disease (PAD).
Patients and Methods: The study was conducted from July 19, 2018, to August 21, 2019. We compared the usability of a PAD cohort knowledge solution (CKS) with standard management supported by an electronic health record (EHR). Two scenarios were developed for usability evaluation: the first evaluated the PAD-CKS and the second evaluated standard EHR workflow. Providers were asked to provide opinions about the PAD-CKS tool and to generate a System Usability Scale (SUS) score. Metrics analyzed included time required, number of mouse clicks, and number of keystrokes.
Results: Usability evaluations were completed by 11 providers. SUS for the PAD-CKS was excellent at 89.6. Time required to complete 21 tasks in the CKS was 4 minutes compared with 12 minutes for standard EHR workflow (median, P = .002). Completion of CKS tasks required 34 clicks compared with 148 clicks for the EHR (median, P = .002). CKS task completion required 8 keystrokes compared with 72 for the EHR (median, P = .004). Providers indicated that overall they found the tool easy to use and the PAD mortality risk score useful.
Conclusions: Usability evaluation of the PAD-CKS tool demonstrated time savings, a high SUS score, and a reduction of mouse clicks and keystrokes for task completion compared to standard workflow using the EHR. Provider feedback regarding the strengths and weaknesses also created opportunities for iterative improvement of the PAD-CKS tool.
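The abstract reports a SUS score but does not describe how it was calculated. As background, the conventional SUS scoring rule for a single 10-item questionnaire (responses on a 1 to 5 scale) can be sketched in Python as below. The function name and the example responses are assumptions made for illustration and are not taken from the study.

```python
def sus_score(responses: list[int]) -> float:
    """Score one completed 10-item SUS questionnaire (each response 1-5).

    Conventional rule: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response); the sum is scaled by 2.5
    to give a score from 0 to 100.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


# Hypothetical respondent, for illustration only.
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))
```

A study-level SUS value is typically a summary (such as a mean) of per-respondent scores; which summary the authors used is not stated in the abstract.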
Affiliation(s)
- Alisha P. Chaudhry
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Ronald A. Hankey
- Information Technology, Mayo Clinic and Mayo Foundation, Rochester, MN
- Vinod C. Kaggal
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN
- Huzefa Bhopalwala
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- David A. Liedl
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Paul W. Wennberg
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Thom W. Rooke
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Christopher G. Scott
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN
- Abby K. Hendricks
- Department of Pharmacy, Mayo Clinic and Mayo Foundation, Rochester, MN
- Ana I. Casanegra
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Robert D. McBane
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Jane L. Shellum
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic and Mayo Foundation, Rochester, MN
- Iftikhar J. Kullo
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Rick A. Nishimura
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic and Mayo Foundation, Rochester, MN
- Rajeev Chaudhry
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic and Mayo Foundation, Rochester, MN
- Department of Internal Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Adelaide M. Arruda-Olson
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, MN
- Correspondence: Adelaide M. Arruda-Olson, MD, PhD, 200 First Street SW, Rochester, MN 55905
5
Wen A, Wang L, He H, Liu S, Fu S, Sohn S, Kugel JA, Kaggal VC, Huang M, Wang Y, Shen F, Fan J, Liu H. An aberration detection-based approach for sentinel syndromic surveillance of COVID-19 and other novel influenza-like illnesses. J Biomed Inform 2021; 113:103660. [PMID: 33321199 PMCID: PMC7832634 DOI: 10.1016/j.jbi.2020.103660] [Received: 08/08/2020] [Revised: 11/06/2020] [Accepted: 12/09/2020] [Indexed: 02/08/2023]
Abstract
Coronavirus Disease 2019 (COVID-19) has emerged as a significant global concern, triggering harsh public health restrictions in a successful bid to curb its exponential growth. As discussion shifts towards relaxation of these restrictions, there is significant concern about second-wave resurgence. The key to managing these outbreaks is early detection and intervention, and yet there is a significant lag time associated with usage of laboratory-confirmed cases for surveillance purposes. To address this, syndromic surveillance can be considered to provide a timelier alternative for first-line screening. Existing syndromic surveillance solutions are, however, typically focused on a known disease and have limited capability to distinguish between outbreaks of individual diseases sharing similar syndromes. This poses a challenge for surveillance of COVID-19 as its active periods tend to overlap temporally with other influenza-like illnesses. In this study we explore performing sentinel syndromic surveillance for COVID-19 and other influenza-like illnesses using a deep learning-based approach. Our methods are based on aberration detection utilizing autoencoders that leverage symptom prevalence distributions to distinguish outbreaks of two ongoing diseases that share similar syndromes, even if they occur concurrently. We first demonstrate that this approach works for detection of outbreaks of influenza, which has known temporal boundaries. We then demonstrate that the autoencoder can be trained to not alert on known and well-managed influenza-like illnesses such as the common cold and influenza. Finally, we applied our approach to 2019-2020 data in the context of a COVID-19 syndromic surveillance task to demonstrate how implementation of such a system could have provided early warning of an outbreak of a novel influenza-like illness that did not match the symptom prevalence profile of influenza and other known influenza-like illnesses.
Affiliation(s)
- Andrew Wen
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Liwei Wang
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Huan He
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sijia Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sunyang Fu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sunghwan Sohn
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jacob A Kugel
- Advanced Analytics Service Unit, Department of Information Technology, Mayo Clinic, Rochester, MN, USA
- Vinod C Kaggal
- Advanced Analytics Service Unit, Department of Information Technology, Mayo Clinic, Rochester, MN, USA
- Ming Huang
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Feichen Shen
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jungwei Fan
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
6
Wen A, Wang L, He H, Liu S, Fu S, Sohn S, Kugel JA, Kaggal VC, Huang M, Wang Y, Shen F, Fan J, Liu H. An Aberration Detection-Based Approach for Sentinel Syndromic Surveillance of COVID-19 and Other Novel Influenza-Like Illnesses. medRxiv 2020:2020.06.08.20124990. [PMID: 32577704 PMCID: PMC7302403 DOI: 10.1101/2020.06.08.20124990] [Indexed: 12/15/2022]
Abstract
Coronavirus Disease 2019 (COVID-19) has emerged as a significant global concern, triggering harsh public health restrictions in a successful bid to curb its exponential growth. As discussion shifts towards relaxation of these restrictions, there is significant concern about second-wave resurgence. The key to managing these outbreaks is early detection and intervention, and yet there is a significant lag time associated with usage of laboratory-confirmed cases for surveillance purposes. To address this, syndromic surveillance can be considered to provide a timelier alternative for first-line screening. Existing syndromic surveillance solutions are, however, typically focused on a known disease and have limited capability to distinguish between outbreaks of individual diseases sharing similar syndromes. This poses a challenge for surveillance of COVID-19 as its active periods tend to overlap temporally with other influenza-like illnesses. In this study we explore performing sentinel syndromic surveillance for COVID-19 and other influenza-like illnesses using a deep learning-based approach. Our methods are based on aberration detection utilizing autoencoders that leverage symptom prevalence distributions to distinguish outbreaks of two ongoing diseases that share similar syndromes, even if they occur concurrently. We first demonstrate that this approach works for detection of outbreaks of influenza, which has known temporal boundaries. We then demonstrate that the autoencoder can be trained to not alert on known and well-managed influenza-like illnesses such as the common cold and influenza. Finally, we applied our approach to 2019-2020 data in the context of a COVID-19 syndromic surveillance task to demonstrate how implementation of such a system could have provided early warning of an outbreak of a novel influenza-like illness that did not match the symptom prevalence profile of influenza and other known influenza-like illnesses.
Affiliation(s)
- Andrew Wen
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Liwei Wang
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Huan He
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sijia Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sunyang Fu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sunghwan Sohn
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jacob A Kugel
- Advanced Analytics Service Unit, Department of Information Technology, Mayo Clinic, Rochester, MN, USA
- Vinod C Kaggal
- Advanced Analytics Service Unit, Department of Information Technology, Mayo Clinic, Rochester, MN, USA
- Ming Huang
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Feichen Shen
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jungwei Fan
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
7
Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, Liu S, Sohn S, Liu H, Fan J. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. NPJ Digit Med 2019; 2:130. [PMID: 31872069 PMCID: PMC6917754 DOI: 10.1038/s41746-019-0208-8] [Received: 09/05/2019] [Accepted: 11/25/2019] [Indexed: 12/23/2022]
Abstract
Data is foundational to high-quality artificial intelligence (AI). Given that a substantial amount of clinically relevant information is embedded in unstructured data, natural language processing (NLP) plays an essential role in extracting valuable information that can benefit decision making, administration reporting, and research. Here, we share several desiderata pertaining to development and usage of NLP systems, derived from two decades of experience implementing clinical NLP at the Mayo Clinic, to inform the healthcare AI community. Using a framework we developed as an example implementation, the desiderata emphasize the importance of a user-friendly platform, efficient collection of domain expert inputs, seamless integration with clinical data, and a highly scalable computing infrastructure.
Affiliation(s)
- Andrew Wen
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sunyang Fu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sungrim Moon
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Mohamed El Wazir
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Andrew Rosenbaum
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Vinod C Kaggal
- Advanced Analytics Service Unit, Department of Information Technology, Mayo Clinic, Rochester, MN, USA
- Sijia Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sunghwan Sohn
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jungwei Fan
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
8
Wen A, Wang Y, Kaggal VC, Liu S, Liu H, Fan J. Enhancing Clinical Information Retrieval through Context-Aware Queries and Indices. Proc IEEE Int Conf Big Data 2019; 2019:2800-2807. [PMID: 38213777 PMCID: PMC10782810 DOI: 10.1109/bigdata47090.2019.9006241] [Indexed: 01/13/2024]
Abstract
The big data revolution has created a hefty demand for searching large-scale electronic health records (EHRs) to support clinical practice, research, and administration. Despite the volume of data involved, fast and accurate identification of clinical narratives pertinent to a clinical case being seen by any given provider is crucial for decision-making at the point of care. In the general domain, this capability is accomplished through a combination of the inverted index data structure, horizontal scaling, and information retrieval (IR) scoring algorithms. These technologies are also being used in the clinical domain, but have met with limited success, particularly as clinical cases become more complex. One barrier affecting clinical performance is that contextual information, such as negation, temporality, and the subject of clinical mentions, affects clinical relevance but is not considered in general IR methodologies. In this study, we implemented a solution by identifying and incorporating the aforementioned semantic contexts as part of IR indexing/scoring with Elasticsearch. Experiments were conducted in comparison to baseline approaches with respect to: 1) evaluation of the impact on the quality (relevance) of the returned results, and 2) evaluation of the impact on execution time and storage requirements. The results showed a 5.1-23.1% improvement in retrieval quality, along with 35% faster query execution time. Cost-wise, the solution required 1.5-2 times more space and about a 3-fold increase in indexing time. The higher relevance demonstrated the merit of incorporating contextual information into clinical IR, and the near-constant increase in time and space suggested promising scalability.
Affiliation(s)
- Andrew Wen
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN, USA
- Vinod C Kaggal
- Department of Information Technology, Mayo Clinic, Rochester, MN, USA
- Sijia Liu
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN, USA
- Jungwei Fan
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN, USA
9
Moussa Pacha H, Mallipeddi VP, Afzal N, Moon S, Kaggal VC, Kalra M, Oderich GS, Wennberg PW, Rooke TW, Scott CG, Kullo IJ, McBane RD, Nishimura RA, Chaudhry R, Liu H, Arruda-Olson AM. Association of Ankle-Brachial Indices With Limb Revascularization or Amputation in Patients With Peripheral Artery Disease. JAMA Netw Open 2018; 1:e185547. [PMID: 30646276 PMCID: PMC6324363 DOI: 10.1001/jamanetworkopen.2018.5547] [Indexed: 12/20/2022]
Abstract
IMPORTANCE The prevalence and morbidity of peripheral artery disease (PAD) are high, with limb outcomes including revascularization and amputation. In community-dwelling patients with PAD, the role of noninvasive evaluation for risk assessment and rates of limb outcomes have not been established to date. OBJECTIVE To evaluate whether ankle-brachial indices are associated with limb outcomes in community-dwelling patients with PAD. DESIGN, SETTING, AND PARTICIPANTS A population-based, observational, test-based cohort study of patients was performed from January 1, 1998, to December 31, 2014. Data analysis was conducted from July 15 to December 15, 2017. Participants included a community-based cohort of 1413 patients with PAD from Olmsted County, Minnesota, identified by validated algorithms deployed to electronic health records. Automated algorithms identified limb outcomes used to build Cox proportional hazards regression models. Ankle-brachial indices and presence of poorly compressible arteries were electronically identified from digital data sets. Guideline-recommended management strategies within 6 months of diagnosis were also electronically retrieved, including therapy with statins, antiplatelet agents, angiotensin-converting enzyme inhibitors or angiotensin-receptor blockers, and smoking abstention. MAIN OUTCOMES AND MEASURES Ankle-brachial index (index ≤0.9 indicates PAD; <0.5, severe PAD; and ≥1.40, poorly compressible arteries) and limb revascularization or amputation. RESULTS Of 1413 patients, 633 (44.8%) were women; mean (SD) age was 70.8 (13.3) years. A total of 283 patients (20.0%) had severe PAD (ankle-brachial indices <0.5) and 350 (24.8%) had poorly compressible arteries (ankle-brachial indices ≥1.4); 780 (55.2%) individuals with less than severe disease formed the reference group. Only 32 of 283 patients (11.3%) with severe disease and 68 of 350 patients (19.4%) with poorly compressible arteries were receiving 4 guideline-recommended management strategies. In the severe disease subgroup, the 1-year event rate for revascularization was 32.4% (90 events); in individuals with poorly compressible arteries, the 1-year amputation rate was 13.9% (47 events). In models adjusted for age, sex, and critical limb ischemia, poorly compressible arteries were associated with amputation (hazard ratio [HR], 3.12; 95% CI, 2.16-4.50; P < .001) but not revascularization (HR, 0.91; 95% CI, 0.69-1.20; P = .49). In contrast, severe disease was associated with revascularization (HR, 2.69; 95% CI, 2.15-3.37; P < .001) but not amputation (HR, 1.30; 95% CI, 0.82-2.07; P = .27). CONCLUSIONS AND RELEVANCE Community-dwelling patients with severe PAD or poorly compressible arteries have high rates of revascularization or limb loss, respectively. Guideline-recommended management strategies for secondary risk prevention are underused in the community.
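The ankle-brachial index categories used in this abstract can be illustrated with a minimal Python sketch. It is not the study's algorithm. The function name and the label strings are assumptions made for illustration. The cut-offs follow the results sentence of the abstract (severe PAD below 0.5, poorly compressible arteries at 1.40 or above, PAD at 0.9 or below).

```python
def classify_abi(abi: float) -> str:
    """Map an ankle-brachial index to the categories named in the abstract.

    Label strings are hypothetical; only the thresholds come from the text.
    """
    if abi >= 1.40:
        # Poorly compressible arteries
        return "poorly_compressible"
    if abi < 0.5:
        # Severe PAD
        return "severe_pad"
    if abi <= 0.9:
        # PAD that is less than severe
        return "pad"
    # Values above 0.9 and below 1.40: this label is an assumption,
    # the abstract does not name this range.
    return "above_pad_threshold"


if __name__ == "__main__":
    for value in (0.45, 0.8, 1.1, 1.5):
        print(value, classify_abi(value))
```

The checks are ordered so that each value falls into exactly one category; how the study handled patients with several measurements is not described in the abstract and is not modeled here.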
Affiliation(s)
- Homam Moussa Pacha
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Vishnu P. Mallipeddi
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Naveed Afzal
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Sungrim Moon
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Vinod C. Kaggal
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Manju Kalra
- Division of Vascular Surgery, Department of Surgery, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Gustavo S. Oderich
- Division of Vascular Surgery, Department of Surgery, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Paul W. Wennberg
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Thom W. Rooke
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Christopher G. Scott
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Iftikhar J. Kullo
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Robert D. McBane
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Rick A. Nishimura
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Rajeev Chaudhry
- Division of Primary Care Medicine and Center of Translational Informatics and Knowledge Management, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
|
10
|
Arruda‐Olson AM, Afzal N, Priya Mallipeddi V, Said A, Moussa Pacha H, Moon S, Chaudhry AP, Scott CG, Bailey KR, Rooke TW, Wennberg PW, Kaggal VC, Oderich GS, Kullo IJ, Nishimura RA, Chaudhry R, Liu H. Leveraging the Electronic Health Record to Create an Automated Real-Time Prognostic Tool for Peripheral Arterial Disease. J Am Heart Assoc 2018; 7:e009680. [PMID: 30571601 PMCID: PMC6405562 DOI: 10.1161/jaha.118.009680] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/03/2018] [Accepted: 10/09/2018] [Indexed: 12/22/2022]
Abstract
Background Automated individualized risk prediction tools linked to electronic health records (EHRs) are not available for management of patients with peripheral arterial disease. The goal of this study was to create a prognostic tool for patients with peripheral arterial disease using data elements automatically extracted from an EHR to enable real-time and individualized risk prediction at the point of care. Methods and Results A previously validated phenotyping algorithm was deployed to an EHR linked to the Rochester Epidemiology Project to identify peripheral arterial disease cases from Olmsted County, MN, for the years 1998 to 2011. The study cohort was composed of 1676 patients: 593 patients died over 5-year follow-up. The c-statistic for survival in the overall data set was 0.76 (95% confidence interval [CI], 0.74-0.78), and the c-statistic across 10 cross-validation data sets was 0.75 (95% CI, 0.73-0.77). Stratification of cases demonstrated increasing mortality risk by subgroup (low: hazard ratio, 0.35 [95% CI, 0.21-0.58]; intermediate-high: hazard ratio, 2.98 [95% CI, 2.37-3.74]; high: hazard ratio, 8.44 [95% CI, 6.66-10.70], all P<0.0001 versus the reference subgroup). An equation for risk calculation was derived from Cox model parameters and β estimates. Big data infrastructure enabled deployment of the real-time risk calculator to the point of care via the EHR. Conclusions This study demonstrates that electronic tools can be deployed to EHRs to create automated real-time risk calculators to predict survival of patients with peripheral arterial disease. Moreover, the prognostic model developed may be translated to patient care as an automated and individualized real-time risk calculator deployed at the point of care.
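The abstract mentions a risk equation derived from Cox model parameters and β estimates but does not give it. As a hedged illustration only, the general form of such a calculation for a Cox proportional hazards model is risk(t) = 1 - S0(t)^exp(β·x), where S0(t) is the baseline survival at time t. The Python sketch below implements that generic form; the covariate names, coefficients, and baseline survival are hypothetical placeholders, not values from the study.

```python
import math
from typing import Dict


def cox_mortality_risk(
    covariates: Dict[str, float],
    betas: Dict[str, float],
    baseline_survival: float,
) -> float:
    """Generic Cox-model risk: 1 - S0(t) ** exp(linear predictor).

    `baseline_survival` is S0(t) at the chosen horizon for a patient whose
    covariates are all at their reference values.
    """
    # Linear predictor: sum of beta_k * x_k over the supplied covariates
    linear_predictor = sum(betas[name] * value for name, value in covariates.items())
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


if __name__ == "__main__":
    # Hypothetical coefficients and baseline, for illustration only
    example_betas = {"age_decades_over_70": 0.4, "diabetes": 0.3}
    example_patient = {"age_decades_over_70": 1.0, "diabetes": 1.0}
    print(cox_mortality_risk(example_patient, example_betas, baseline_survival=0.8))
```

With every covariate at its reference value the linear predictor is zero, so the returned risk is simply one minus the baseline survival; positive coefficients raise the risk above that level.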
Affiliation(s)
- Naveed Afzal
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Ahmad Said
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
- Sungrim Moon
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Kent R. Bailey
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Thom W. Rooke
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
- Vinod C. Kaggal
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Rajeev Chaudhry
- Division of Primary Care Medicine and Center of Translational Informatics and Knowledge Management, Mayo Clinic, Rochester, MN
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
|
11
|
Kaggal VC, Elayavilli RK, Mehrabi S, Pankratz JJ, Sohn S, Wang Y, Li D, Rastegar MM, Murphy SP, Ross JL, Chaudhry R, Buntrock JD, Liu H. Toward a Learning Health-care System - Knowledge Delivery at the Point of Care Empowered by Big Data and NLP. Biomed Inform Insights 2016; 8:13-22. [PMID: 27385912 PMCID: PMC4920204 DOI: 10.4137/bii.s37977] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 01/11/2016] [Revised: 03/20/2016] [Accepted: 03/29/2016] [Indexed: 11/24/2022]
Abstract
The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future.
Affiliation(s)
- Vinod C Kaggal
- Division of Information Management and Analytics, Mayo Clinic, Rochester, MN, USA; Biomedical Informatics and Computational Biology, University of Minnesota, Rochester, MN, USA
- Saeed Mehrabi
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Joshua J Pankratz
- Division of Information Management and Analytics, Mayo Clinic, Rochester, MN, USA
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Dingcheng Li
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sean P Murphy
- Division of Information Management and Analytics, Mayo Clinic, Rochester, MN, USA
- Jason L Ross
- Division of Information Management and Analytics, Mayo Clinic, Rochester, MN, USA
- James D Buntrock
- Division of Information Management and Analytics, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
|
12
|
Oniki TA, Zhuo N, Beebe CE, Liu H, Coyle JF, Parker CG, Solbrig HR, Marchant K, Kaggal VC, Chute CG, Huff SM. Clinical element models in the SHARPn consortium. J Am Med Inform Assoc 2016; 23:248-56. [PMID: 26568604 PMCID: PMC6283078 DOI: 10.1093/jamia/ocv134] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/25/2014] [Revised: 03/20/2015] [Accepted: 04/18/2015] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE The objective of the Strategic Health IT Advanced Research Project area four (SHARPn) was to develop open-source tools that could be used for the normalization of electronic health record (EHR) data for secondary use--specifically, for high throughput phenotyping. We describe the role of Intermountain Healthcare's Clinical Element Models ([CEMs] Intermountain Healthcare Health Services, Inc, Salt Lake City, Utah) as normalization "targets" within the project. MATERIALS AND METHODS Intermountain's CEMs were either repurposed or created for the SHARPn project. A CEM describes "valid" structure and semantics for a particular kind of clinical data. CEMs are expressed in a computable syntax that can be compiled into implementation artifacts. The modeling team and SHARPn colleagues agilely gathered requirements and developed and refined models. RESULTS Twenty-eight "statement" models (analogous to "classes") and numerous "component" CEMs and their associated terminology were repurposed or developed to satisfy SHARPn high throughput phenotyping requirements. Model (structural) mappings and terminology (semantic) mappings were also created. Source data instances were normalized to CEM-conformant data and stored in CEM instance databases. A model browser and request site were built to facilitate the development. DISCUSSION The modeling efforts demonstrated the need to address context differences and granularity choices and highlighted the inevitability of iso-semantic models. The need for content expertise and "intelligent" content tooling was also underscored. We discuss scalability and sustainability expectations for a CEM-based approach and describe the place of CEMs relative to other current efforts. CONCLUSIONS The SHARPn effort demonstrated the normalization and secondary use of EHR data. CEMs proved capable of capturing data originating from a variety of sources within the normalization pipeline and serving as suitable normalization targets.
Affiliation(s)
- Thomas A Oniki
- Department of Medical Informatics, Intermountain Healthcare, Salt Lake City, Utah, USA
- Ning Zhuo
- Department of Medical Informatics, Intermountain Healthcare, Salt Lake City, Utah, USA
- Calvin E Beebe
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Joseph F Coyle
- Department of Medical Informatics, Intermountain Healthcare, Salt Lake City, Utah, USA
- Craig G Parker
- Department of Medical Informatics, Intermountain Healthcare, Salt Lake City, Utah, USA
- Harold R Solbrig
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Vinod C Kaggal
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Christopher G Chute
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Stanley M Huff
- Department of Medical Informatics, Intermountain Healthcare, Salt Lake City, Utah, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
|
13
|
Pathak J, Bailey KR, Beebe CE, Bethard S, Carrell DS, Chen PJ, Dligach D, Endle CM, Hart LA, Haug PJ, Huff SM, Kaggal VC, Li D, Liu H, Marchant K, Masanz J, Miller T, Oniki TA, Palmer M, Peterson KJ, Rea S, Savova GK, Stancl CR, Sohn S, Solbrig HR, Suesse DB, Tao C, Taylor DP, Westberg L, Wu S, Zhuo N, Chute CG. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium. J Am Med Inform Assoc 2013; 20:e341-8. [PMID: 24190931 PMCID: PMC3861933 DOI: 10.1136/amiajnl-2013-001939] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Received: 04/17/2013] [Revised: 10/07/2013] [Accepted: 10/11/2013] [Indexed: 11/03/2022] Open
Abstract
RESEARCH OBJECTIVE To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction. MATERIALS AND METHODS Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems-Mayo Clinic and Intermountain Healthcare-were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine. RESULTS Using CEMs and open-source natural language processing and terminology services engines-namely, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2)-we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL on a randomly selected cohort of 273 Mayo Clinic patients. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria. 
CONCLUSIONS End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts.
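The quality measure described in this abstract (patients aged 18 to 75 years with diabetes whose most recent LDL cholesterol result in the measurement year was below 100 mg/dL) can be sketched as a denominator and numerator count. The Python below is an illustrative sketch only, not the SHARPn platform or the QDM rules engine. The `Patient` and `LdlResult` types, the inclusive age bounds, and the treatment of the measurement year as a calendar year are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class LdlResult:
    taken_on: date
    mg_dl: float


@dataclass
class Patient:
    age: int
    has_diabetes: bool
    ldl_results: List[LdlResult] = field(default_factory=list)


def most_recent_ldl(patient: Patient, year: int) -> Optional[LdlResult]:
    """Return the latest LDL result dated within the measurement year, if any."""
    in_year = [r for r in patient.ldl_results if r.taken_on.year == year]
    return max(in_year, key=lambda r: r.taken_on) if in_year else None


def ldl_measure(patients: List[Patient], year: int) -> Tuple[int, int]:
    """Return (denominator, numerator) for the LDL < 100 mg/dL measure."""
    # Denominator: diabetes and age 18 to 75 (inclusive bounds are an assumption)
    denominator = [p for p in patients if p.has_diabetes and 18 <= p.age <= 75]
    numerator = 0
    for patient in denominator:
        latest = most_recent_ldl(patient, year)
        if latest is not None and latest.mg_dl < 100:
            numerator += 1
    return len(denominator), numerator
```

Only the most recent result in the year decides numerator membership, so a patient with an earlier value below 100 mg/dL and a later value at or above it is counted in the denominator but not the numerator.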
Affiliation(s)
- Jyotishman Pathak
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Kent R Bailey
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Calvin E Beebe
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Steven Bethard
- Department of Linguistics, University of Colorado, Boulder, Colorado, USA
- Pei J Chen
- Boston Children's Hospital, Harvard University, Boston, Massachusetts, USA
- Dmitriy Dligach
- Boston Children's Hospital, Harvard University, Boston, Massachusetts, USA
- Cory M Endle
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Lacey A Hart
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Peter J Haug
- Homer Warner Center for Informatics Research, Intermountain Healthcare, Salt Lake City, Utah, USA
- Stanley M Huff
- Homer Warner Center for Informatics Research, Intermountain Healthcare, Salt Lake City, Utah, USA
- Vinod C Kaggal
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Dingcheng Li
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- James Masanz
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Timothy Miller
- Boston Children's Hospital, Harvard University, Boston, Massachusetts, USA
- Thomas A Oniki
- Homer Warner Center for Informatics Research, Intermountain Healthcare, Salt Lake City, Utah, USA
- Martha Palmer
- Department of Linguistics, University of Colorado, Boulder, Colorado, USA
- Kevin J Peterson
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Susan Rea
- Homer Warner Center for Informatics Research, Intermountain Healthcare, Salt Lake City, Utah, USA
- Guergana K Savova
- Boston Children's Hospital, Harvard University, Boston, Massachusetts, USA
- Craig R Stancl
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Harold R Solbrig
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Dale B Suesse
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Cui Tao
- School of Biomedical Informatics, University of Texas Health Sciences Center, Houston, Texas, USA
- David P Taylor
- Homer Warner Center for Informatics Research, Intermountain Healthcare, Salt Lake City, Utah, USA
- Stephen Wu
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Ning Zhuo
- Homer Warner Center for Informatics Research, Intermountain Healthcare, Salt Lake City, Utah, USA
- Christopher G Chute
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
|
14
|
Abstract
Health disparities and solutions are heterogeneous within and among racial and ethnic groups, yet existing administrative databases lack the granularity to reflect important sociocultural distinctions. We measured the efficacy of a natural-language-processing algorithm to identify a specific immigrant group. The algorithm demonstrated accuracy and precision in identifying Somali patients from the electronic medical records at a single institution. This technology holds promise to identify and track immigrants and refugees in the United States in local health care settings.
Affiliation(s)
- Mark L Wieland
- Division of Primary Care Internal Medicine, Mayo Clinic, Rochester, MN 55904, USA
|
15
|
Wu ST, Kaggal VC, Dligach D, Masanz JJ, Chen P, Becker L, Chapman WW, Savova GK, Liu H, Chute CG. A common type system for clinical natural language processing. J Biomed Semantics 2013; 4:1. [PMID: 23286462 PMCID: PMC3575354 DOI: 10.1186/2041-1480-4-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 06/26/2012] [Accepted: 12/23/2012] [Indexed: 11/29/2022] Open
Abstract
Background One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
|