1
|
Abrams M, Wong A, Kholti HE, Chung Y, Armitige L, Wang D. Development of a Study Protocol for Evaluation of a Novel Measure to Incorporate Information Freshness into Network Analysis of Online Resources for COVID-19. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2024; 2024:27-35. [PMID: 38827115 PMCID: PMC11141794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
We proposed a novel measure, Degree of Connectivity with Integration of Freshness (DCIF), to incorporate information freshness into the analysis of online resource networks. We conducted a pilot study applying this measure to a dataset of online information resources related to COVID-19 risk assessment. Among the 52 nodes, we found a statistically significant difference between the numerical values of DCIF and those of the traditional structural measure, Degree of Connectivity (DC). Manual review of 18 selected nodes showed that DCIF outperformed DC in 11 of them, suggesting that the proposed measure is promising. We finalized the protocol for manual review based on the pilot and started a full-scale study. The proposed measure has the potential to provide a quantitative assessment of information freshness for timely and effective dissemination of clinical evidence. Further research is required to address the limitations of this pilot study and to examine the generalizability of the findings.
|
2
|
Amarbayasgalan T, Ryu KH. Unsupervised Feature-Construction-Based Motor Fault Diagnosis. SENSORS (BASEL, SWITZERLAND) 2024; 24:2978. [PMID: 38793833 PMCID: PMC11125213 DOI: 10.3390/s24102978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 05/05/2024] [Accepted: 05/06/2024] [Indexed: 05/26/2024]
Abstract
Bearing faults are a leading cause of motor damage and economic loss. Fast and accurate identification of bearing faults is valuable for preventing damage to the whole equipment and for keeping industrial processes running without interruption. Vibration signals from a running motor can be used to diagnose bearing health. This study proposes a method for detecting bearing faults from motor vibration data based on two types of neural networks. The proposed method uses an autoencoder neural network to construct a new motor vibration feature and a feed-forward neural network for the final detection. The constructed signal feature enhances prediction performance by focusing more on a fault type that is difficult to detect. We conducted experiments on the CWRU bearing datasets. The experimental study shows that the proposed method improves the performance of the feed-forward neural network and outperforms the other machine learning algorithms.
Affiliation(s)
- Keun Ho Ryu
- Data Science Laboratory, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
|
3
|
Yu D, Stidham RW, Vydiswaran VGV. A Systematic Temporal Extraction Pipeline for Medical Concepts in Clinical Notes. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024; 2023:1314-1323. [PMID: 38222360 PMCID: PMC10785919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
With the increased application of natural language processing (NLP) in medicine, many NLP models are being developed to uncover relevant clinical features from electronic health records. Temporal information plays a key role in understanding the context, significance, and interpretation of medical concepts extracted from clinical notes. This is particularly true in situations where the behavior, value, or status of a medical concept changes over time. In this paper, we introduce a systematic framework, NLP annotation-Relaxation-Generation (NRG). NRG compiles incidents of medical concept changes from status annotations and timestamps of multiple clinical notes. We demonstrate the effectiveness of the NRG pipeline by applying it to two medical concepts related to patients with inflammatory bowel disease: extra-intestinal manifestations and medications. We show that the NRG pipeline not only offers insights into medical concept changes over time but can also help convey longitudinal changes in clinical features at both the individual and population levels.
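The idea of compiling change incidents from per-note status annotations and timestamps can be illustrated with a minimal sketch. It is not the NRG implementation: the function name `compile_change_events`, the `(note_date, status)` input shape, and the status labels are all hypothetical, and the abstract's "relaxation" step is not modeled here.

```python
from datetime import date


def compile_change_events(annotations):
    """Return (date, old_status, new_status) tuples where the status changes.

    `annotations` is a list of (note_date, status) pairs for ONE patient and
    ONE medical concept. The input shape and the status labels are
    hypothetical; they are not taken from the paper.
    """
    events = []
    previous = None
    # Order the notes by timestamp so consecutive notes can be compared.
    for note_date, status in sorted(annotations, key=lambda pair: pair[0]):
        if previous is not None and status != previous:
            events.append((note_date, previous, status))
        previous = status
    return events


# Toy notes, deliberately out of chronological order.
notes = [
    (date(2020, 3, 1), "absent"),
    (date(2020, 1, 5), "absent"),
    (date(2020, 6, 12), "present"),
    (date(2020, 9, 30), "present"),
    (date(2021, 2, 2), "absent"),
]
events = compile_change_events(notes)
for when, old, new in events:
    print(when.isoformat(), old, "->", new)
```

Running the same function per patient and aggregating the resulting events would give a population-level view; that aggregation step is likewise an assumption about how such output might be used.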
Affiliation(s)
- Deahan Yu
- University of Michigan, Ann Arbor, MI, USA
|
4
|
Keszthelyi D, Gaudet-Blavignac C, Bjelogrlic M, Lovis C. Patient Information Summarization in Clinical Settings: Scoping Review. JMIR Med Inform 2023; 11:e44639. [PMID: 38015588 DOI: 10.2196/44639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 03/15/2023] [Accepted: 07/25/2023] [Indexed: 11/29/2023]
Abstract
BACKGROUND Information overflow, a common problem in the present clinical environment, can be mitigated by summarizing clinical data. Although there are several solutions for clinical summarization, there is a lack of a complete overview of the research relevant to this field. OBJECTIVE This study aims to identify state-of-the-art solutions for clinical summarization, to analyze their capabilities, and to identify their properties. METHODS A scoping review of articles published between 2005 and 2022 was conducted. With a clinical focus, PubMed and Web of Science were queried to find an initial set of reports, later extended by articles found through a chain of citations. The included reports were analyzed to answer the questions of where, what, and how medical information is summarized; whether summarization conserves temporality, uncertainty, and medical pertinence; and how the propositions are evaluated and deployed. To answer how information is summarized, methods were compared through a new framework "collect-synthesize-communicate" referring to information gathering from data, its synthesis, and communication to the end user. RESULTS Overall, 128 articles were included, representing various medical fields. Exclusively structured data were used as input in 46.1% (59/128) of papers, text in 41.4% (53/128) of articles, and both in 10.2% (13/128) of papers. Using the proposed framework, 42.2% (54/128) of the records contributed to information collection, 27.3% (35/128) contributed to information synthesis, and 46.1% (59/128) presented solutions for summary communication. Numerous summarization approaches have been presented, including extractive (n=13) and abstractive summarization (n=19); topic modeling (n=5); summary specification (n=11); concept and relation extraction (n=30); visual design considerations (n=59); and complete pipelines (n=7) using information extraction, synthesis, and communication. 
Graphical displays (n=53), short texts (n=41), static reports (n=7), and problem-oriented views (n=7) were the most common types in terms of summary communication. Although temporality and uncertainty information were usually not conserved in most studies (74/128, 57.8% and 113/128, 88.3%, respectively), some studies presented solutions to treat this information. Overall, 115 (89.8%) articles showed results of an evaluation, and methods included evaluations with human participants (median 15, IQR 24 participants): measurements in experiments with human participants (n=31), real situations (n=8), and usability studies (n=28). Methods without human involvement included intrinsic evaluation (n=24), performance on a proxy (n=10), or domain-specific tasks (n=11). Overall, 11 (8.6%) reports described a system deployed in clinical settings. CONCLUSIONS The scientific literature contains many propositions for summarizing patient information but reports very few comparisons of these proposals. This work proposes to compare these algorithms through how they conserve essential aspects of clinical information and through the "collect-synthesize-communicate" framework. We found that current propositions usually address these 3 steps only partially. Moreover, they conserve and use temporality, uncertainty, and pertinent medical aspects to varying extents, and solutions are often preliminary.
Affiliation(s)
- Daniel Keszthelyi
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christophe Gaudet-Blavignac
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Mina Bjelogrlic
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
|
5
|
Yang S, Dong M, Wang Y, Xu C. Adversarial Recurrent Time Series Imputation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:1639-1650. [PMID: 32749970 DOI: 10.1109/tnnls.2020.3010524] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/11/2023]
Abstract
In real-world time series analysis, missing data is a ubiquitous problem caused by anomalies during data collection and storage. If not treated properly, this problem seriously hinders classification, regression, and related tasks. Existing methods for time series imputation either impose overly strong assumptions on the distribution of missing data or fail to fully exploit, or simply ignore, the informative temporal dependencies and feature correlations across different time steps. In this article, inspired by the idea of conditional generative adversarial networks, we propose a generative adversarial learning framework for time series imputation conditioned on the observed data (as well as the labels, if available). In our model, we employ a modified bidirectional RNN structure as the generator G, which aims to generate the missing values by taking advantage of the temporal and nontemporal information extracted from the observed time series. The discriminator D is designed to distinguish whether each value in a time series is generated or not, so that it can help the generator adjust toward a more authentic imputation result. For an empirical verification of our model, we conduct imputation and classification experiments on several real-world time series data sets. The experimental results show a marked improvement over state-of-the-art baseline models.
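A small sketch may help make the roles of the observation mask, the generator output, and the per-value discriminator target concrete. It assumes the convention, common in adversarial imputation but not confirmed by the abstract, that observed values are kept and only missing positions are filled from the generator. The generator output below is a hard-coded stand-in, and all names are hypothetical.

```python
def observation_mask(series):
    """1 where a value was observed, 0 where it is missing (None)."""
    return [0 if value is None else 1 for value in series]


def combine(series, mask, generated):
    """Keep observed values and fill missing positions from the generator."""
    return [x if m == 1 else g for x, m, g in zip(series, mask, generated)]


# Toy univariate series with two missing time steps.
series = [0.5, None, 0.7, None]
mask = observation_mask(series)

# Stand-in for the output of a trained generator G (hypothetical values).
generated = [0.48, 0.61, 0.72, 0.80]

imputed = combine(series, mask, generated)

# A per-value discriminator D is trained to recover the mask from `imputed`:
# target 1 means "observed", target 0 means "generated".
discriminator_targets = mask

print(imputed)
print(discriminator_targets)
```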
|
6
|
Wang Q, Chen G, Jin X, Ren S, Wang G, Cao L, Xia Y. BiT-MAC: Mortality prediction by bidirectional time and multi-feature attention coupled network on multivariate irregular time series. Comput Biol Med 2023; 155:106586. [PMID: 36774888 DOI: 10.1016/j.compbiomed.2023.106586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 12/28/2022] [Accepted: 01/22/2023] [Indexed: 02/04/2023]
Abstract
Mortality prediction is crucial to evaluate the severity of illness and assist in improving the prognosis of patients. In clinical settings, one way is to analyze the multivariate time series (MTSs) of patients based on their medical data, such as heart rates and invasive mean arterial blood pressure. However, this suffers from sparse, irregularly sampled, and incomplete data issues. These issues can compromise the performance of follow-up MTS-based analytic applications. Plenty of existing methods try to deal with such irregular MTSs with missing values by capturing the temporal dependencies within a time series, yet in-depth research on modeling inter-MTS couplings remains rare and lacks model interpretability. To this end, we propose a bidirectional time and multi-feature attention coupled network (BiT-MAC) to capture the temporal dependencies (i.e., intra-time series coupling) and the hidden relationships among variables (i.e., inter-time series coupling) with a bidirectional recurrent neural network and multi-head attention, respectively. The resulting intra- and inter-time series coupling representations are then fused to estimate the missing values for a more robust MTS-based prediction. We evaluate BiT-MAC by applying it to the missing-data corrupted mortality prediction on two real-world clinical datasets, i.e., PhysioNet'2012 and COVID-19. Extensive experiments demonstrate the superiority of BiT-MAC over cutting-edge models, verifying the great value of the deep and hidden relations captured by MTSs. The interpretability of features is further demonstrated through a case study.
Affiliation(s)
- Qinfen Wang
- National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, China
- Geng Chen
- National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, China
- Xuting Jin
- Department of Critical Care Medicine, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710049, China
- Siyuan Ren
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, China
- Gang Wang
- Department of Critical Care Medicine, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710049, China
- Longbing Cao
- Engineering and IT, University of Technology Sydney, Sydney, 2007, Australia
- Yong Xia
- National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, China
|
7
|
Olex AL, McInnes BT. Temporal disambiguation of relative temporal expressions in clinical texts. Front Res Metr Anal 2022; 7:1001266. [PMID: 36352893 PMCID: PMC9638055 DOI: 10.3389/frma.2022.1001266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 10/03/2022] [Indexed: 06/01/2024]
Abstract
Temporal expression recognition and normalization (TERN) is the foundation for all higher-level temporal reasoning tasks in natural language processing, such as timeline extraction, so it must be performed well to limit error propagation. Achieving new heights in state-of-the-art performance for TERN in clinical texts requires knowledge of where current systems struggle. In this work, we summarize the results of a detailed error analysis for three top performing state-of-the-art TERN systems that participated in the 2012 i2b2 Clinical Temporal Relation Challenge, and compare our own home-grown system Chrono to identify specific areas in need of improvement. Performance metrics and an error analysis reveal that all systems have reduced performance in normalization of relative temporal expressions, specifically in disambiguating temporal types and in the identification of the correct anchor time. To address the issue of temporal disambiguation we developed and integrated a module into Chrono that utilizes temporally fine-tuned contextual word embeddings to disambiguate relative temporal expressions. Chrono now achieves state-of-the-art performance for temporal disambiguation of relative temporal expressions in clinical text, and is the only TERN system to output dual annotations into both TimeML and SCATE schemes.
Affiliation(s)
- Amy L. Olex
- C. Kenneth and Diane Wright Center for Clinical and Translational Research, Virginia Commonwealth University, Richmond, VA, United States
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bridget T. McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
8
|
|
9
|
Alfattni G, Peek N, Nenadic G. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries. J Biomed Inform 2021; 123:103915. [PMID: 34600144 DOI: 10.1016/j.jbi.2021.103915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2021] [Revised: 08/05/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022]
Abstract
Temporal relation extraction between health-related events is a widely studied task in clinical Natural Language Processing (NLP). The current state-of-the-art methods mostly rely on engineered features (i.e., rule-based modelling) and sequence modelling, which often encodes a source sentence into a single fixed-length context. An obvious disadvantage of this fixed-length context design is its incapability to model longer sentences, as important temporal information in the clinical text may appear at different positions. To address this issue, we propose an Attention-based Bidirectional Long Short-Term Memory (Att-BiLSTM) model to enable learning the important semantic information in long source text segments and to better determine which parts of the text are most important. We experimented with two embeddings and compared the performances to traditional state-of-the-art methods that require elaborate linguistic pre-processing and hand-engineered features. The experimental results on the i2b2 2012 temporal relation test corpus show that the proposed method achieves a significant improvement with an F-score of 0.811, which is at least 10% better than state-of-the-art in the field. We show that the model can be remarkably effective at classifying temporal relations when provided with word embeddings trained on corpora in a general domain. Finally, we perform an error analysis to gain insight into the common errors made by the model.
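The contrast between a single fixed-length context and attention over all positions can be sketched as attention-weighted pooling of per-token hidden states. This is an illustration under stated assumptions, not the authors' model: the scoring function (a learned vector `w` applied to `tanh` of each hidden state) is one common formulation, and the toy hidden states stand in for BiLSTM outputs.

```python
import math


def attention_pool(hidden_states, w):
    """Pool a sequence of hidden states into one context vector.

    hidden_states: list of equal-length float vectors, one per token
                   (stand-ins for BiLSTM outputs).
    w:             scoring vector of the same length (learned in practice,
                   fixed here for illustration).
    Returns (attention_weights, context_vector).
    """
    # Score each position; w . tanh(h) is an assumed scoring function.
    scores = [
        sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in hidden_states
    ]
    # Numerically stable softmax over positions.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(a * h[d] for a, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


states = [[0.1, -0.2], [0.9, 0.8], [0.0, 0.1]]
weights, context = attention_pool(states, w=[1.0, 1.0])
print(weights)
print(context)
```

Because every position contributes in proportion to its weight, information far from the end of a long sentence is not forced through a single final state.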
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, UK; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
|
10
|
Locke S, Bashall A, Al-Adely S, Moore J, Wilson A, Kitchen GB. Natural language processing in medicine: A review. TRENDS IN ANAESTHESIA AND CRITICAL CARE 2021. [DOI: 10.1016/j.tacc.2021.02.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/17/2022]
|
11
|
Olex AL, McInnes BT. Review of Temporal Reasoning in the Clinical Domain for Timeline Extraction: Where we are and where we need to be. J Biomed Inform 2021; 118:103784. [PMID: 33862232 DOI: 10.1016/j.jbi.2021.103784] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/25/2020] [Revised: 03/07/2021] [Accepted: 04/08/2021] [Indexed: 11/16/2022]
Abstract
Understanding a patient's medical history, such as how long symptoms last or when a procedure was performed, is vital to diagnosing problems and providing good care. Frequently, important information regarding a patient's medical timeline is buried in their Electronic Health Record (EHR) in the form of unstructured clinical notes. This results in care providers spending time reading notes in a patient's record in order to become familiar with their condition prior to developing a diagnosis or treatment plan. Valuable time could be saved if this information was readily accessible for searching and visualization for fast comprehension by the medical team. Clinical Natural Language Processing (NLP) is an area of research that aims to build computational methods to automatically extract medically relevant information from unstructured clinical texts. A key component of Clinical NLP is Temporal Reasoning, as understanding a patient's medical history relies heavily on the ability to identify, assimilate, and reason over temporal information. In this work, we review the current state of Temporal Reasoning in the clinical domain with respect to Clinical Timeline Extraction. While much progress has been made, the current state-of-the-art still has a ways to go before practical application in the clinical setting will be possible. Areas such as handling relative and implicit temporal expressions, both in normalization and in identifying temporal relationships, improving co-reference resolution, and building inter-operable timeline extraction tools that can integrate multiple types of data are in need of new and innovative solutions to improve performance on clinical data.
Affiliation(s)
- Amy L Olex
- Virginia Commonwealth University, 401 S. Main St., Richmond, VA 23284, USA
- Bridget T McInnes
- Virginia Commonwealth University, 401 S. Main St., Richmond, VA 23284, USA
|
12
|
Zhu X, Plasek JM, Tang C, Al-Assad W, Zhang Z, Xiong Y, Wang L, Yerneni S, Ortega C, Kang MJ, Zhou L, Bates DW, Dykes PC. Embedding, aligning and reconstructing clinical notes to explore sepsis. BMC Res Notes 2021; 14:136. [PMID: 33853664 PMCID: PMC8048212 DOI: 10.1186/s13104-021-05529-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 03/13/2021] [Indexed: 11/24/2022]
Abstract
Objective Our goal was to research and develop exploratory analysis tools for clinical notes, which are currently underrepresented, limiting the diversity of data insights in medically relevant applications. Results We characterize how exploratory analysis can affect representation learning on clinical narratives and present several self-developed tools to explore sepsis. Our experiments focus on patients with sepsis in the MIMIC-III Clinical Database or in our institution’s research patient data repository. First, we found that global embeddings assist in learning local representations of clinical notes. Second, aligning at any specific time facilitates the use of learning models by pooling more available clinical notes to form a training set. Furthermore, reconstruction of the timeline enhances downstream-processing techniques by emphasizing temporal expressions and temporal relationships in clinical documentation. We demonstrate that clustering helps plot various types of clinical notes against a scale, which conveys a sense of the range or spread of the data and is useful for understanding data correlations. Appropriate exploratory analysis tools provide keen insights into preprocessing clinical notes, thereby further enhancing downstream analysis capabilities and making data-driven medicine possible. Our examples can help generate better data representations of clinical documentation for models with improved performance and interpretability. Supplementary Information The online version contains supplementary material available at 10.1186/s13104-021-05529-4.
Affiliation(s)
- Xudong Zhu
- Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
- Joseph M Plasek
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Chunlei Tang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Clinical and Quality Analysis, Mass General Brigham, Boston, MA, USA
- Wasim Al-Assad
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Zhikun Zhang
- Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
- Yun Xiong
- Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
- Liqin Wang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Sharmitha Yerneni
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Carlos Ortega
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Min-Jeoung Kang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; College of Nursing, The Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul, 06591, South Korea
- Li Zhou
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- David W Bates
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Clinical and Quality Analysis, Mass General Brigham, Boston, MA, USA
- Patricia C Dykes
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
|
13
|
Alfattni G, Peek N, Nenadic G. Extraction of temporal relations from clinical free text: A systematic review of current approaches. J Biomed Inform 2020; 108:103488. [PMID: 32673788 DOI: 10.1016/j.jbi.2020.103488] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/27/2019] [Revised: 06/10/2020] [Accepted: 06/15/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND Temporal relations between clinical events play an important role in clinical assessment and decision making. Extracting such relations from free text data is a challenging task because it lies at the intersection of medical natural language processing, temporal representation and temporal reasoning. OBJECTIVES To survey existing methods for extracting temporal relations (TLINKs) between events from clinical free text in English; to establish the state-of-the-art in this field; and to identify outstanding methodological challenges. METHODS A systematic search in PubMed and the DBLP computer science bibliography was conducted for studies published between January 2006 and December 2018. The relevant studies were identified by examining the titles and abstracts. Then, the full text of selected studies was analyzed in depth and information was collected on TLINK tasks, TLINK types, data sources, feature selection, methods used, and reported performance. RESULTS A total of 2834 publications were identified for title and abstract screening. Of these publications, 51 studies were selected. Thirty-two studies used machine learning approaches, 15 studies used a hybrid approach, and only four studies used a rule-based approach. The majority of studies use publicly available corpora: THYME (28 studies) and the i2b2 corpus (17 studies). CONCLUSION The performance of TLINK extraction methods ranges widely depending on relation types and events (e.g. from 32% to 87% F-score for identifying relations between clinical events and document creation time). A small set of TLINKs (before, after, overlap and contains) has been widely studied with relatively good performance, whereas other types of TLINK (e.g., started by, finished by, precedes) are rarely studied and remain challenging. 
Machine learning classifiers (such as Support Vector Machine and Conditional Random Fields) and Deep Neural Networks were among the best performing methods for extracting TLINKs, but nearly all the work has been carried out and tested on two publicly available corpora only. The field would benefit from the availability of more publicly available, high-quality, annotated clinical text corpora.
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, UK; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
|
14
|
Gaebel J, Wu HG, Oeser A, Cypko MA, Stoehr M, Dietz A, Neumuth T, Franke S, Oeltze-Jafra S. Modeling and processing up-to-dateness of patient information in probabilistic therapy decision support. Artif Intell Med 2020; 104:101842. [PMID: 32499009 DOI: 10.1016/j.artmed.2020.101842] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/03/2019] [Revised: 02/05/2020] [Accepted: 03/04/2020] [Indexed: 11/28/2022]
Abstract
OBJECTIVES Probabilistic modeling of a patient's situation with the goal of providing calculated therapy recommendations can improve the decision making of interdisciplinary teams. Relevant information entities and direct causal dependencies, as well as uncertainty, must be formally described. Possible therapy options, tailored to the patient, can be inferred from the clinical data using these descriptions. However, there are several avoidable factors of uncertainty influencing the accuracy of the inference. For instance, inaccuracy may emerge from outdated information. In general, probabilistic models, e.g. Bayesian Networks can depict the causality and relations of individual information entities, but in general cannot evaluate individual entities concerning their up-to-dateness. The goal of the work at hand is to model diagnostic up-to-dateness, which can reasonably adjust the influence of outdated diagnostic information to improve the inference results of clinical decision models. METHODS AND MATERIALS We analyzed 68 laryngeal cancer cases and modeled the state of up-to-dateness of different diagnostic modalities. All cases were used for cross-validation. 55 cases were used to train the model, 13 for testing. Each diagnostic procedure involved in the decision making process of these cases was associated with a specific threshold for the time the information is considered up-to-date, i.e. reliable. Based on this threshold, outdated findings could be identified and their impact on probabilistic calculations could be reduced. We applied the model for reducing the weight of outdated patient data in the computation of TNM stagings for the 13 test cases and compared the results to the manually derived TNM stagings in the patient files. RESULTS With the implementation of these weights in the laryngeal cancer model, we increased the accuracy of the TNM calculation from 0.61 (8 out of 13 cases correct) to 0.76 (10 out of 13 cases correct). 
CONCLUSION Decision delay may cause specific patient data to be outdated. This can cause contradictory or false information and impair calculations for clinical decision support. Our approach demonstrates that the accuracy of Bayesian Network models can be improved when pre-processing the patient-specific data and evaluating their up-to-dateness with reduced weights on outdated information.
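The thresholding idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the modality names, the validity windows in `VALIDITY_DAYS`, the `stale_weight` value, and the choice to reduce impact by blending a likelihood vector towards uniform. The abstract does not state the thresholds or the exact weighting scheme the authors used.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical validity windows in days per diagnostic modality.
# The abstract does not give the actual thresholds.
VALIDITY_DAYS = {"endoscopy": 30, "ct": 60, "ultrasound": 45}


@dataclass
class Finding:
    modality: str
    recorded: date
    value: str


def freshness_weight(finding, decision_date, stale_weight=0.5):
    """Return 1.0 for an up-to-date finding, stale_weight for an outdated one."""
    age_days = (decision_date - finding.recorded).days
    limit = VALIDITY_DAYS[finding.modality]
    return 1.0 if age_days <= limit else stale_weight


def soften_evidence(likelihood, weight):
    """Blend a likelihood vector towards uniform.

    weight=1.0 keeps the evidence unchanged; weight=0.0 makes it uninformative.
    This is one assumed way to reduce the impact of an outdated finding.
    """
    n = len(likelihood)
    return [weight * p + (1.0 - weight) / n for p in likelihood]


if __name__ == "__main__":
    ct = Finding("ct", date(2020, 1, 1), "T2")
    w = freshness_weight(ct, decision_date=date(2020, 4, 1))
    print(w, soften_evidence([1.0, 0.0, 0.0], w))
```

In this sketch a hard finding that is older than its window still favors the observed state, but no longer rules out the alternatives, so other, fresher findings can outweigh it during inference.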
Affiliation(s)
- Jan Gaebel
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany.
- Hans-Georg Wu
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany
- Alexander Oeser
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany
- Mario A Cypko
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany
- Matthaeus Stoehr
  - University Hospital Leipzig, Dept. of Otolaryngology, Head and Neck Surgery, Leipzig, Germany
- Andreas Dietz
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany; University Hospital Leipzig, Dept. of Otolaryngology, Head and Neck Surgery, Leipzig, Germany
- Thomas Neumuth
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany
- Stefan Franke
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany
- Steffen Oeltze-Jafra
  - University of Leipzig, Medical Faculty, ICCAS, Leipzig, Germany; Department of Neurology, University of Magdeburg, Germany
15
Najafabadipour M, Zanin M, Rodríguez-González A, Torrente M, Nuñez García B, Cruz Bermudez JL, Provencio M, Menasalvas E. Reconstructing the patient's natural history from electronic health records. Artif Intell Med 2020; 105:101860. [PMID: 32505419 DOI: 10.1016/j.artmed.2020.101860] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/29/2019] [Revised: 04/06/2020] [Accepted: 04/06/2020] [Indexed: 10/24/2022]
Abstract
The automatic extraction of a patient's natural history from Electronic Health Records (EHRs) is a critical step towards building intelligent systems that can reason about clinical variables and support decision making. Although EHRs contain a large amount of valuable information about the patient's medical care, this information can only be fully understood when analyzed in a temporal context. Any intelligent system should then be able to extract medical concepts, date expressions, temporal relations and the temporal ordering of medical events from the free texts of EHRs; yet, this task is hard to tackle, due to the domain specific nature of EHRs, writing quality and lack of structure of these texts, and more generally the presence of redundant information. In this paper, we introduce a new Natural Language Processing (NLP) framework, capable of extracting the aforementioned elements from EHRs written in Spanish using rule-based methods. We focus on building medical timelines, which include disease diagnosis and its progression over time. By using a large dataset of EHRs comprising information about patients suffering from lung cancer, we show that our framework has an adequate level of performance by correctly building the timeline for 843 patients from a pool of 989 patients, achieving a precision of 0.852.
Affiliation(s)
- Marjan Najafabadipour
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain; Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
- Massimiliano Zanin
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain.
- Maria Torrente
  - Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
- Ernestina Menasalvas
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain.
16
Liu D, Wu YL, Li X, Qi L. Medi-Care AI: Predicting medications from billing codes via robust recurrent neural networks. Neural Netw 2020; 124:109-116. [PMID: 31991306 DOI: 10.1016/j.neunet.2020.01.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/01/2019] [Revised: 11/04/2019] [Accepted: 01/01/2020] [Indexed: 11/29/2022]
Abstract
In this paper, we present an effective deep prediction framework based on robust recurrent neural networks (RNNs) to predict the likely therapeutic classes of medications a patient is taking, given a sequence of diagnostic billing codes in their record. Accurately capturing the list of medications currently taken by a given patient is extremely challenging due to undefined errors and omissions. We present a general robust framework that explicitly models the possible contamination through an over-time decay mechanism on the input billing codes and noise injection into the recurrent hidden states. By doing this, billing codes are reformulated into their temporal patterns with decay rates on each medical variable, and the hidden states of the RNNs are regularized by random noise, which serves as dropout to improve the RNNs' robustness to data variability in terms of missing values and multiple errors. The proposed method is extensively evaluated on real health care data to demonstrate its effectiveness in suggesting medication orders from contaminated values.
Affiliation(s)
- Deyin Liu
  - School of Information Engineering, Zhengzhou University, China.
- Yuanbo Lin Wu
  - Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230000, China.
- Xue Li
  - Dalian Neusoft University of Information, China.
- Lin Qi
  - School of Information Engineering, Zhengzhou University, China.
17
Bydon M, Schirmer CM, Oermann EK, Kitagawa RS, Pouratian N, Davies J, Sharan A, Chambless LB. Big Data Defined: A Practical Review for Neurosurgeons. World Neurosurg 2019; 133:e842-e849. [PMID: 31562965 DOI: 10.1016/j.wneu.2019.09.092] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/02/2018] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 01/03/2023]
Abstract
BACKGROUND Modern science and healthcare generate vast amounts of data, and, coupled with the increasingly inexpensive and accessible computing, a tremendous opportunity exists to use these data to improve care. A better understanding of data science and its relationship to neurosurgical practice will be increasingly important as we transition into this modern "big data" era. METHODS A review of the literature was performed for key articles referencing big data for neurosurgical care or related topics. RESULTS In the present report, we first defined the nature and scope of data science from a technical perspective. We then discussed its relationship to the modern neurosurgical practice, highlighting key references, which might form a useful introductory reading list. CONCLUSIONS Numerous challenges exist going forward; however, organized neurosurgery has an important role in fostering and facilitating these efforts to merge data science with neurosurgical practice.
Affiliation(s)
- Mohamad Bydon
  - Department of Neurosurgery, Mayo Clinic, Rochester, Minnesota, USA
- Clemens M Schirmer
  - Department of Neurosurgery, Geisinger Health System, Wilkes-Barre, Pennsylvania, USA
- Eric K Oermann
  - Department of Neurosurgery, Mount Sinai Health System, New York, New York, USA
- Ryan S Kitagawa
  - Department of Neurosurgery, University of Texas Health Science Center, Houston, Texas, USA
- Nader Pouratian
  - Department of Neurosurgery, University of California, Los Angeles, Medical Center, Los Angeles, California, USA
- Jason Davies
  - Department of Neurosurgery, State University of New York, Buffalo, New York, USA
- Ashwini Sharan
  - Department of Neurosurgery, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania, USA
- Lola B Chambless
  - Department of Neurosurgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
18
Shahid AH, Singh M. Computational intelligence techniques for medical diagnosis and prognosis: Problems and current developments. Biocybern Biomed Eng 2019. [DOI: 10.1016/j.bbe.2019.05.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022]
19
Zheng C, Yu W, Xie F, Chen W, Mercado C, Sy LS, Qian L, Glenn S, Lee G, Tseng HF, Duffy J, Jackson LA, Daley MF, Crane B, McLean HQ, Jacobsen SJ. The use of natural language processing to identify Tdap-related local reactions at five health care systems in the Vaccine Safety Datalink. Int J Med Inform 2019; 127:27-34. [PMID: 31128829 PMCID: PMC6645678 DOI: 10.1016/j.ijmedinf.2019.04.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/11/2018] [Revised: 01/31/2019] [Accepted: 04/12/2019] [Indexed: 01/28/2023]
Abstract
OBJECTIVE Local reactions are the most common vaccine-related adverse event. There is no specific diagnosis code for local reaction due to vaccination. Previous vaccine safety studies used non-specific diagnosis codes to identify potential local reaction cases and confirmed the cases through manual chart review. In this study, a natural language processing (NLP) algorithm was developed to identify local reaction associated with tetanus-diphtheria-acellular pertussis (Tdap) vaccine in the Vaccine Safety Datalink. METHODS Presumptive cases of local reactions were identified among members ≥ 11 years of age using ICD-9-CM codes in all care settings in the 1-6 days following a Tdap vaccination between 2012 and 2014. The clinical notes were searched for signs and symptoms consistent with local reaction. Information on the timing and the location of a sign or symptom was also extracted to help determine whether or not the sign or symptom was vaccine related. Reactions triggered by causes other than Tdap vaccination were excluded. The NLP algorithm was developed at the lead study site and validated on a stratified random sample of 500 patients from five institutions. RESULTS The NLP algorithm achieved an overall weighted sensitivity of 87.9%, specificity of 92.8%, positive predictive value of 82.7%, and negative predictive value of 95.1%. In addition, using data at one site, the NLP algorithm identified 3326 potential Tdap-related local reactions that were not identified through diagnosis codes. CONCLUSION The NLP algorithm achieved high accuracy, and demonstrated the potential of NLP to reduce the efforts of manual chart review in vaccine safety studies.
Affiliation(s)
- Chengyi Zheng
  - Kaiser Permanente Southern California, Pasadena, CA, USA.
- Wei Yu
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Fagen Xie
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Wansu Chen
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Cheryl Mercado
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Lina S Sy
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Lei Qian
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Gina Lee
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Hung Fu Tseng
  - Kaiser Permanente Southern California, Pasadena, CA, USA
- Jonathan Duffy
  - Centers for Disease Control and Prevention, Atlanta, GA, USA
- Brad Crane
  - Kaiser Permanente Northwest, Portland, OR, USA
- Huong Q McLean
  - Marshfield Clinic Research Institute, Marshfield, WI, USA
20
Moharasan G, Ho TB. Extraction of Temporal Information from Clinical Narratives. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2019; 3:220-244. [DOI: 10.1007/s41666-019-00049-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/12/2018] [Revised: 01/30/2019] [Accepted: 02/11/2019] [Indexed: 12/01/2022]
21
Puente C, Sobrino A, Olivas JA, Villa-Monte A. Designing a system to extract and interpret timed causal sentences in medical reports. J EXP THEOR ARTIF IN 2018. [DOI: 10.1080/0952813x.2018.1513081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Affiliation(s)
- C. Puente
  - Advanced Technical Faculty of Engineering ICAI, Comillas Pontifical University, Madrid, Spain
- A. Sobrino
  - Faculty of Philosophy, University of Santiago de Compostela, La Coruña, Spain
- J. A. Olivas
  - Department of Information Technologies and Systems, University of Castilla-La Mancha, Ciudad Real, Spain
- A. Villa-Monte
  - Institute of Research in Computer Science LIDI, Faculty of Computer Science, National University of La Plata, La Plata, Argentina
22
Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 2018; 8:6085. [PMID: 29666385 PMCID: PMC5904216 DOI: 10.1038/s41598-018-24271-9] [Citation(s) in RCA: 348] [Impact Index Per Article: 58.0] [Received: 11/01/2017] [Accepted: 03/26/2018] [Indexed: 11/08/2022]
Abstract
Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments of time series classification tasks on real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
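The two representations of missingness named in this abstract, masking and time interval, can be sketched for a single variable as follows. This is an illustrative sketch, not the authors' code: the function name `mask_and_intervals` is hypothetical, and it assumes the time interval is the time elapsed since the variable was last observed, accumulated across consecutive missing steps, with missing values encoded as NaN.

```python
import math


def mask_and_intervals(timestamps, values):
    """Compute masking and time-interval vectors for one variable.

    masks[t]  is 1 if values[t] is observed and 0 if it is missing (NaN).
    deltas[t] is the time elapsed since the last observed value
              (0.0 at the first step). This definition is an assumption.
    """
    masks, deltas = [], []
    for t, (s, x) in enumerate(zip(timestamps, values)):
        masks.append(0 if math.isnan(x) else 1)
        if t == 0:
            deltas.append(0.0)
        else:
            gap = s - timestamps[t - 1]
            # If the previous step was missing, the gap keeps accumulating.
            deltas.append(gap if masks[t - 1] == 1 else gap + deltas[t - 1])
    return masks, deltas


if __name__ == "__main__":
    nan = float("nan")
    m, d = mask_and_intervals([0.0, 1.0, 3.0, 6.0, 7.0], [5.0, nan, nan, 4.0, 3.0])
    print(m)  # [1, 0, 0, 1, 1]
    print(d)  # [0.0, 1.0, 3.0, 6.0, 1.0]
```

A model can then take the mask and the interval as extra inputs alongside the (imputed) values, so that how long a value has been missing becomes a usable signal rather than being discarded.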
Affiliation(s)
- Zhengping Che
  - University of Southern California, Department of Computer Science, Los Angeles, CA, 90089, USA.
- Sanjay Purushotham
  - University of Southern California, Department of Computer Science, Los Angeles, CA, 90089, USA
- Kyunghyun Cho
  - New York University, Department of Computer Science, New York, NY, 10012, USA
- David Sontag
  - Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, MA, 02139, USA
- Yan Liu
  - University of Southern California, Department of Computer Science, Los Angeles, CA, 90089, USA
23
Hripcsak G, Albers DJ. High-fidelity phenotyping: richness and freedom from bias. J Am Med Inform Assoc 2018; 25:289-294. [PMID: 29040596 PMCID: PMC7282504 DOI: 10.1093/jamia/ocx110] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 06/16/2017] [Revised: 08/07/2017] [Accepted: 09/06/2017] [Indexed: 01/14/2023]
Abstract
Electronic health record phenotyping is the use of raw electronic health record data to assert characterizations about patients. Researchers have been doing it since the beginning of biomedical informatics, under different names. Phenotyping will benefit from an increasing focus on fidelity, both in the sense of increasing richness, such as measured levels, degree or severity, timing, probability, or conceptual relationships, and in the sense of reducing bias. Research agendas should shift from merely improving binary assignment to studying and improving richer representations. The field is actively researching new temporal directions and abstract representations, including deep learning. The field would benefit from research in nonlinear dynamics, in combining mechanistic models with empirical data, including data assimilation, and in topology. The health care process produces substantial bias, and studying that bias explicitly rather than treating it as merely another source of noise would facilitate addressing it.
Affiliation(s)
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- David J Albers
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
24
Shibuya A, Misawa J, Maeda Y, Ichikawa R, Kamata M, Inoue R, Morimoto T, Nakayama M, Hishiki T, Kondo Y. Psychometric validation of a new measurement instrument for time-oriented patient information in electronic medical records: A questionnaire survey of physicians. J Eval Clin Pract 2017; 23:1459-1465. [PMID: 28990315 DOI: 10.1111/jep.12824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2017] [Revised: 08/08/2017] [Accepted: 08/09/2017] [Indexed: 11/30/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Time is an important element in medical data. Physicians record and store information about patients' disease progress and treatment response in electronic medical records (EMRs). Because EMRs use timestamps, physicians can identify patterns over time regarding a patient's disease and treatment (eg, laboratory values and medications). However, analyses of physicians' use and satisfaction with EMRs have focused on functionality, storage, and system operation rather than the use of time-oriented information. This study aimed to understand physicians' needs regarding time-oriented patient information in EMRs in clinical practice. METHODS The reliability and validity of the items in the questionnaire were evaluated in 87 physicians at a national university hospital. Internal consistency was satisfactory (Cronbach alpha coefficient, 0.87). RESULTS Four dimensions were identified in exploratory factor analysis. Correlations between the 4 dimensions supported the construct validity of the items. Scores of time-oriented patients' medical history in the 4 dimensions showed a significant association with physician age. Based on confirmatory factor analysis, associations were significant and positive (P < .001). In terms of the needs of physicians regarding time-oriented patient information in EMRs, both time-oriented treatment results followed by time-oriented team information had significant positive associations. CONCLUSION Our study suggests that 4 specific time-oriented patient information factors in EMRs are needed by physicians. Exploring physicians' needs regarding patient-specific time-oriented information may provide a better understanding of the barriers facing the adoption and use of EMRs (eg, decision-making and practice safety concerns) and lead to better acceptance of EMRs in physicians' clinical practices.
Affiliation(s)
- Akiko Shibuya
  - Department of Health Care Services Management, Nihon University School of Medicine, Tokyo, Japan
- Jimpei Misawa
  - Department of Health Care Services Management, Nihon University School of Medicine, Tokyo, Japan
- Yukihiro Maeda
  - Department of Health Care Services Management, Nihon University School of Medicine, Tokyo, Japan
- Rie Ichikawa
  - Department of Health Care Services Management, Nihon University School of Medicine, Tokyo, Japan; Department of Pediatrics and Child Health, Nihon University School of Medicine, Tokyo, Japan
- Michiyo Kamata
  - Department of Nursing, Tohoku Fukushi University, Sendai, Japan
- Ryusuke Inoue
  - Medical Informatics Center, Tohoku University Hospital, Sendai, Japan
- Tetsuji Morimoto
  - Division of Pediatrics, Tohoku Medical and Pharmaceutical University, Sendai, Japan
- Masaharu Nakayama
  - Medical Informatics Center, Tohoku University Hospital, Sendai, Japan; Department of Medical Informatics, Tohoku University Graduate School of Medicine, Sendai, Japan
- Yoshiaki Kondo
  - Department of Health Care Services Management, Nihon University School of Medicine, Tokyo, Japan
25
Madkour M, Benhaddou D, Tao C. Temporal data representation, normalization, extraction, and reasoning: A review from clinical domain. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 128:52-68. [PMID: 27040831 PMCID: PMC4837648 DOI: 10.1016/j.cmpb.2016.02.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/18/2015] [Accepted: 02/16/2016] [Indexed: 05/04/2023]
Abstract
BACKGROUND AND OBJECTIVE We live our lives by the calendar and the clock, but time is also an abstraction, even an illusion. The sense of time can be both domain-specific and complex, and is often left implicit, requiring significant domain knowledge to accurately recognize and harness. In the clinical domain, the momentum gained from recent advances in infrastructure and governance practices has enabled the collection of a tremendous amount of data at each moment in time. Electronic health records (EHRs) have paved the way to making these data available for practitioners and researchers. However, temporal data representation, normalization, extraction and reasoning are essential for mining such massive data and hence for constructing the clinical timeline. The objective of this work is to provide an overview of the problem of constructing a timeline at the clinical point of care and to summarize the state of the art in processing temporal information of clinical narratives. METHODS This review surveys the methods used in three important areas: modeling and representing time, medical NLP methods for extracting time, and methods of time reasoning and processing. The review emphasizes the gap between present methods and semantic web technologies and considers possible combinations of the two. RESULTS The main finding of this review is the importance of time processing not only in constructing timelines and clinical decision support systems but also as a vital component of EHR data models and operations. CONCLUSIONS Extracting temporal information in clinical narratives is a challenging task. The inclusion of ontologies and semantic web will lead to better assessment of the annotation task and, together with medical NLP techniques, will help resolve granularity and co-reference problems.
Affiliation(s)
- Mohcine Madkour
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, TX 77030, United States.
- Driss Benhaddou
  - Department of Engineering Technology, University of Houston, 4800 Calhoun Rd, Houston, TX 77004, United States.
- Cui Tao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, TX 77030, United States.
26
Warner JL, Zhang P, Liu J, Alterovitz G. Classification of hospital acquired complications using temporal clinical information from a large electronic health record. J Biomed Inform 2015; 59:209-17. [PMID: 26707449 DOI: 10.1016/j.jbi.2015.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 05/04/2015] [Revised: 11/18/2015] [Accepted: 12/12/2015] [Indexed: 10/25/2022]
Abstract
Hospital acquired complications (HACs) are serious problems affecting modern day healthcare institutions. It is estimated that HACs result in an approximately 10% increase in total inpatient hospital costs across US hospitals. With US hospital spending totaling nearly $900 billion per annum, the damages caused by HACs are no small matter. Early detection and prevention of HACs could greatly reduce strains on the US healthcare system and improve patient morbidity & mortality rates. Here, we describe a machine-learning model for predicting the occurrence of HACs within five distinct categories using temporal clinical data. Using our approach, we find that at least $10 billion of excessive hospital costs could be saved in the US alone, with the institution of effective preventive measures. In addition, we also identify several keystone features that demonstrate high predictive power for HACs over different time periods following patient admission. The classifiers and features analyzed in this study show high promise of being able to be used for accurate prediction of HACs in clinical settings, and furthermore provide novel insights into the contribution of various clinical factors to the risk of developing HACs as a function of healthcare system exposure.
Affiliation(s)
- Jeremy L Warner
  - Department of Medicine, Division of Hematology & Oncology, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
- Peijin Zhang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Jenny Liu
  - Department of Electrical Engineering & Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Gil Alterovitz
  - Department of Electrical Engineering & Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Children's Hospital Informatics Program at Harvard-MIT Health Sciences & Technology, Boston, MA, USA
27
Lin C, Dligach D, Miller TA, Bethard S, Savova GK. Multilayered temporal modeling for the clinical domain. J Am Med Inform Assoc 2015; 23:387-95. [PMID: 26521301 DOI: 10.1093/jamia/ocv113] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 02/06/2015] [Accepted: 06/26/2015] [Indexed: 11/12/2022]
Abstract
OBJECTIVE To develop an open-source temporal relation discovery system for the clinical domain. The system is capable of automatically inferring temporal relations between events and time expressions using a multilayered modeling strategy. It can operate at different levels of granularity--from rough temporality expressed as event relations to the document creation time (DCT) to temporal containment to fine-grained classic Allen-style relations. MATERIALS AND METHODS We evaluated our systems on 2 clinical corpora. One is a subset of the Temporal Histories of Your Medical Events (THYME) corpus, which was used in SemEval 2015 Task 6: Clinical TempEval. The other is the 2012 Informatics for Integrating Biology and the Bedside (i2b2) challenge corpus. We designed multiple supervised machine learning models to compute the DCT relation and within-sentence temporal relations. For the i2b2 data, we also developed models and rule-based methods to recognize cross-sentence temporal relations. We used the official evaluation scripts of both challenges to make our results comparable with results of other participating systems. In addition, we conducted a feature ablation study to find out the contribution of various features to the system's performance. RESULTS Our system achieved state-of-the-art performance on the Clinical TempEval corpus and was on par with the best systems on the i2b2 2012 corpus. Particularly, on the Clinical TempEval corpus, our system established a new F1 score benchmark, statistically significant as compared to the baseline and the best participating system. CONCLUSION Presented here is the first open-source clinical temporal relation discovery system. It was built using a multilayered temporal modeling strategy and achieved top performance in 2 major shared tasks.
Affiliation(s)
- Chen Lin
  - Boston Children's Hospital Boston, Boston, Massachusetts, USA
- Dmitriy Dligach
  - Boston Children's Hospital Boston, Boston, Massachusetts, USA; Harvard Medical School, Harvard University, Boston, Massachusetts, USA
- Timothy A Miller
  - Boston Children's Hospital Boston, Boston, Massachusetts, USA; Harvard Medical School, Harvard University, Boston, Massachusetts, USA
- Steven Bethard
  - Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Guergana K Savova
  - Boston Children's Hospital Boston, Boston, Massachusetts, USA; Harvard Medical School, Harvard University, Boston, Massachusetts, USA
28
Hripcsak G, Albers DJ, Perotte A. Parameterizing time in electronic health record studies. J Am Med Inform Assoc 2015; 22:794-804. [PMID: 25725004 PMCID: PMC6169471 DOI: 10.1093/jamia/ocu051] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/15/2014] [Revised: 11/08/2014] [Accepted: 12/22/2014] [Indexed: 02/07/2023]
Abstract
BACKGROUND Fields like nonlinear physics offer methods for analyzing time series, but many methods require that the time series be stationary (no change in properties over time). OBJECTIVE Medicine is far from stationary, but the challenge may be ameliorated by reparameterizing time because clinicians tend to measure patients more frequently when the patients are ill and more likely to vary. METHODS We compared time parameterizations, measuring variability of rate of change and magnitude of change, and looking for homogeneity of bins of temporal separation between pairs of time points. We studied four common laboratory tests drawn from 25 years of electronic health records on 4 million patients. RESULTS We found that sequence time (that is, simply counting the number of measurements from some start) produced more stationary time series, better explained the variation in values, and had more homogeneous bins than either traditional clock time or a recently proposed intermediate parameterization. Sequence time produced more accurate predictions in a single Gaussian process model experiment. CONCLUSIONS Of the three parameterizations, sequence time appeared to produce the most stationary series, possibly because clinicians adjust their sampling to the acuity of the patient. Parameterizing by sequence time may be applicable to association and clustering experiments on electronic health record data. A limitation of this study is that laboratory data were derived from only one institution. Sequence time appears to be an important potential parameterization.
Affiliation(s)
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, USA; Medical Informatics Services, NewYork-Presbyterian Hospital, New York, USA
- David J Albers
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, USA
- Adler Perotte
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, USA
29
Pivovarov R, Elhadad N. Automated methods for the summarization of electronic health records. J Am Med Inform Assoc 2015; 22:938-47. [PMID: 25882031 PMCID: PMC4986665 DOI: 10.1093/jamia/ocv032] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 10/30/2014] [Accepted: 03/15/2015] [Indexed: 02/02/2023] Open
Abstract
Objectives This review examines work on automated summarization of electronic health record (EHR) data and in particular, individual patient record summarization. We organize the published research and highlight methodological challenges in the area of EHR summarization implementation. Target audience The target audience for this review includes researchers, designers, and informaticians who are concerned about the problem of information overload in the clinical setting as well as both users and developers of clinical summarization systems. Scope Automated summarization has been a long-studied subject in the fields of natural language processing and human–computer interaction, but the translation of summarization and visualization methods to the complexity of the clinical workflow is slow moving. We assess work in aggregating and visualizing patient information with a particular focus on methods for detecting and removing redundancy, describing temporality, determining salience, accounting for missing data, and taking advantage of encoded clinical knowledge. We identify and discuss open challenges critical to the implementation and use of robust EHR summarization systems.
Affiliation(s)
- Rimma Pivovarov
- Department of Biomedical Informatics, Columbia University, New York, USA
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, USA
|
30
|
|
31
|
Clark K, Sharma D, Qin R, Chute CG, Tao C. A use case study on late stent thrombosis for ontology-based temporal reasoning and analysis. J Biomed Semantics 2014; 5:49. [PMID: 25540680 PMCID: PMC4275934 DOI: 10.1186/2041-1480-5-49] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/20/2014] [Accepted: 11/13/2014] [Indexed: 11/30/2022] Open
Abstract
In this paper, we show how we have applied the Clinical Narrative Temporal Relation Ontology (CNTRO) and its associated temporal reasoning system (the CNTRO Timeline Library) to trend temporal information within medical device adverse event report narratives. 238 narratives documenting occurrences of late stent thrombosis adverse events from the Food and Drug Administration’s (FDA) Manufacturer and User Facility Device Experience (MAUDE) database were annotated and evaluated using the CNTRO Timeline Library to identify, order, and calculate the duration of temporal events. The CNTRO Timeline Library had a 95% accuracy in correctly ordering events within the 238 narratives. 41 narratives included an event in which the duration was documented, and the CNTRO Timeline Library had an 80% accuracy in correctly determining these durations. 77 narratives included documentation of a duration between events, and the CNTRO Timeline Library had a 76% accuracy in determining these durations. This paper also includes an example of how this temporal output from the CNTRO ontology can be used to verify recommendations for length of drug administration, and proposes that these same tools could be applied to other medical device adverse event narratives in order to identify currently unknown temporal trends.
Affiliation(s)
- Kim Clark
- Boston Scientific Corporation, Maple Grove, MN USA
- Deepak Sharma
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN USA
- Rui Qin
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN USA
- Christopher G Chute
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN USA
- Cui Tao
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN USA; School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX USA
|
32
|
Raghavan P, Chen JL, Fosler-Lussier E, Lai AM. How essential are unstructured clinical narratives and information fusion to clinical trial recruitment? AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2014; 2014:218-23. [PMID: 25717416 PMCID: PMC4333685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate the argument and show that structured data alone is insufficient in resolving eligibility criteria for recruiting patients onto clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to solving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.
|
33
|
Lin YK, Chen H, Brown RA. MedTime: A temporal information extraction system for clinical narratives. J Biomed Inform 2013; 46 Suppl:S20-S28. [DOI: 10.1016/j.jbi.2013.07.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 03/15/2013] [Revised: 07/12/2013] [Accepted: 07/22/2013] [Indexed: 10/26/2022]
|
34
|
Nikfarjam A, Emadzadeh E, Gonzalez G. Towards generating a patient's timeline: extracting temporal relationships from clinical notes. J Biomed Inform 2013; 46 Suppl:S40-S47. [PMID: 24212118 DOI: 10.1016/j.jbi.2013.11.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 03/19/2013] [Revised: 10/31/2013] [Accepted: 11/01/2013] [Indexed: 10/26/2022]
Abstract
Clinical records include both coded and free-text fields that interact to reflect complicated patient stories. The information often covers not only the present medical condition and events experienced by the patient, but also refers to relevant events in the past (such as signs, symptoms, tests or treatments). In order to automatically construct a timeline of these events, we first need to extract the temporal relations between pairs of events or time expressions presented in the clinical notes. We designed separate extraction components for different types of temporal relations, utilizing a novel hybrid system that combines machine learning with a graph-based inference mechanism to extract the temporal links. The temporal graph is a directed graph based on parse tree dependencies of the simplified sentences and frequent pattern clues. We generalized the sentences in order to discover patterns that, given the complexities of natural language, might not be directly discoverable in the original sentences. The proposed hybrid system reached an F-measure of 0.63, with precision at 0.76 and recall at 0.54, on the 2012 i2b2 Natural Language Processing corpus for the temporal relation (TLink) extraction task, achieving the highest precision and third-highest F-measure among participating teams in the TLink track.
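The reported scores are mutually consistent if the F-measure is taken to be the balanced F1, the harmonic mean of precision and recall. That reading is an assumption here, since the abstract does not name the variant. A minimal sketch of the check:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
score = f_measure(0.76, 0.54)
print(round(score, 2))  # prints 0.63
```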
Affiliation(s)
- Azadeh Nikfarjam
- Department of Biomedical Informatics, Arizona State University, Tempe, USA.
- Ehsan Emadzadeh
- Department of Biomedical Informatics, Arizona State University, Tempe, USA
- Graciela Gonzalez
- Department of Biomedical Informatics, Arizona State University, Tempe, USA
|
35
|
Safari L, Patrick JD. A temporal model for Clinical Data Analytics language. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2013; 2013:3218-21. [PMID: 24110413 DOI: 10.1109/embc.2013.6610226] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
The proposal of a special purpose language for Clinical Data Analytics (CliniDAL) is presented along with a general model for expressing temporal events in the language. The temporal dimension of clinical data needs to be addressed from at least five different points of view. Firstly, how to attach the knowledge of time-based constraints to queries; secondly, how to mine temporal data in different CISs with various data models; thirdly, how to deal with both relative time and absolute time in the query language; fourthly, how to tackle internal time-event dependencies in queries; and finally, how to manage historical time events preserved in the patient's narrative. The temporal elements of the language are defined in Backus-Naur Form (BNF) along with a UML schema. Its use in a designed taxonomy of a five-class hierarchy of data analytics tasks shows the solution to problems of time-event dependencies in a highly complex cascade of queries needed to evaluate scientific experiments. The issues in using the model in a practical way are discussed as well.
|
36
|
Sun W, Rumshisky A, Uzuner O. Temporal reasoning over clinical text: the state of the art. J Am Med Inform Assoc 2013; 20:814-9. [PMID: 23676245 PMCID: PMC3756277 DOI: 10.1136/amiajnl-2013-001760] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 02/26/2013] [Revised: 04/17/2013] [Accepted: 04/20/2013] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVES To provide an overview of the problem of temporal reasoning over clinical text and to summarize the state of the art in clinical natural language processing for this task. TARGET AUDIENCE This overview targets medical informatics researchers who are unfamiliar with the problems and applications of temporal reasoning over clinical text. SCOPE We review the major applications of text-based temporal reasoning, describe the challenges for software systems handling temporal information in clinical text, and give an overview of the state of the art. Finally, we present some perspectives on future research directions that emerged during the recent community-wide challenge on text-based temporal reasoning in the clinical domain.
Affiliation(s)
- Weiyi Sun
- Department of Informatics, University at Albany, SUNY, Albany, New York, USA
- Anna Rumshisky
- Department of Computer Science, University of Massachusetts, Lowell, Massachusetts, USA
- Ozlem Uzuner
- Department of Information Studies, University at Albany, SUNY, Albany, New York, USA
|
37
|
Knowledge management and informatics considerations for comparative effectiveness research: a case-driven exploration. Med Care 2013; 51:S38-44. [PMID: 23793050 DOI: 10.1097/mlr.0b013e31829b1de1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
BACKGROUND As clinical data are increasingly collected and stored electronically, their potential use for comparative effectiveness research (CER) grows. Despite this promise, challenges face those wishing to leverage such data. In this paper we aim to enumerate some of the knowledge management and informatics issues common to such data reuse. DESIGN After reviewing the current state of knowledge regarding biomedical informatics challenges and best practices related to CER, we then present 2 research projects at our institution. We analyze these and highlight several common themes and challenges related to the conduct of CER studies. Finally, we represent these emergent themes. RESULTS The informatics challenges commonly encountered by those conducting CER studies include issues related to data information and knowledge management (eg, data reuse, data preparation) as well as those related to people and organizational issues (eg, sociotechnical factors and organizational factors). Examples of these are described in further detail and a formal framework for describing these findings is presented. CONCLUSIONS Significant challenges face researchers attempting to use often diverse and heterogeneous datasets for CER. These challenges must be understood in order to be dealt with successfully and can often be overcome with the appropriate use of informatics best practices. Many research and policy questions remain to be answered in order to realize the full potential of the increasingly electronic clinical data available for such research.
|
38
|
Riaño D, Bohada JA, Collado A, López-Vallverdú JA. MPM: A knowledge-based functional model of medical practice. J Biomed Inform 2013; 46:379-87. [PMID: 23420015 DOI: 10.1016/j.jbi.2013.01.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/23/2012] [Revised: 01/29/2013] [Accepted: 01/30/2013] [Indexed: 11/26/2022]
Affiliation(s)
- David Riaño
- Research Group on Artificial Intelligence, Universitat Rovira i Virgili, Tarragona, Spain.
|
39
|
Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. A hybrid system for temporal information extraction from clinical text. J Am Med Inform Assoc 2013; 20:828-35. [PMID: 23571849 DOI: 10.1136/amiajnl-2013-001635] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE To develop a comprehensive temporal information extraction system that can identify events, temporal expressions, and their temporal relations in clinical text. This project was part of the 2012 i2b2 clinical natural language processing (NLP) challenge on temporal information extraction. MATERIALS AND METHODS The 2012 i2b2 NLP challenge organizers manually annotated 310 clinic notes according to a defined annotation guideline: a training set of 190 notes and a test set of 120 notes. All participating systems were developed on the training set and evaluated on the test set. Our system consists of three modules: event extraction, temporal expression extraction, and temporal relation (also called Temporal Link, or 'TLink') extraction. The TLink extraction module contains three individual classifiers for TLinks: (1) between events and section times, (2) within a sentence, and (3) across different sentences. The performance of our system was evaluated using scripts provided by the i2b2 organizers. Primary measures were micro-averaged Precision, Recall, and F-measure. RESULTS Our system was among the top ranked. It achieved F-measures of 0.8659 for temporal expression extraction (ranked fourth), 0.6278 for end-to-end TLink track (ranked first), and 0.6932 for TLink-only track (ranked first) in the challenge. We subsequently investigated different strategies for TLink extraction, and were able to marginally improve performance with an F-measure of 0.6943 for TLink-only track.
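Micro-averaging, the primary measure named in this abstract, pools true positives, false positives, and false negatives over all documents before computing the scores. The sketch below illustrates this under stated assumptions: the per-note counts are hypothetical, and the function is not the i2b2 evaluation script.

```python
def micro_average(counts):
    """Micro-averaged precision, recall, and F-measure.

    `counts` is a list of per-document (true_pos, false_pos, false_neg) tuples.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts for two notes of very different length.
per_note_counts = [(8, 2, 0), (1, 0, 9)]
precision, recall, f_score = micro_average(per_note_counts)
```

Because the counts are pooled, notes with many annotations weigh more heavily than short notes, unlike a macro-average of per-note scores.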
Affiliation(s)
- Buzhou Tang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
|
40
|
Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc 2013; 20:806-13. [PMID: 23564629 DOI: 10.1136/amiajnl-2013-001628] [Citation(s) in RCA: 184] [Impact Index Per Article: 16.7] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND The Sixth Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing Challenge for Clinical Records focused on the temporal relations in clinical narratives. The organizers provided the research community with a corpus of discharge summaries annotated with temporal information, to be used for the development and evaluation of temporal reasoning systems. 18 teams from around the world participated in the challenge. During the workshop, participating teams presented comprehensive reviews and analysis of their systems, and outlined future research directions suggested by the challenge contributions. METHODS The challenge evaluated systems on information extraction tasks that targeted: (1) clinically significant events, including both clinical concepts such as problems, tests, treatments, and clinical departments, and events relevant to the patient's clinical timeline, such as admissions and transfers between departments; (2) temporal expressions, referring to date, time, duration, or frequency phrases in the clinical text, whose extracted values had to be normalized to an ISO specification standard; and (3) temporal relations between the clinical events and temporal expressions. Participants determined pairs of events and temporal expressions that exhibited a temporal relation, and identified the temporal relation between them. RESULTS For event detection, statistical machine learning (ML) methods consistently showed superior performance. While ML and rule-based methods seemed to detect temporal expressions equally well, the best systems overwhelmingly adopted a rule-based approach for value normalization. For temporal relation classification, the systems using hybrid approaches that combined ML and heuristic-based methods produced the best results.
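Rule-based value normalization of the kind mentioned in the results can be sketched as follows. This is an illustration under stated assumptions, not the challenge's actual rules: the target format is assumed to be ISO 8601 (the abstract says only "an ISO specification standard"), and the two surface patterns and all function names are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical mapping from unit words to ISO 8601 duration designators.
_UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def normalize_date(text):
    """Normalize a US-style MM/DD/YYYY date to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, "%m/%d/%Y").date().isoformat()


def normalize_duration(text):
    """Normalize phrases like '3 days' to an ISO 8601 duration such as 'P3D'."""
    match = re.fullmatch(r"(\d+)\s+(day|week|month|year)s?", text.strip().lower())
    if match is None:
        return None  # Pattern not covered by this sketch.
    count, unit = match.groups()
    return f"P{int(count)}{_UNIT_CODES[unit]}"
```

For example, `normalize_date("03/15/2013")` yields `"2013-03-15"` and `normalize_duration("3 days")` yields `"P3D"`. Real clinical text needs many more patterns, such as relative dates and frequencies.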
Affiliation(s)
- Weiyi Sun
- Department of Informatics, University at Albany, SUNY, Albany, New York, USA
|
41
|
Cherry C, Zhu X, Martin J, de Bruijn B. A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. J Am Med Inform Assoc 2013; 20:843-8. [PMID: 23523875 PMCID: PMC3756270 DOI: 10.1136/amiajnl-2013-001624] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/01/2022] Open
Abstract
Objective An analysis of the timing of events is critical for a deeper understanding of the course of events within a patient record. The 2012 i2b2 NLP challenge focused on the extraction of temporal relationships between concepts within textual hospital discharge summaries. Materials and methods The team from the National Research Council Canada (NRC) submitted three system runs to the second track of the challenge: typifying the time-relationship between pre-annotated entities. The NRC system was designed around four specialist modules containing statistical machine learning classifiers. Each specialist targeted distinct sets of relationships: local relationships, ‘sectime’-type relationships, non-local overlap-type relationships, and non-local causal relationships. Results The best NRC submission achieved a precision of 0.7499, a recall of 0.6431, and an F1 score of 0.6924, resulting in a statistical tie for first place. Post hoc improvements led to a precision of 0.7537, a recall of 0.6455, and an F1 score of 0.6954, giving the highest scores reported on this task to date. Discussion and conclusions Methods for general relation extraction extended well to temporal relations, and gave top-ranked state-of-the-art results. Careful ordering of predictions within result sets proved critical to this success.
Affiliation(s)
- Colin Cherry
- Information and Communication Technologies, National Research Council Canada, Ottawa, Ontario, Canada
|
42
|
Raghavan P, Fosler-Lussier E, Lai AM. Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:1366-1374. [PMID: 23304416 PMCID: PMC3540452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
The manual annotation of clinical narratives is an important step for training and validating the performance of automated systems that utilize these clinical narratives. We build an annotation specification to capture medical events, and coreferences and temporal relations between medical events in clinical text. Unfortunately, the process of clinical data annotation is both time consuming and costly. Many annotation efforts have used physicians to annotate the data. We investigate using annotators that are current students or graduates from diverse clinical backgrounds with varying levels of clinical experience. In spite of this diversity, the annotation agreement across our team of annotators is high; the average inter-annotator kappa statistic for medical events, coreferences, temporal relations, and medical event concept unique identifiers was 0.843, 0.859, 0.833, and 0.806, respectively. We describe methods towards leveraging the annotations to support temporal reasoning with medical events.
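The kappa statistic reported here corrects raw agreement for the agreement expected by chance. The abstract does not say which variant was averaged, so the sketch below assumes pairwise Cohen's kappa between two annotators. The labels are hypothetical and serve only to show the computation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both annotators used one identical label throughout.
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four text spans by two annotators.
annotator_1 = ["event", "event", "other", "other"]
annotator_2 = ["event", "other", "other", "other"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four spans (0.75), chance agreement is 0.5, and kappa is therefore 0.5.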
|
43
|
Abstract
The national adoption of electronic health records (EHR) promises to make an unprecedented amount of data available for clinical research, but the data are complex, inaccurate, and frequently missing, and the record reflects complex processes aside from the patient's physiological state. We believe that the path forward requires studying the EHR as an object of interest in itself, and that new models, learning from data, and collaboration will lead to efficient use of the valuable information currently locked in health records.
Affiliation(s)
- George Hripcsak
- Biomedical Informatics, Columbia University, New York, NY 10027
|
44
|
Hahn U, Cohen KB, Garten Y, Shah NH. Mining the pharmacogenomics literature--a survey of the state of the art. Brief Bioinform 2012; 13:460-94. [PMID: 22833496 PMCID: PMC3404399 DOI: 10.1093/bib/bbs018] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/18/2011] [Accepted: 03/23/2012] [Indexed: 01/05/2023] Open
Abstract
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Affiliation(s)
- Udo Hahn
- Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
|
45
|
Reeves RM, Ong FR, Matheny ME, Denny JC, Aronsky D, Gobbel GT, Montella D, Speroff T, Brown SH. Detecting temporal expressions in medical narratives. Int J Med Inform 2012; 82:118-27. [PMID: 22595284 DOI: 10.1016/j.ijmedinf.2012.04.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 01/25/2012] [Revised: 03/30/2012] [Accepted: 04/12/2012] [Indexed: 12/27/2022]
Abstract
BACKGROUND Clinical practice and epidemiological information aggregation require knowing when, how long, and in what sequence medically relevant events occur. The Temporal Awareness and Reasoning Systems for Question Interpretation (TARSQI) Toolkit (TTK) is a complete, open source software package for the temporal ordering of events within narrative text documents. TTK was developed on newspaper articles. We extended TTK to support medical notes using Veterans Affairs (VA) clinical notes and compared the extended version to TTK. METHODS We used a development set consisting of 200 VA clinical notes to modify and append rules to TTK's time tagger, creating Med-TTK. We then evaluated the performances of TTK and Med-TTK on an independent random selection of 100 clinical notes. Evaluation tasks were to identify and classify time-referring expressions as one of four temporal classes (DATE, TIME, DURATION, and SET). The reference standard for this test set was generated by dual human manual review with disagreements resolved by a third reviewer. Outcome measures included recall and precision for each class, and inter-rater agreement scores. RESULTS There were 3146 temporal expressions in the reference standard. TTK identified 1595 temporal expressions. Recall was 0.15 (95% confidence interval [CI] 0.12-0.15) and precision was 0.27 (95% CI 0.25-0.29) for TTK. Med-TTK identified 3174 expressions. Recall was 0.86 (95% CI 0.84-0.87) and precision was 0.85 (95% CI 0.84-0.86) for Med-TTK. CONCLUSION The algorithms for identifying and classifying temporal expressions in medical narratives developed within Med-TTK significantly improved performance compared to TTK. Natural language processing applications such as Med-TTK provide a foundation for meaningful longitudinal mapping of patient history events among electronic health records. The tool can be accessed at the following site: http://code.google.com/p/med-ttk/.
Affiliation(s)
- Ruth M Reeves
- Geriatric Research Education and Clinical Center, Tennessee Valley Healthcare System, Department of Veterans Affairs, Nashville, TN, USA.
|
46
|
Boland MR, Tu SW, Carini S, Sim I, Weng C. EliXR-TIME: A Temporal Knowledge Representation for Clinical Research Eligibility Criteria. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2012; 2012:71-80. [PMID: 22779055 PMCID: PMC3392056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
Effective clinical text processing requires accurate extraction and representation of temporal expressions. Multiple temporal information extraction models were developed but a similar need for extracting temporal expressions in eligibility criteria (e.g., for eligibility determination) remains. We identified the temporal knowledge representation requirements of eligibility criteria by reviewing 100 temporal criteria. We developed EliXR-TIME, a frame-based representation designed to support semantic annotation for temporal expressions in eligibility criteria by reusing applicable classes from well-known clinical temporal knowledge representations. We used EliXR-TIME to analyze a training set of 50 new temporal eligibility criteria. We evaluated EliXR-TIME using an additional random sample of 20 eligibility criteria with temporal expressions that have no overlap with the training data, yielding 92.7% (76 / 82) inter-coder agreement on sentence chunking and 72% (72 / 100) agreement on semantic annotation. We conclude that this knowledge representation can facilitate semantic annotation of the temporal expressions in eligibility criteria.
|
47
|
Liu M, Jiang M, Kawai VK, Stein CM, Roden DM, Denny JC, Xu H. Modeling drug exposure data in electronic medical records: an application to warfarin. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2011; 2011:815-823. [PMID: 22195139 PMCID: PMC3243123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Identification of patients' drug exposure information is critical to drug-related research that is based on electronic medical records (EMRs). Drug information is often embedded in clinical narratives and drug regimens change frequently because of various reasons like intolerance or insurance issues, making accurate modeling challenging. Here, we developed an informatics framework to determine patient drug exposure histories from EMRs by combining natural language processing (NLP) and machine learning (ML) technologies. Our framework consists of three phases: 1) drug entity recognition - identifying drug mentions; 2) drug event detection - labeling drug mentions with a status (e.g., "on" or "stop"); and 3) drug exposure modeling - predicting if a patient is taking a drug at a given time using the status and temporal information associated with the mentions. We applied the framework to determine patient warfarin exposure at hospital admissions and achieved 87% precision, 79% recall, and an area under the receiver-operator characteristic curve of 0.93.
Affiliation(s)
- Mei Liu
- Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA
|
48
|
Garla V, Lo Re V, Dorey-Stein Z, Kidwai F, Scotch M, Womack J, Justice A, Brandt C. The Yale cTAKES extensions for document classification: architecture and application. J Am Med Inform Assoc 2011; 18:614-20. [PMID: 21622934 PMCID: PMC3168305 DOI: 10.1136/amiajnl-2011-000093] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 12/09/2010] [Accepted: 04/22/2011] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND: Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to the development of effective clinical document classification systems. Clinical natural-language-processing systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges. METHODS: The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule and machine-learning based document classifiers. The authors describe and evaluate their system, the Yale cTAKES Extensions (YTEX), on the classification of radiology reports that contain findings suggestive of hepatic decompensation. RESULTS AND DISCUSSION: The F1-score of the system for the retrieval of abdominal radiology reports was 96%, and was 79%, 91%, and 95% for the presence of liver masses, ascites, and varices, respectively. The authors released YTEX as open source, available at http://code.google.com/p/ytex.
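For readers unfamiliar with the metric reported in this abstract, the F1 score is the harmonic mean of precision and recall. The short Python sketch below shows the standard calculation from confusion-matrix counts. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts, using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for a single document class, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high F1 by maximizing precision or recall alone.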
Affiliation(s)
- Vijay Garla
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, Connecticut, USA.
|
49
|
Feblowitz JC, Wright A, Singh H, Samal L, Sittig DF. Summarization of clinical information: A conceptual model. J Biomed Inform 2011; 44:688-99. [PMID: 21440086 DOI: 10.1016/j.jbi.2011.03.008] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 12/14/2010] [Revised: 03/18/2011] [Accepted: 03/18/2011] [Indexed: 02/08/2023]
|
50
|
LaFleur J, McAdam-Marx C, Alder SS, Sheng X, Asche CV, Nebeker J, Brixner DI, Silverman SL. Clinical risk factors for fracture among postmenopausal patients at risk for fracture: a historical cohort study using electronic medical record data. J Bone Miner Metab 2011; 29:193-200. [PMID: 20686803 DOI: 10.1007/s00774-010-0207-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 11/10/2008] [Accepted: 06/11/2010] [Indexed: 11/29/2022]
Abstract
Osteoporosis represents a growing health burden, but recognition and screening rates are low. Electronic reminders for osteoporosis have been beneficial but are not based on clinical risk factors. Available risk screening tools may contain useful constructs for creating risk-based electronic medical record (EMR) reminders. Using a cohort study design among women ≥50 years with osteoporosis or osteoporosis risk, we searched the EMR for five World Health Organization (WHO) clinical risk factors including older age, lower body mass index (BMI), low bone mineral density (BMD), history of fracture since age 50, and maternal history of osteoporosis or fracture. Rates of reporting were lower than expected for BMD (6.8%), personal history of fracture (3.5%), and maternal history of fracture (0.3%). Despite the limitations, the EMR data were useful for identifying women at highest risk for fracture. Some evidence of bias in reporting rates was present. EMR data can be useful for identifying high fracture risk patients.
Affiliation(s)
- Joanne LaFleur
- Department of Pharmacotherapy, University of Utah, Salt Lake City, UT, USA.
|