Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Topaz M, Lai K, Dowding D, Lei VJ, Zisberg A, Bowles KH, Zhou L. Automated identification of wound information in clinical notes of patients with heart diseases: Developing and validating a natural language processing application. Int J Nurs Stud 2016;64:25-31. [DOI: 10.1016/j.ijnurstu.2016.09.013] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 04/06/2016] [Revised: 09/13/2016] [Accepted: 09/18/2016] [Indexed: 11/30/2022]

Cited by Other Article(s)

Hobensack M, Song J, Oh S, Evans L, Davoudi A, Bowles KH, McDonald MV, Barrón Y, Sridharan S, Wallace AS, Topaz M. Social Risk Factors are Associated with Risk for Hospitalization in Home Health Care: A Natural Language Processing Study. J Am Med Dir Assoc 2023;24:1874-1880.e4. [PMID: 37553081 PMCID: PMC10839109 DOI: 10.1016/j.jamda.2023.06.031] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Revised: 06/23/2023] [Accepted: 06/25/2023] [Indexed: 08/10/2023]

Abstract

OBJECTIVE

This study aimed to develop a natural language processing (NLP) system that identified social risk factors in home health care (HHC) clinical notes and to examine the association between social risk factors and hospitalization or an emergency department (ED) visit.

DESIGN

Retrospective cohort study.

SETTING AND PARTICIPANTS

We used standardized assessments and clinical notes from one HHC agency located in the northeastern United States. This included 86,866 episodes of care for 65,593 unique patients. Patients received HHC services between 2015 and 2017.

METHODS

Guided by HHC experts, we created a vocabulary of social risk factors that influence hospitalization or ED visit risk in the HHC setting. We then developed an NLP system to automatically identify social risk factors documented in clinical notes. We used an adjusted logistic regression model to examine the association between the NLP-based social risk factors and hospitalization or an ED visit.

RESULTS

On the basis of expert consensus, the following social risk factors emerged: Social Environment, Physical Environment, Education and Literacy, Food Insecurity, Access to Care, and Housing and Economic Circumstances. Our NLP system achieved "very good" performance, with an F score of 0.91. Approximately 4% of clinical notes (33% of episodes of care) documented a social risk factor. The most frequently documented social risk factors were Physical Environment and Social Environment. Except for Housing and Economic Circumstances, all NLP-based social risk factors were associated with higher odds of hospitalization and ED visits.

CONCLUSIONS AND IMPLICATIONS

HHC clinicians assess and document social risk factors associated with hospitalizations and ED visits in their clinical notes. Future studies can explore the social risk factors documented in HHC to improve communication across the health care system and to predict patients at risk for being hospitalized or visiting the ED.

Mitha S, Schwartz J, Hobensack M, Cato K, Woo K, Smaldone A, Topaz M. Natural Language Processing of Nursing Notes: An Integrative Review. Comput Inform Nurs 2023;41:377-384. [PMID: 36730744 DOI: 10.1097/cin.0000000000000967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]

Jeon E, Kim A, Lee J, Heo H, Lee H, Woo K. Developing a Classification Algorithm for Prediabetes Risk Detection From Home Care Nursing Notes: Using Natural Language Processing. Comput Inform Nurs 2023:00024665-990000000-00087. [PMID: 37165830 DOI: 10.1097/cin.0000000000001000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]

Chae S, Song J, Ojo M, Bowles KH, McDonald MV, Barrón Y, Hobensack M, Kennedy E, Sridharan S, Evans L, Topaz M. Factors associated with poor self-management documented in home health care narrative notes for patients with heart failure. Heart Lung 2022;55:148-154. [PMID: 35597164 PMCID: PMC11021173 DOI: 10.1016/j.hrtlng.2022.05.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/03/2021] [Revised: 05/03/2022] [Accepted: 05/07/2022] [Indexed: 11/04/2022]

Automatic Text Summarization of Biomedical Text Data: A Systematic Review. INFORMATION 2022. [DOI: 10.3390/info13080393] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023] Open

Xu D, Miller T. A simple neural vector space model for medical concept normalization using concept embeddings. J Biomed Inform 2022;130:104080. [PMID: 35472514 PMCID: PMC9351985 DOI: 10.1016/j.jbi.2022.104080] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/19/2022] [Revised: 04/15/2022] [Accepted: 04/19/2022] [Indexed: 11/24/2022]

Yaeger JP, Lu J, Jones J, Ertefaie A, Fiscella K, Gildea D. Derivation of a natural language processing algorithm to identify febrile infants. J Hosp Med 2022;17:11-18. [PMID: 35504534 DOI: 10.1002/jhm.2732] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/01/2021] [Revised: 11/24/2021] [Accepted: 12/09/2021] [Indexed: 11/08/2022]

Abstract

BACKGROUND

Diagnostic codes can retrospectively identify samples of febrile infants, but sensitivity is low, resulting in many febrile infants eluding detection. To ensure study samples are representative, an improved approach is needed.

OBJECTIVE

To derive and internally validate a natural language processing algorithm to identify febrile infants and compare its performance to diagnostic codes.

METHODS

This cross-sectional study consisted of infants aged 0-90 days brought to one pediatric emergency department from January 2016 to December 2017. We aimed to identify infants with fever, defined as a documented temperature ≥38°C. We used 2017 clinical notes to develop two rule-based algorithms to identify infants with fever and tested them on data from 2016. Using manual abstraction as the gold standard, we compared performance of the two rule-based algorithms (Models 1, 2) to four previously published diagnostic code groups (Models 5-8) using area under the receiver-operating characteristics curve (AUC), sensitivity, and specificity.

RESULTS

For the test set (n = 1190 infants), 184 infants were febrile (15.5%). The AUCs (0.92-0.95) and sensitivities (86%-92%) of Models 1 and 2 were significantly greater than Models 5-8 (0.67-0.74; 20%-74%) with similar specificities (93%-99%). In contrast to Models 5-8, samples from Models 1 and 2 demonstrated similar characteristics to the gold standard, including fever prevalence, median age, and rates of bacterial infections, hospitalizations, and severe outcomes.

CONCLUSIONS

Findings suggest rule-based algorithms can accurately identify febrile infants with greater sensitivity while preserving specificity compared to diagnostic codes. If externally validated, rule-based algorithms may be important tools to create representative study samples, thereby improving generalizability of findings.
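For readers less familiar with the metrics in this abstract, the following is a minimal sketch of how prevalence, sensitivity, and specificity follow from confusion-matrix counts. Only the test-set size (1190) and the number of febrile infants (184) come from the abstract; the true-positive and false-positive counts are hypothetical values chosen purely for illustration and are not results from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly febrile infants that the algorithm flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of afebrile infants that the algorithm correctly leaves unflagged."""
    return tn / (tn + fp)


# Reported in the abstract.
n_total = 1190
n_febrile = 184
n_afebrile = n_total - n_febrile

# Prevalence of fever in the test set (the abstract reports 15.5%).
prevalence = n_febrile / n_total

# Hypothetical counts, NOT taken from the study.
tp = 165              # febrile infants flagged by the algorithm
fn = n_febrile - tp   # febrile infants missed
fp = 30               # afebrile infants flagged by mistake
tn = n_afebrile - fp  # afebrile infants correctly not flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

print(f"prevalence:  {prevalence:.1%}")
print(f"sensitivity: {sens:.1%}")
print(f"specificity: {spec:.1%}")
```

With these illustrative counts, sensitivity and specificity land inside the ranges the abstract reports for Models 1 and 2, which is the only reason they were chosen.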

Von Gerich H, Moen H, Block LJ, Chu CH, DeForest H, Hobensack M, Michalowski M, Mitchell J, Nibber R, Olalia MA, Pruinelli L, Ronquillo CE, Topaz M, Peltonen LM. Artificial Intelligence-based technologies in nursing: A scoping literature review of the evidence. Int J Nurs Stud 2021;127:104153. [DOI: 10.1016/j.ijnurstu.2021.104153] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/11/2021] [Revised: 11/23/2021] [Accepted: 12/01/2021] [Indexed: 12/20/2022]

Jaeger SR, Rasmussen MA. Importance of data preparation when analysing written responses to open-ended questions: An empirical assessment and comparison with manual coding. Food Qual Prefer 2021. [DOI: 10.1016/j.foodqual.2021.104270] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]

Woo K, Song J, Adams V, Block LJ, Currie LM, Shang J, Topaz M. Exploring prevalence of wound infections and related patient characteristics in homecare using natural language processing. Int Wound J 2021;19:211-221. [PMID: 34105873 PMCID: PMC8684883 DOI: 10.1111/iwj.13623] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/11/2021] [Revised: 05/06/2021] [Accepted: 05/12/2021] [Indexed: 12/13/2022] Open

Senders JT, Cho LD, Calvachi P, McNulty JJ, Ashby JL, Schulte IS, Almekkawi AK, Mehrtash A, Gormley WB, Smith TR, Broekman MLD, Arnaout O. Automating Clinical Chart Review: An Open-Source Natural Language Processing Pipeline Developed on Free-Text Radiology Reports From Patients With Glioblastoma. JCO Clin Cancer Inform 2021;4:25-34. [PMID: 31977252 DOI: 10.1200/cci.19.00060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/05/2023] Open

Abstract

PURPOSE

The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others.

MATERIALS AND METHODS

Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports.

RESULTS

In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve, 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with a near perfect inter-rater agreement were classified with excellent performance (area under the curve > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances up to a 9:1 ratio. Report-level classification accuracy was not associated with the number of words or the vocabulary size in the distinct text documents.

CONCLUSION

This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered as absolute contraindications for text mining in clinical research. However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.

Affiliation(s)

Joeky T Senders: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA; Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
Logan D Cho: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA; Department of Neuroscience, Brown University, Providence, RI
Paola Calvachi: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
John J McNulty: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA; Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
Joanna L Ashby: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Isabelle S Schulte: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Ahmad Kareem Almekkawi: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Alireza Mehrtash: Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
William B Gormley: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Timothy R Smith: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Marike L D Broekman: Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands; Department of Neurosurgery, Haaglanden Medical Center, The Hague, the Netherlands
Omar Arnaout: Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA

Woo K, Adams V, Wilson P, Fu LH, Cato K, Rossetti SC, McDonald M, Shang J, Topaz M. Identifying Urinary Tract Infection-Related Information in Home Care Nursing Notes. J Am Med Dir Assoc 2021;22:1015-1021.e2. [PMID: 33434568 PMCID: PMC8106637 DOI: 10.1016/j.jamda.2020.12.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/22/2019] [Revised: 07/28/2020] [Accepted: 12/06/2020] [Indexed: 12/12/2022]

Abstract

Objectives:

Urinary tract infection (UTI) is common in home care but not easily captured with standard assessment. This study aimed to examine the value of nursing notes in detecting UTI signs and symptoms in home care.

Design:

The study developed a natural language processing (NLP) algorithm to automatically identify UTI-related information in nursing notes.

Setting and Participants:

Home care visit notes (n = 1,149,586) and care coordination notes (n = 1,461,171) for 89,459 patients treated in the largest nonprofit home care agency in the United States during 2014.

Measures:

We generated 6 categories of UTI-related information from literature and used the Unified Medical Language System (UMLS) to identify a preliminary list of terms. The NLP algorithm was tested on a gold standard set of 300 clinical notes annotated by clinical experts. We used structured Outcome and Assessment Information Set data to extract the frequency of UTI-related emergency department (ED) visits or hospitalizations and explored time-patterns in documentation of UTI-related information.

Results:

The NLP system achieved very good overall performance (F measure = 0.9, 95% CI: 0.87–0.93) based on the test results obtained by using the notes for patients admitted to the ED or hospital due to UTI. UTI-related information was significantly more prevalent (P < .01 for all the tests) in home care episodes with UTI-related ED admission or hospitalization vs the general patient population; 81% of home care episodes with UTI-related hospitalization or ED admission had at least 1 category of UTI-related information vs 21.6% among episodes without UTI-related hospitalization or ED admission. Frequency of UTI-related information documentation increased in advance of UTI-related hospitalization or ED admission, peaking within a few days before the event.

Conclusions and Implications:

Information in nursing notes is often overlooked by stakeholders and not integrated into predictive modeling for decision-making support, but our findings highlight their value in early risk identification and care guidance. Health care administrators should consider using NLP to extract clinical data from nursing notes to improve early detection and treatment, which may lead to quality improvement and cost reduction.

Topaz M, Koleck TA, Onorato N, Smaldone A, Bakken S. Nursing documentation of symptoms is associated with higher risk of emergency department visits and hospitalizations in homecare patients. Nurs Outlook 2020;69:435-446. [PMID: 33386145 DOI: 10.1016/j.outlook.2020.12.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/08/2020] [Revised: 10/23/2020] [Accepted: 12/11/2020] [Indexed: 10/22/2022]

Dionisi S, Di Simone E, Alicastro GM, Angelini S, Giannetta N, Iacorossi L, Di Muzio M. Nursing Summary: designing a nursing section in the Electronic Health Record. ACTA BIO-MEDICA : ATENEI PARMENSIS 2019;90:293-299. [PMID: 31580318 PMCID: PMC7233749 DOI: 10.23750/abm.v90i3.7411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/10/2018] [Accepted: 06/21/2018] [Indexed: 11/23/2022]

Ho KF, Ho CH, Chung MH. Theoretical integration of user satisfaction and technology acceptance of the nursing process information system. PLoS One 2019;14:e0217622. [PMID: 31163076 PMCID: PMC6548361 DOI: 10.1371/journal.pone.0217622] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/13/2018] [Accepted: 05/15/2019] [Indexed: 12/01/2022] Open

Gulden C, Kirchner M, Schüttler C, Hinderer M, Kampf M, Prokosch HU, Toddenroth D. Extractive summarization of clinical trial descriptions. Int J Med Inform 2019;129:114-121. [PMID: 31445245 DOI: 10.1016/j.ijmedinf.2019.05.019] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/19/2018] [Revised: 04/06/2019] [Accepted: 05/21/2019] [Indexed: 10/26/2022]

Abstract

PURPOSE

Text summarization of clinical trial descriptions has the potential to reduce the time required to familiarize oneself with the subject of studies by condensing long-form detailed descriptions to concise, meaning-preserving synopses. This work describes the process and quality of automatically generated summaries of clinical trial descriptions using extractive text summarization methods.

METHODS

We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We executed several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics using the brief summaries included in the record as a reference. To investigate the correlation of these metrics with human sentiments, four reviewers assessed the content-completeness of the generated summaries and the helpfulness of both the generated and reference summaries via a Likert scale questionnaire.

RESULTS

The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% the length of the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the overall best performance with a ROUGE-1 F1 score of 0.3531, ROUGE-2 F1 score of 0.1723, and ROUGE-L F1 score of 0.3003. These scores correlate with the assessment of the helpfulness and content similarity by the human reviewers. Inter-rater agreement for the helpfulness and content similarity was slight and fair respectively (Fleiss' kappa of 0.12 and 0.22).

CONCLUSIONS

Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Further, the human evaluation has shown that the ROUGE-L F1 score is useful for rating the general quality of generated summaries of clinical trial descriptions in an automated way.
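The ROUGE-1 F1 score cited in this abstract measures unigram overlap between a generated summary and a reference summary. The sketch below is a deliberately simplified version, assuming lowercase whitespace tokenization with no stemming or stopword handling; it is not the official ROUGE toolkit and not the authors' evaluation code. The function name `rouge1_f1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example texts, not drawn from clinicaltrials.gov.
reference_summary = "the trial tests a new drug"
generated_summary = "the trial tests a drug"

score = rouge1_f1(generated_summary, reference_summary)
print(f"ROUGE-1 F1: {score:.4f}")
```

In this toy example all five candidate words appear in the six-word reference, so precision is 1 and recall is 5/6. Scores on real trial descriptions, such as the 0.3531 reported above, are much lower because extractive summaries rarely match the reference wording this closely.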

Luo YF, Sun W, Rumshisky A. MCN: A comprehensive corpus for medical concept normalization. J Biomed Inform 2019;92:103132. [DOI: 10.1016/j.jbi.2019.103132] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 11/12/2018] [Revised: 01/18/2019] [Accepted: 02/15/2019] [Indexed: 11/25/2022]

Topaz M, Murga L, Gaddis KM, McDonald MV, Bar-Bachar O, Goldberg Y, Bowles KH. Mining fall-related information in clinical notes: Comparison of rule-based and novel word embedding-based machine learning approaches. J Biomed Inform 2019;90:103103. [PMID: 30639392 DOI: 10.1016/j.jbi.2019.103103] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 06/04/2018] [Revised: 11/14/2018] [Accepted: 12/31/2018] [Indexed: 10/27/2022]

Abstract

BACKGROUND

Natural language processing (NLP) of health-related data is still an expertise-demanding and resource-intensive process. We created a novel, open-source rapid clinical text mining system called NimbleMiner. NimbleMiner combines several machine learning techniques (word embedding models and positive-only labels learning) to facilitate the process in which a human rapidly performs text mining of clinical narratives while being aided by the machine learning components.

OBJECTIVE

This manuscript describes the general system architecture and user interface and presents results of a case study aimed at classifying fall-related information (including fall history, fall prevention interventions, and fall risk) in homecare visit notes.

METHODS

We extracted a corpus of homecare visit notes (n = 1,149,586) for 89,459 patients from a large US-based homecare agency. We used a gold standard testing dataset of 750 notes annotated by two human reviewers to compare NimbleMiner's ability to classify documents according to whether they contain fall-related information with that of a previously developed rule-based NLP system.

RESULTS

NimbleMiner outperformed the rule-based system in almost all domains. The overall F-score was 85.8% compared to 81% by the rule-based system, with the best performance for identifying general fall history (F = 89% vs. F = 85.1% rule-based), followed by fall risk (F = 87% vs. F = 78.7% rule-based), fall prevention interventions (F = 88.1% vs. F = 78.2% rule-based) and fall within 2 days of the note date (F = 83.1% vs. F = 80.6% rule-based). The rule-based system achieved slightly better performance for fall within 2 weeks of the note date (F = 81.9% vs. F = 84% rule-based).

DISCUSSION & CONCLUSIONS

NimbleMiner outperformed other systems aimed at fall information classification, including our previously developed rule-based approach. These promising results indicate that clinical text mining can be implemented without the need for large labeled datasets necessary for other types of machine learning. This is critical for domains with little NLP developments, like nursing or allied health professions.
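The per-domain comparison in this abstract is easier to read as differences. The short sketch below tabulates the F-scores exactly as reported above and computes the gap between the two systems for each domain; the dictionary layout and the helper name `compare` are illustrative choices, not part of NimbleMiner.

```python
# F-scores (%) as reported in the abstract: (NimbleMiner, rule-based system).
reported_f = {
    "general fall history": (89.0, 85.1),
    "fall risk": (87.0, 78.7),
    "fall prevention interventions": (88.1, 78.2),
    "fall within 2 days of note date": (83.1, 80.6),
    "fall within 2 weeks of note date": (81.9, 84.0),
    "overall": (85.8, 81.0),
}


def compare(scores):
    """Return (domain, difference in percentage points, better system) per domain."""
    rows = []
    for domain, (nimble, rule_based) in scores.items():
        diff = round(nimble - rule_based, 1)
        if diff > 0:
            winner = "NimbleMiner"
        elif diff < 0:
            winner = "rule-based"
        else:
            winner = "tie"
        rows.append((domain, diff, winner))
    return rows


comparison = compare(reported_f)
for domain, diff, winner in comparison:
    print(f"{domain:34s} {diff:+5.1f} points  ({winner})")
```

Laid out this way, the largest reported gains are for fall prevention interventions and fall risk, and the only domain where the rule-based system scores higher is fall within 2 weeks of the note date.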

Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT, Welt J, Foote J, Moseley ET, Grant DW, Tyler PD, Celi LA. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. PLoS One 2018;13:e0192360. [PMID: 29447188 PMCID: PMC5813927 DOI: 10.1371/journal.pone.0192360] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 06/16/2017] [Accepted: 01/21/2018] [Indexed: 01/22/2023] Open

Abstract

In secondary analysis of electronic health records, a crucial task is correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with an improvement in F1-score of up to 26 and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.

Affiliation(s)

Sebastian Gehrmann: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Harvard SEAS, Harvard University, Cambridge, MA, United States of America
Franck Dernoncourt: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Massachusetts Institute of Technology, Cambridge, MA, United States of America; Adobe Research, San Jose, CA, United States of America
Yeran Li: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Harvard T.H. Chan School of Public Health, Cambridge, MA, United States of America
Eric T. Carlson: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Philips Research North America, Cambridge, MA, United States of America
Joy T. Wu: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Harvard T.H. Chan School of Public Health, Cambridge, MA, United States of America
Jonathan Welt: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Wellman Center for Photomedicine, Massachusetts General Hospital, Boston, MA, United States of America
John Foote: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Tufts University School of Medicine, Cambridge, MA, United States of America
Edward T. Moseley: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; College of Science and Mathematics, University of Massachusetts, Boston, MA, United States of America
David W. Grant: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Department of Surgery, Division of Plastic and Reconstructive Surgery, Washington University School of Medicine, St. Louis, MO, United States of America
Patrick D. Tyler: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Department of Internal Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States of America
Leo A. Celi: MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America; Massachusetts Institute of Technology, Cambridge, MA, United States of America

Manias E, Gray K, Wickramasinghe N. Patient and family engagement with hospital electronic systems: Juggling for co-existence. Int J Nurs Stud 2017;68:A1-A3. [PMID: 28187902 DOI: 10.1016/j.ijnurstu.2017.01.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]