Reference Citation Analysis

For: Beaney T, Clarke J, Salman D, Woodcock T, Majeed A, Barahona M, Aylin P. Identifying potential biases in code sequences in primary care electronic healthcare records: a retrospective cohort study of the determinants of code frequency. BMJ Open 2023;13:e072884. [PMID: 37758674 PMCID: PMC10537851 DOI: 10.1136/bmjopen-2023-072884] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/16/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]

Cited by Other Article(s)

Beaney T, Jha S, Alaa A, Smith A, Clarke J, Woodcock T, Majeed A, Aylin P, Barahona M. Comparing natural language processing representations of coded disease sequences for prediction in electronic health records. J Am Med Inform Assoc 2024;31:1451-1462. [PMID: 38719204 PMCID: PMC11187492 DOI: 10.1093/jamia/ocae091] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/05/2024] [Revised: 04/02/2024] [Accepted: 04/12/2024] [Indexed: 06/21/2024]

Abstract

OBJECTIVE

Natural language processing (NLP) algorithms are increasingly being applied to obtain unsupervised representations of electronic health record (EHR) data, but their comparative performance at predicting clinical endpoints remains unclear. Our objective was to compare the performance of unsupervised representations of sequences of disease codes generated by bag-of-words versus sequence-based NLP algorithms at predicting clinically relevant outcomes.

MATERIALS AND METHODS

This cohort study used primary care EHRs from 6 286 233 people with Multiple Long-Term Conditions in England. For each patient, an unsupervised vector representation of their time-ordered sequences of diseases was generated using 2 input strategies (212 disease categories versus 9462 diagnostic codes) and different NLP algorithms (Latent Dirichlet Allocation, doc2vec, and 2 transformer models designed for EHRs). We also developed a transformer architecture, named EHR-BERT, incorporating sociodemographic information. We compared the performance of each of these representations (without fine-tuning) as inputs into a logistic classifier to predict 1-year mortality, healthcare use, and new disease diagnosis.

RESULTS

Patient representations generated by sequence-based algorithms performed consistently better than bag-of-words methods in predicting clinical endpoints, with the highest performance for EHR-BERT across all tasks, although the absolute improvement was small. Representations generated using disease categories performed similarly to those using diagnostic codes as inputs, suggesting models can equally manage smaller or larger vocabularies for prediction of these outcomes.

DISCUSSION AND CONCLUSION

Patient representations produced by sequence-based NLP algorithms from sequences of disease codes demonstrate improved predictive content for patient outcomes compared with representations generated by co-occurrence-based algorithms. This suggests transformer models may be useful for generating multi-purpose representations, even without fine-tuning.

Jain H, Odat RM, Goyal A, Jain J, Dey D, Ahmed M, Wasir AS, Passey S, Gole S. Association between psoriasis and atrial fibrillation: A Systematic review and meta-analysis. Curr Probl Cardiol 2024;49:102538. [PMID: 38521291 DOI: 10.1016/j.cpcardiol.2024.102538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 03/20/2024] [Indexed: 03/25/2024]

Beaney T, Clarke J, Salman D, Woodcock T, Majeed A, Aylin P, Barahona M. Identifying multi-resolution clusters of diseases in ten million patients with multimorbidity in primary care in England. Communications Medicine 2024;4:102. [PMID: 38811835 PMCID: PMC11137021 DOI: 10.1038/s43856-024-00529-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 05/20/2024] [Indexed: 05/31/2024]

Beaney T, Clarke J, Woodcock T, Majeed A, Barahona M, Aylin P. Effect of timeframes to define long term conditions and sociodemographic factors on prevalence of multimorbidity using disease code frequency in primary care electronic health records: retrospective study. BMJ Medicine 2024;3:e000474. [PMID: 38361663 PMCID: PMC10868275 DOI: 10.1136/bmjmed-2022-000474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/12/2023] [Indexed: 02/17/2024]

Abstract

Objective

To determine the extent to which the choice of timeframe used to define a long term condition affects the prevalence of multimorbidity and whether this varies with sociodemographic factors.

Design

Retrospective study of disease code frequency in primary care electronic health records.

Data sources

Routinely collected electronic health record data from general practices in the Clinical Practice Research Datalink Aurum were used.

Main outcome measures

Adults (≥18 years) in England who were registered in the database on 1 January 2020 were included. Multimorbidity was defined as the presence of two or more conditions from a set of 212 long term conditions. Multimorbidity prevalence was compared across five definitions. The reference definition counted any disease code recorded in the electronic health records for the 212 conditions. The four alternative definitions applied a stricter timeframe rule to 41 conditions for which a single disease code could indicate an acute condition, while a single code remained sufficient for the remaining 171 conditions. The four rules were: two codes at least three months apart; two codes at least 12 months apart; three codes within any 12 month period; and any code in the past 12 months. Mixed effects regression was used to calculate the expected change in multimorbidity status and number of long term conditions according to each definition, and the associations of that change with patient age, gender, ethnic group, and socioeconomic deprivation.
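The five definitions work as rules applied to each patient's dated codes for each condition. The following is a minimal Python sketch of how those rules could be expressed. It is not the authors' implementation. It assumes a patient record is a dict that maps a condition name to a list of code dates. The condition names (`cond_a` and so on), the `multi_code_conditions` set, the definition labels, and the approximation of "three months" and "12 months" as 91 and 365 days are all hypothetical.

```python
from datetime import date, timedelta

# Index date: the cohort included adults registered on 1 January 2020.
INDEX_DATE = date(2020, 1, 1)
# Assumption (hypothetical): month-based timeframes approximated as fixed day counts.
THREE_MONTHS = timedelta(days=91)
TWELVE_MONTHS = timedelta(days=365)


def two_codes_apart(dates: list[date], min_gap: timedelta) -> bool:
    """True if at least two codes are recorded at least `min_gap` apart."""
    # The widest pair is always (earliest, latest), so checking it suffices.
    return len(dates) >= 2 and max(dates) - min(dates) >= min_gap


def three_codes_within(dates: list[date], window: timedelta) -> bool:
    """True if any three codes fall within a single period of length `window`."""
    ds = sorted(dates)
    # After sorting, the tightest triples are always consecutive ones.
    return any(ds[i + 2] - ds[i] <= window for i in range(len(ds) - 2))


def any_code_recent(dates: list[date], index_date: date, window: timedelta) -> bool:
    """True if any code falls within `window` up to and including `index_date`."""
    return any(index_date - window <= d <= index_date for d in dates)


# Hypothetical labels for the four alternative definitions.
RULES = {
    "two_codes_3m_apart": lambda ds: two_codes_apart(ds, THREE_MONTHS),
    "two_codes_12m_apart": lambda ds: two_codes_apart(ds, TWELVE_MONTHS),
    "three_codes_in_12m": lambda ds: three_codes_within(ds, TWELVE_MONTHS),
    "any_code_past_12m": lambda ds: any_code_recent(ds, INDEX_DATE, TWELVE_MONTHS),
}


def has_condition(dates, condition, multi_code_conditions, definition):
    # A single code suffices under the reference definition, and for
    # conditions outside the stricter (multi-code) set under every definition.
    if definition == "reference" or condition not in multi_code_conditions:
        return len(dates) >= 1
    return RULES[definition](dates)


def count_conditions(codes, multi_code_conditions, definition):
    """Number of long term conditions the patient has under `definition`."""
    return sum(
        has_condition(ds, cond, multi_code_conditions, definition)
        for cond, ds in codes.items()
    )


def is_multimorbid(codes, multi_code_conditions, definition):
    # Multimorbidity: two or more long term conditions.
    return count_conditions(codes, multi_code_conditions, definition) >= 2


if __name__ == "__main__":
    # Hypothetical patient: cond_a and cond_c are treated as stricter conditions.
    patient = {
        "cond_a": [date(2010, 3, 1)],
        "cond_b": [date(2015, 6, 1)],
        "cond_c": [date(2019, 2, 1), date(2019, 6, 1), date(2019, 12, 1)],
    }
    multi = {"cond_a", "cond_c"}
    for name in ["reference", *RULES]:
        print(name, count_conditions(patient, multi, name), is_multimorbid(patient, multi, name))
    # reference 3 True
    # two_codes_3m_apart 2 True
    # two_codes_12m_apart 1 False
    # three_codes_in_12m 2 True
    # any_code_past_12m 2 True
```

This example patient counts as multimorbid under the reference definition. They lose that status under the "two codes at least 12 months apart" rule. This mirrors, on a single record, the kind of reclassification the study measured across the cohort.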

Results

9 718 573 people were included in the study, of whom 7 183 662 (73.9%) met the definition of multimorbidity when a single code was sufficient to define a long term condition. Prevalence varied substantially according to the timeframe used, ranging from 41.4% (n=4 023 023) for three codes in any 12 month period to 55.2% (n=5 366 285) for two codes at least three months apart. Younger people (eg, 50-75% probability for 18-29 years v 1-10% for ≥80 years), people of some minority ethnic groups (eg, people in the Other ethnic group had a higher probability than those in the South Asian ethnic group), and people living in areas of lower socioeconomic deprivation were more likely to be reclassified as not multimorbid under definitions requiring multiple codes.

Conclusions

Choice of timeframe to define long term conditions has a substantial effect on the prevalence of multimorbidity in this nationally representative sample. Different timeframes affect prevalence for some people more than others, highlighting the need to consider the impact of bias in the choice of method when defining multimorbidity.

Beaney T, Clarke J, Salman D, Woodcock T, Majeed A, Barahona M, Aylin P. Assigning disease clusters to people: A cohort study of the implications for understanding health outcomes in people with multiple long-term conditions. Journal of Multimorbidity and Comorbidity 2024;14:26335565241247430. [PMID: 38638408 PMCID: PMC11025432 DOI: 10.1177/26335565241247430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Accepted: 03/25/2024] [Indexed: 04/20/2024]