1
Beaney T, Clarke J, Salman D, Woodcock T, Majeed A, Aylin P, Barahona M. Identifying multi-resolution clusters of diseases in ten million patients with multimorbidity in primary care in England. Communications Medicine 2024; 4:102. [PMID: 38811835 PMCID: PMC11137021 DOI: 10.1038/s43856-024-00529-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 05/20/2024] [Indexed: 05/31/2024] Open
Abstract
BACKGROUND Identifying clusters of diseases may aid understanding of shared aetiology, management of co-morbidities, and the discovery of new disease associations. Our study aims to identify disease clusters using a large set of long-term conditions and comparing methods that use the co-occurrence of diseases versus methods that use the sequence of disease development in a person over time. METHODS We use electronic health records from over ten million people with multimorbidity registered to primary care in England. First, we extract data-driven representations of 212 diseases from patient records employing (i) co-occurrence-based methods and (ii) sequence-based natural language processing methods. Second, we apply the graph-based Markov Multiscale Community Detection (MMCD) to identify clusters based on disease similarity at multiple resolutions. We evaluate the representations and clusters using a clinically curated set of 253 known disease association pairs, and qualitatively assess the interpretability of the clusters. RESULTS Both co-occurrence and sequence-based algorithms generate interpretable disease representations, with the best performance from the skip-gram algorithm. MMCD outperforms k-means and hierarchical clustering in explaining known disease associations. We find that diseases display an almost-hierarchical structure across resolutions from closely to more loosely similar co-occurrence patterns and identify interpretable clusters corresponding to both established and novel patterns. CONCLUSIONS Our method provides a tool for clustering diseases at different levels of resolution from co-occurrence patterns in high-dimensional electronic health records, which could be used to facilitate discovery of associations between diseases in the future.
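As a rough illustration of what a co-occurrence-based disease representation can look like, the sketch below builds one count vector per disease from toy patient records and compares the vectors with cosine similarity. The records, the disease names, the helper names, and the choice of cosine similarity are all assumptions made for this example. It is not the study's data or its exact method, and the skip-gram and MMCD steps are not shown.

```python
from math import sqrt

# Hypothetical toy records: each set holds one patient's long-term conditions.
patients = [
    {"diabetes", "hypertension", "ckd"},
    {"diabetes", "hypertension"},
    {"asthma", "eczema"},
    {"asthma", "eczema", "hypertension"},
]


def cooccurrence_vectors(records):
    """Map each disease to a vector of co-occurrence counts.

    Entry j of a disease's vector counts the patients who have both that
    disease and disease j; the diagonal entry is the disease's own count.
    """
    diseases = sorted(set().union(*records))
    index = {d: i for i, d in enumerate(diseases)}
    vectors = {d: [0] * len(diseases) for d in diseases}
    for record in records:
        for a in record:
            for b in record:
                vectors[a][index[b]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(patients)
sim_asthma_eczema = cosine(vectors["asthma"], vectors["eczema"])
sim_diabetes_ckd = cosine(vectors["diabetes"], vectors["ckd"])
sim_diabetes_asthma = cosine(vectors["diabetes"], vectors["asthma"])
```

In this toy data, asthma and eczema always appear together and so receive identical vectors, while diabetes scores closer to chronic kidney disease than to asthma. A similarity matrix of this kind is the sort of input a graph-based clustering step would then operate on.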
Affiliation(s)
- Thomas Beaney
  - Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, UK
  - Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
- Jonathan Clarke
  - Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
- David Salman
  - Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, UK
  - MSk Lab, Department of Surgery and Cancer, Imperial College London, London, W12 0BZ, UK
- Thomas Woodcock
  - Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, UK
- Azeem Majeed
  - Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, UK
- Paul Aylin
  - Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, UK
- Mauricio Barahona
  - Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
2
Beaney T, Clarke J, Salman D, Woodcock T, Majeed A, Barahona M, Aylin P. Identifying potential biases in code sequences in primary care electronic healthcare records: a retrospective cohort study of the determinants of code frequency. BMJ Open 2023; 13:e072884. [PMID: 37758674 PMCID: PMC10537851 DOI: 10.1136/bmjopen-2023-072884] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/16/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023] Open
Abstract
OBJECTIVES To determine whether the frequency of diagnostic codes for long-term conditions (LTCs) in primary care electronic healthcare records (EHRs) is associated with (1) disease coding incentives, (2) General Practice (GP), (3) patient sociodemographic characteristics and (4) calendar year of diagnosis. DESIGN Retrospective cohort study. SETTING GPs in England from 2015 to 2022 contributing to the Clinical Practice Research Datalink Aurum dataset. PARTICIPANTS All patients registered to a GP with at least one incident LTC diagnosed between 1 January 2015 and 31 December 2019. PRIMARY AND SECONDARY OUTCOME MEASURES The number of diagnostic codes for an LTC in (1) the first and (2) the second year following diagnosis, stratified by inclusion in the Quality and Outcomes Framework (QOF) financial incentive programme. RESULTS 3 113 724 patients were included, with 7 723 365 incident LTCs. Conditions included in QOF had higher rates of annual coding than conditions not included in QOF (1.03 vs 0.32 per year, p<0.0001). There was significant variation in code frequency by GP which was not explained by patient sociodemographics. We found significant associations with patient sociodemographics, with a trend towards higher coding rates in people living in areas of higher deprivation for both QOF and non-QOF conditions. Code frequency was lower for conditions with follow-up time in 2020, associated with the onset of the COVID-19 pandemic. CONCLUSIONS The frequency of diagnostic codes for newly diagnosed LTCs is influenced by factors including patient sociodemographics, disease inclusion in QOF, GP practice and the impact of the COVID-19 pandemic. Natural language processing or other methods using temporally ordered code sequences should account for these factors to minimise potential bias.
Affiliation(s)
- Thomas Beaney
  - Department of Primary Care and Public Health, Imperial College London, London, UK
  - Department of Mathematics, Imperial College London, London, UK
- Jonathan Clarke
  - Department of Mathematics, Imperial College London, London, UK
- David Salman
  - Department of Primary Care and Public Health, Imperial College London, London, UK
  - MSk Lab, Imperial College London, London, UK
- Thomas Woodcock
  - Department of Primary Care and Public Health, Imperial College London, London, UK
- Azeem Majeed
  - Department of Primary Care and Public Health, Imperial College London, London, UK
- Paul Aylin
  - Department of Primary Care and Public Health, Imperial College London, London, UK
3
Chillakuru YR, Preciado DA, Cha J, Mann H, Behzadpour HK, Espinel AG. Deep Learning for Predictive Analysis of Pediatric Otolaryngology Personal Statements: A Pilot Study. Otolaryngol Head Neck Surg 2022; 167:877-884. [PMID: 35259040 DOI: 10.1177/01945998221082535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
OBJECTIVE The personal statement is often an underutilized aspect of pediatric otolaryngology fellowship applications. In this pilot study, we use deep learning language models to cluster personal statements and elucidate their relationship to applicant rank position and postfellowship research output. STUDY DESIGN Retrospective cohort. SETTING Single pediatric tertiary care center. METHODS Data and personal statements from 115 applicants to our fellowship program were retrieved from San Francisco Match. BERT (Bidirectional Encoder Representations From Transformers) was used to generate document embeddings for clustering. Regression and machine learning models were used to assess the relationship of personal statements to number of postfellowship publications per year when controlling for publications, board scores, Alpha Omega Alpha status, gender, and residency. RESULTS Document embeddings of personal statements were found to cluster into 4 distinct groups by K-means clustering: 2 focused on "training/research" and 2 on "personal/patient anecdotes." Training clusters 1 and 2 were associated with an applicant-organization fit by a single pediatric otolaryngology fellowship program on univariate but not multivariate analysis. Models utilizing document embeddings alone were able to equally predict applicant-organization fit (receiver operating characteristic areas under the curve, 0.763 and 0.750 vs 0.419; P values >.05) as compared with models utilizing applicant characteristics and personal statement clusters alone. All predictive models were poor predictors of postfellowship publications per year. CONCLUSION We demonstrate ability for document embeddings to capture meaningful information in personal statements from pediatric otolaryngology fellowship applicants. A larger study can further differentiate personal statement clusters and assess the predictive potential of document embeddings.
Affiliation(s)
- Yeshwant Reddy Chillakuru
  - Sheikh Zayed Center for Pediatric Surgical Innovation and Division of Otolaryngology, Children's National Health System, Washington, DC, USA
- Diego A Preciado
  - Sheikh Zayed Center for Pediatric Surgical Innovation and Division of Otolaryngology, Children's National Health System, Washington, DC, USA
- Jeremy Cha
  - Sheikh Zayed Center for Pediatric Surgical Innovation and Division of Otolaryngology, Children's National Health System, Washington, DC, USA
- Hannah Mann
  - Sheikh Zayed Center for Pediatric Surgical Innovation and Division of Otolaryngology, Children's National Health System, Washington, DC, USA
- Hengameh K Behzadpour
  - Sheikh Zayed Center for Pediatric Surgical Innovation and Division of Otolaryngology, Children's National Health System, Washington, DC, USA
- Alexandra Genevieve Espinel
  - Sheikh Zayed Center for Pediatric Surgical Innovation and Division of Otolaryngology, Children's National Health System, Washington, DC, USA
4
Abstract
The Internet of Things technology offers convenience and innovation in areas such as smart homes and smart cities. Internet of Things solutions require careful management of devices and the risk mitigation of potential vulnerabilities within cyber-physical systems. The Internet of Things concept, its implementations, and applications are frequently discussed on social media platforms. This research illuminates the public view of the Internet of Things through a content-based and network analysis of contemporary conversations occurring on the Twitter platform. Tweets can be analyzed with machine learning methods to converge the volume and variety of conversations into predictive and descriptive models. We have reviewed 684,503 tweets collected in a 2-week period. Using supervised and unsupervised machine learning methods, we have identified trends within the realm of IoT and their interconnecting relationships between the most mentioned industries. We have identified characteristics of language sentiment which can help to predict the popularity of IoT conversation topics. We found the healthcare industry as the leading use case industry for IoT implementations. This is not surprising as the current COVID-19 pandemic is driving significant social media discussions. There was an alarming dearth of conversations towards cybersecurity. Recent breaches and ransomware events denote that organizations should spend more time communicating about risks and mitigations. Only 12% of the tweets relating to the Internet of Things contained any mention of topics such as encryption, vulnerabilities, or risk, among other cybersecurity-related terms. We propose an IoT Cybersecurity Communication Scorecard to help organizations benchmark the density and sentiment of their corporate communications regarding security against their specific industry.
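A figure such as the share of tweets mentioning cybersecurity-related terms can be approximated with simple keyword matching. The sketch below is only an illustration under stated assumptions: the term list is hypothetical (the abstract names only encryption, vulnerabilities, and risk "among other" terms), the sample tweets are invented, and the study's actual classification method is not described here.

```python
import re

# Hypothetical term list, assumed for illustration; the study's full list is not given.
SECURITY_TERMS = ("encryption", "vulnerability", "vulnerabilities", "risk")

# Whole-word, case-insensitive match on any of the terms.
_PATTERN = re.compile(r"\b(?:" + "|".join(SECURITY_TERMS) + r")\b", re.IGNORECASE)


def security_share(tweets):
    """Return the fraction of tweets that contain at least one security term."""
    if not tweets:
        return 0.0
    hits = sum(1 for text in tweets if _PATTERN.search(text))
    return hits / len(tweets)


# Invented sample tweets, not taken from the study's dataset.
sample = [
    "New smart home hub launches today #IoT",
    "IoT devices need end-to-end encryption",
    "Smart city sensors cut traffic delays",
    "Unpatched vulnerabilities put IoT cameras at risk",
]
share = security_share(sample)
```

Keyword matching of this kind is sensitive to the chosen term list, so any share it produces should be read alongside the exact terms used.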
5
Clarke J, Murray A, Markar SR, Barahona M, Kinross J. New geographic model of care to manage the post-COVID-19 elective surgery aftershock in England: a retrospective observational study. BMJ Open 2020; 10:e042392. [PMID: 33130573 PMCID: PMC7783383 DOI: 10.1136/bmjopen-2020-042392] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/06/2020] [Revised: 09/29/2020] [Accepted: 09/30/2020] [Indexed: 12/13/2022] Open
Abstract
OBJECTIVES The suspension of elective surgery during the COVID-19 pandemic is unprecedented and has resulted in record volumes of patients waiting for operations. Novel approaches that maximise capacity and efficiency of surgical care are urgently required. This study applies Markov multiscale community detection (MMCD), an unsupervised graph-based clustering framework, to identify new surgical care models based on pooled waiting-lists delivered across an expanded network of surgical providers. DESIGN Retrospective observational study using Hospital Episode Statistics. SETTING Public and private hospitals providing surgical care to National Health Service (NHS) patients in England. PARTICIPANTS All adult patients resident in England undergoing NHS-funded planned surgical procedures between 1 April 2017 and 31 March 2018. MAIN OUTCOME MEASURES The identification of the most common planned surgical procedures in England (high-volume procedures (HVP)) and proportion of low, medium and high-risk patients undergoing each HVP. The mapping of hospitals providing surgical care onto optimised groupings based on patient usage data. RESULTS A total of 7 811 891 planned operations were identified in 4 284 925 adults during the 1-year period of our study. The 28 most common surgical procedures accounted for a combined 3 907 474 operations (50.0% of the total). 2 412 613 (61.7%) of these most common procedures involved 'low risk' patients. Patients travelled an average of 11.3 km for these procedures. Based on the data, MMCD partitioned England into 45, 16 and 7 mutually exclusive and collectively exhaustive natural surgical communities of increasing coarseness. The coarser partitions into 16 and seven surgical communities were shown to be associated with balanced supply and demand for surgical care within communities. CONCLUSIONS Pooled waiting-lists for low-risk elective procedures and patients across integrated, expanded natural surgical community networks have the potential to increase efficiency by innovatively flexing existing supply to better match demand.
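The two percentages reported in the results follow directly from the operation counts given in the abstract. The short check below recomputes them; the variable and function names are hypothetical, and only the three counts come from the text.

```python
# Counts as reported in the abstract above.
total_operations = 7_811_891   # all planned operations in the study year
hvp_operations = 3_907_474     # operations among the 28 most common procedures
low_risk_hvp = 2_412_613       # high-volume procedures involving low-risk patients


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


hvp_share = percent(hvp_operations, total_operations)
low_risk_share = percent(low_risk_hvp, hvp_operations)
```

Note that the 61.7% figure is a share of the high-volume procedures, not of all planned operations.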
Affiliation(s)
- Jonathan Clarke
  - Department of Mathematics, Imperial College of Science, Technology and Medicine, London, UK
- Alice Murray
  - Department of Surgery and Cancer, Imperial College of Science, Technology and Medicine, London, UK
- Sheraz Rehan Markar
  - Department of Surgery and Cancer, Imperial College of Science, Technology and Medicine, London, UK
- Mauricio Barahona
  - Department of Mathematics, Imperial College of Science, Technology and Medicine, London, UK
- James Kinross
  - Department of Surgery and Cancer, Imperial College of Science, Technology and Medicine, London, UK
6
Clarke J, Beaney T, Majeed A, Darzi A, Barahona M. Identifying naturally occurring communities of primary care providers in the English National Health Service in London. BMJ Open 2020; 10:e036504. [PMID: 32690744 PMCID: PMC7375630 DOI: 10.1136/bmjopen-2019-036504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES Primary Care Networks (PCNs) are a new organisational hierarchy with wide-ranging responsibilities introduced in the National Health Service (NHS) Long Term Plan. The vision is that PCNs should represent 'natural' communities of general practices (GP practices) collaborating at scale and covering a geography that fits well with practices, other healthcare providers and local communities. Our study aims to identify natural communities of GP practices based on patient registration patterns using Markov Multiscale Community Detection, an unsupervised network-based clustering technique to create catchments for these communities. DESIGN Retrospective observational study using Hospital Episode Statistics - patient-level administrative records of attendances to hospital. SETTING General practices in the 32 Clinical Commissioning Groups of Greater London. PARTICIPANTS All adult patients resident in and registered to a GP practice in Greater London that had one or more outpatient encounters at NHS hospitals between 1st April 2017 and 31st March 2018. MAIN OUTCOME MEASURES The allocation of GP practices in Greater London to PCNs based on the registrations of patients resident in each Lower Layer Super Output Area (LSOA) of Greater London. The population size and coverage of each proposed PCN. RESULTS 3 428 322 unique patients attended 1334 GPs in 4835 LSOAs in Greater London. Our model grouped 1291 GPs (96.8%) and 4721 LSOAs (97.6%) into 165 mutually exclusive PCNs. Median PCN list size was 53 490, with a lower quartile of 38 079 patients and an upper quartile of 72 982 patients. A median of 70.1% of patients attended a GP within their allocated PCN, ranging from 44.6% to 91.4%. CONCLUSIONS With PCNs expected to take a role in population health management and with community providers expected to reconfigure around them, it is vital to recognise how PCNs represent their communities. Our method may be used by policymakers to understand the populations and geography shared between networks.
Affiliation(s)
- Jonathan Clarke
  - Centre for Health Policy, Imperial College London, London, UK
  - Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
- Thomas Beaney
  - Department of Primary Care, Imperial College of Science Technology and Medicine, London, UK
- Mauricio Barahona
  - Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
  - Department of Mathematics, Imperial College London, London, UK