1
Gordillo-Marañón M, Schmidt AF, Warwick A, Tomlinson C, Ytsma C, Engmann J, Torralbo A, Maclean R, Sofat R, Langenberg C, Shah AD, Denaxas S, Pirmohamed M, Hemingway H, Hingorani AD, Finan C. Disease coverage of human genome-wide association studies and pharmaceutical research and development. COMMUNICATIONS MEDICINE 2024; 4:195. [PMID: 39379679 PMCID: PMC11461613 DOI: 10.1038/s43856-024-00625-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 09/25/2024] [Indexed: 10/10/2024]
Abstract
BACKGROUND Despite the growing interest in the use of human genomic data for drug target identification and validation, the extent to which the spectrum of human disease has been addressed by genome-wide association studies (GWAS), or by drug development, and the degree to which these efforts overlap remain unclear. METHODS In this study we harmonize and integrate different data sources to create a sample space of all the human drug targets and diseases and identify points of convergence or divergence of GWAS and drug development efforts. RESULTS We show that only 612 of 11,158 diseases listed in Human Disease Ontology have an approved drug treatment in at least one region of the world. Of the 1414 diseases that are the subject of preclinical or clinical phase drug development, only 666 have been investigated in GWAS. Conversely, of the 1914 human diseases that have been the subject of GWAS, 1121 have yet to be investigated in drug development. CONCLUSIONS We produce target-disease indication lists to help the pharmaceutical industry to prioritize future drug development efforts based on genetic evidence, academia to prioritize future GWAS for diseases without effective treatments, and both sectors to harness genetic evidence to expand the indications for licensed drugs or to identify repurposing opportunities for clinical candidates that failed in their originally intended indication.
2
Carrasco-Zanini J, Pietzner M, Davitte J, Surendran P, Croteau-Chonka DC, Robins C, Torralbo A, Tomlinson C, Grünschläger F, Fitzpatrick N, Ytsma C, Kanno T, Gade S, Freitag D, Ziebell F, Haas S, Denaxas S, Betts JC, Wareham NJ, Hemingway H, Scott RA, Langenberg C. Proteomic signatures improve risk prediction for common and rare diseases. Nat Med 2024; 30:2489-2498. [PMID: 39039249 PMCID: PMC11405273 DOI: 10.1038/s41591-024-03142-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 06/19/2024] [Indexed: 07/24/2024]
Abstract
For many diseases there are delays in diagnosis due to a lack of objective biomarkers for disease onset. Here, in 41,931 individuals from the United Kingdom Biobank Pharma Proteomics Project, we integrated measurements of ~3,000 plasma proteins with clinical information to derive sparse prediction models for the 10-year incidence of 218 common and rare diseases (81-6,038 cases). We then compared prediction models developed using proteomic data with models developed using either basic clinical information alone or clinical information combined with data from 37 clinical assays. The predictive performance of sparse models including as few as 5 to 20 proteins was superior to the performance of models developed using basic clinical information for 67 pathologically diverse diseases (median delta C-index = 0.07; range = 0.02-0.31). Sparse protein models further outperformed models developed using basic information combined with clinical assay data for 52 diseases, including multiple myeloma, non-Hodgkin lymphoma, motor neuron disease, pulmonary fibrosis and dilated cardiomyopathy. For multiple myeloma, single-cell RNA sequencing from bone marrow in newly diagnosed patients showed that four of the five predictor proteins were expressed specifically in plasma cells, consistent with the strong predictive power of these proteins. External replication of sparse protein models in the EPIC-Norfolk study showed good generalizability for prediction of the six diseases tested. These findings show that sparse plasma protein signatures, including both disease-specific proteins and protein predictors shared across several diseases, offer clinically useful prediction of common and rare diseases.
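The delta C-index comparison described above can be illustrated with a minimal sketch of Harrell's concordance index. The function name, the toy follow-up times, event flags and risk scores are all hypothetical; the study's own modelling pipeline is not reproduced here and may handle ties and censoring differently.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk.

    times  -- follow-up time per individual
    events -- 1 if the disease occurred at that time, 0 if censored
    risks  -- predicted risk score (higher means earlier expected onset)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # follow-up ended in an observed event rather than censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five individuals, two censored.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
clinical_risk = [0.5, 0.6, 0.2, 0.3, 0.1]  # assumed "basic clinical" model
protein_risk = [0.9, 0.7, 0.4, 0.5, 0.1]   # assumed "sparse protein" model

c_clinical = concordance_index(times, events, clinical_risk)
c_protein = concordance_index(times, events, protein_risk)
delta_c = c_protein - c_clinical
print(c_clinical, c_protein, delta_c)
```

A positive `delta_c` means the second model orders individuals by time to onset more accurately than the first, which is the sense in which the abstract reports a median delta C-index of 0.07.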
3
Spear JW, Pissaridou E, Bowyer S, Bryant WA, Key D, Booth J, Spiridou A, Denaxas S, Pope R, Taylor AM, Hemingway H, Sebire NJ. Communicating exploratory unsupervised machine learning analysis in age clustering for paediatric disease. BMJ Health Care Inform 2024; 31:e100963. [PMID: 39074912 PMCID: PMC11288139 DOI: 10.1136/bmjhci-2023-100963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 07/01/2024] [Indexed: 07/31/2024]
Abstract
BACKGROUND Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has so far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders. METHODS Observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes, were used after preprocessing. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed. FINDINGS Four age clusters of diseases were identified, broadly aligning to ages between: 0 and 1; 1 and 5; 5 and 13; 13 and 18. Diagnoses within the clusters aligned to existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at the population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated. CONCLUSION Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses which can augment existing decision making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated.
4
Pietzner M, Denaxas S, Yasmeen S, Ulmer MA, Nakanishi T, Arnold M, Kastenmüller G, Hemingway H, Langenberg C. Complex patterns of multimorbidity associated with severe COVID-19 and long COVID. COMMUNICATIONS MEDICINE 2024; 4:94. [PMID: 38977844 PMCID: PMC11231221 DOI: 10.1038/s43856-024-00506-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/19/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND Early evidence that patients with (multiple) pre-existing diseases are at highest risk for severe COVID-19 has been instrumental in the pandemic to allocate critical care resources and later vaccination schemes. However, systematic studies exploring the breadth of medical diagnoses are scarce but may help to understand severe COVID-19 among patients at supposedly low risk. METHODS We systematically harmonized >12 million primary care and hospitalisation health records from ~500,000 UK Biobank participants into 1448 collated disease terms to systematically identify diseases predisposing to severe COVID-19 (requiring hospitalisation or death) and its post-acute sequelae, Long COVID. RESULTS Here we identify 679 diseases associated with an increased risk for severe COVID-19 (n = 672) and/or Long COVID (n = 72) that span almost all clinical specialties and are strongly enriched in clusters of cardio-respiratory and endocrine-renal diseases. For 57 diseases, we establish consistent evidence to predispose to severe COVID-19 based on survival and genetic susceptibility analyses. This includes a possible role of symptoms of malaise and fatigue as a so far largely overlooked risk factor for severe COVID-19. We finally observe partially opposing risk estimates at known risk loci for severe COVID-19 for etiologically related diseases, such as post-inflammatory pulmonary fibrosis or rheumatoid arthritis, possibly indicating a segregation of disease mechanisms. CONCLUSIONS Our results provide a unique reference that demonstrates 1) how the complex co-occurrence of multiple conditions, including non-fatal ones, predisposes to increased COVID-19 severity and 2) how incorporating the whole breadth of medical diagnoses can guide the interpretation of genetic risk loci.
5
Thayer DS, Mumtaz S, Elmessary MA, Scanlon I, Zinnurov A, Coldea AI, Scanlon J, Chapman M, Curcin V, John A, DelPozo-Banos M, Davies H, Karwath A, Gkoutos GV, Fitzpatrick NK, Quint JK, Varma S, Milner C, Oliveira C, Parkinson H, Denaxas S, Hemingway H, Jefferson E. Creating a next-generation phenotype library: the health data research UK Phenotype Library. JAMIA Open 2024; 7:ooae049. [PMID: 38895652 PMCID: PMC11182945 DOI: 10.1093/jamiaopen/ooae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 02/12/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
Objective To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms. Materials and Methods We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User experience analysis was used to inform the design, which we implemented as a web application featuring a novel metadata standard for defining phenotyping algorithms, access via Application Programming Interface (API), support for computable data flows, and version control. The application has creation and editing functionality, enabling researchers to submit phenotypes directly. Results We created and launched the Phenotype Library in October 2021. The platform currently hosts 1049 phenotype definitions defined against 40 health data sources and >200K terms across 16 medical ontologies. We present several case studies demonstrating its utility for supporting and enabling research: the library hosts curated phenotype collections for the BREATHE respiratory health research hub and the Adolescent Mental Health Data Platform, and it is supporting the development of an informatics tool to generate clinical evidence for clinical guideline development groups. Discussion This platform makes an impact by being open to all health data users and accepting all appropriate content, as well as implementing key features that have not been widely available, including managing structured metadata, access via an API, and support for computable phenotypes. Conclusions We have created the first openly available, programmatically accessible resource enabling the global health research community to store and manage phenotyping algorithms. Removing barriers to describing, sharing, and computing phenotypes will help unleash the potential benefit of health data for patients and the public.
6
Pietzner M, Denaxas S, Yasmeen S, Ulmer MA, Nakanishi T, Arnold M, Kastenmüller G, Hemingway H, Langenberg C. Complex patterns of multimorbidity associated with severe COVID-19 and Long COVID. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.05.23.23290408. [PMID: 39006431 PMCID: PMC11245059 DOI: 10.1101/2023.05.23.23290408] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/16/2024]
Abstract
Early evidence that patients with (multiple) pre-existing diseases are at highest risk for severe COVID-19 has been instrumental in the pandemic to allocate critical care resources and later vaccination schemes. However, systematic studies exploring the breadth of medical diagnoses, including common but non-fatal diseases, are scarce but may help to understand severe COVID-19 among patients at supposedly low risk. Here, we systematically harmonized >12 million primary care and hospitalisation health records from ~500,000 UK Biobank participants into 1448 collated disease terms to systematically identify diseases predisposing to severe COVID-19 (requiring hospitalisation or death) and its post-acute sequelae, Long COVID. We identified a total of 679 diseases associated with an increased risk for severe COVID-19 (n=672) and/or Long COVID (n=72) that spanned almost all clinical specialties and were strongly enriched in clusters of cardio-respiratory and endocrine-renal diseases. For 57 diseases, we established consistent evidence to predispose to severe COVID-19 based on survival and genetic susceptibility analyses. This included a possible role of symptoms of malaise and fatigue as a so far largely overlooked risk factor for severe COVID-19. We finally observed partially opposing risk estimates at known risk loci for severe COVID-19 for etiologically related diseases, such as post-inflammatory pulmonary fibrosis (e.g., MUC5B, NPNT, and PSMD3) or rheumatoid arthritis (e.g., TYK2), possibly indicating a segregation of disease mechanisms. Our results provide a unique reference that demonstrates 1) how the complex co-occurrence of multiple conditions, including non-fatal ones, predisposes to increased COVID-19 severity and 2) how incorporating the whole breadth of medical diagnoses can guide the interpretation of genetic risk loci.
7
Ozaltin B, Chapman R, Arfeen MQU, Fitzpatick N, Hemingway H, Direk K, Jacob J. Delineating excess comorbidities in idiopathic pulmonary fibrosis: an observational study. Respir Res 2024; 25:249. [PMID: 38898447 PMCID: PMC11186192 DOI: 10.1186/s12931-024-02875-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Accepted: 06/10/2024] [Indexed: 06/21/2024]
Abstract
BACKGROUND Our study examined whether prevalent and incident comorbidities are increased in idiopathic pulmonary fibrosis (IPF) patients when compared to matched chronic obstructive pulmonary disease (COPD) patients and control subjects without IPF or COPD. METHODS IPF patients and age-, gender- and smoking-matched COPD patients diagnosed between 01/01/1997 and 01/01/2019 were identified from the Clinical Practice Research Datalink GOLD database multiple registrations cohort at the first date an ICD-10 or Read code mentioned IPF/COPD. A control cohort comprised age-, gender- and pack-year smoking-matched subjects without IPF or COPD. Prevalent (prior to IPF/COPD diagnosis) and incident (after IPF/COPD diagnosis) comorbidities were examined. Group differences were estimated using a t-test. Mortality relationships were examined using multivariable Cox proportional hazards models adjusted for patient age, gender and smoking status. RESULTS Across 3055 IPF patients, 38% had 3 or more prevalent comorbidities versus 32% of COPD patients and 21% of matched control subjects. Survival time decreased as the number of comorbidities in an individual increased (p < 0.0001). In IPF, prevalent heart failure (Hazard ratio [HR] = 1.62, 95% Confidence Interval [CI]: 1.43-1.84, p < 0.001), chronic kidney disease (HR = 1.27, 95%CI: 1.10-1.47, p = 0.001), cerebrovascular disease (HR = 1.18, 95%CI: 1.02-1.35, p = 0.02), and abdominal and peripheral vascular disease (HR = 1.29, 95%CI: 1.09-1.50, p = 0.003) were independently associated with reduced survival. Key comorbidities showed increased incidence in IPF (versus COPD) 7-10 years prior to IPF diagnosis. INTERPRETATION The mortality impact of excessive prevalent comorbidities in IPF versus COPD and smoking matched controls suggests that multiorgan mechanisms of injury need elucidation in patients who develop IPF.
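The hazard ratios, confidence intervals and p-values quoted above can be cross-checked against each other with a back-of-envelope calculation. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for Cox model output but is not stated in the abstract; the function name `wald_check` is hypothetical. Because the published limits are rounded to two decimals, the recovered p-values are only approximate.

```python
import math


def wald_check(hr, lower, upper, z_crit=1.96):
    """Recover SE(log HR), z and a two-sided p-value from an HR and its 95% CI.

    Assumes the interval is log(HR) +/- z_crit * SE (an assumption,
    not something the abstract states).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values copied from the abstract: (HR, lower 95% limit, upper 95% limit).
reported = {
    "heart failure": (1.62, 1.43, 1.84),
    "chronic kidney disease": (1.27, 1.10, 1.47),
    "cerebrovascular disease": (1.18, 1.02, 1.35),
    "abdominal and peripheral vascular disease": (1.29, 1.09, 1.50),
}

for name, (hr, lower, upper) in reported.items():
    se, z, p = wald_check(hr, lower, upper)
    print(f"{name}: SE(log HR)={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Under that assumption the recovered p-values fall close to the reported ones, allowing for rounding of the interval limits.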
8
Aldridge RW, Evans HER, Yavlinsky A, Moayyeri A, Bhaskaran K, Mathur R, Jordan KP, Croft P, Denaxas S, Shah AD, Blackburn RM, Moller H, Ng ESW, Hughes A, Fox S, Flowers J, Schmidt J, Hayward A, Gilbert R, Smeeth L, Hemingway H. Estimating disease burden using national linked electronic health records: a study using an English population-based cohort. Wellcome Open Res 2024; 8:262. [PMID: 39092423 PMCID: PMC11292189 DOI: 10.12688/wellcomeopenres.19470.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/10/2024] [Indexed: 08/04/2024]
Abstract
Background Electronic health records (EHRs) have the potential to be used to produce detailed disease burden estimates. In this study we created disease estimates using national EHRs for three high burden conditions, compared estimates between linked and unlinked datasets and produced stratified estimates by age, sex, ethnicity, socio-economic deprivation and geographical region. Methods EHRs containing primary care (Clinical Practice Research Datalink), secondary care (Hospital Episode Statistics) and mortality records (Office for National Statistics) were used. We used existing disease phenotyping algorithms to identify cases of cancer (breast, lung, colorectal and prostate), type 1 and 2 diabetes, and low back pain. We calculated age-standardised incidence of first cancer, point prevalence for diabetes, and primary care consultation prevalence for low back pain. Results 7.2 million people contributing 45.3 million person-years of active follow-up between 2000 and 2014 were included. CPRD-HES combined and CPRD-HES-ONS combined lung and bowel cancer incidence estimates by sex were similar to cancer registry estimates. Linked CPRD-HES estimates for combined Type 1 and Type 2 diabetes were consistently higher than those of CPRD alone, with the difference steadily increasing over time from 0.26% (2.99% for CPRD-HES vs. 2.73% for CPRD) in 2002 to 0.58% (6.17% vs. 5.59%) in 2013. Low back pain prevalence was highest in the most deprived quintile and, when compared to the least deprived quintile, the difference in prevalence increased over time between 2000 and 2013, with the largest difference of 27% (558.70 per 10,000 people vs 438.20) in 2013. Conclusions We used national EHRs to produce detailed estimates of disease burden by deprivation, ethnicity and geographical region. National EHRs have the potential to improve disease burden estimates at a local and global level and may serve as more automated, timely and precise inputs for policy making and global burden of disease estimation.
9
Steinfeldt J, Wild B, Buergel T, Pietzner M, Upmeier Zu Belzen J, Vauvelle A, Hegselmann S, Denaxas S, Hemingway H, Langenberg C, Landmesser U, Deanfield J, Eils R. Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats. Nat Commun 2024; 15:4257. [PMID: 38763986 PMCID: PMC11102902 DOI: 10.1038/s41467-024-48568-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 05/21/2024]
Abstract
The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1883 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from the health records of 502,460 UK Biobank participants. Importantly, we observed discriminative improvements over basic demographic predictors for 1774 (94.3%) endpoints. After transferring the unmodified risk models to the All of Us cohort, we replicated these improvements for 1347 (89.8%) of 1500 investigated endpoints, demonstrating generalizability across healthcare systems and historically underrepresented groups. Ultimately, we showed how this approach could have been used to identify individuals vulnerable to severe COVID-19. Our study demonstrates the potential of medical history to support guidance for emerging pandemics by systematically estimating risk for thousands of diseases at once at minimal cost.
10
Kraljevic Z, Bean D, Shek A, Bendayan R, Hemingway H, Yeung JA, Deng A, Baston A, Ross J, Idowu E, Teo JT, Dobson RJB. Foresight-a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. Lancet Digit Health 2024; 6:e281-e290. [PMID: 38519155 PMCID: PMC11220626 DOI: 10.1016/s2589-7500(24)00025-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/18/2023] [Revised: 12/20/2023] [Accepted: 02/05/2024] [Indexed: 03/24/2024]
Abstract
BACKGROUND An electronic health record (EHR) holds detailed longitudinal information about a patient's health status and general clinical history, a large portion of which is stored as unstructured, free text. Existing approaches to model a patient's trajectory focus mostly on structured data and a subset of single-domain outcomes. This study aims to evaluate the effectiveness of Foresight, a generative transformer in temporal modelling of patient data, integrating both free text and structured formats, to predict a diverse array of future medical outcomes, such as disorders, substances (eg, to do with medicines, allergies, or poisonings), procedures, and findings (eg, relating to observations, judgements, or assessments). METHODS Foresight is a novel transformer-based pipeline that uses named entity recognition and linking tools to convert EHR document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events, such as disorders, substances, procedures, and findings. The Foresight pipeline has four main components: (1) CogStack (data retrieval and preprocessing); (2) the Medical Concept Annotation Toolkit (structuring of the free-text information from EHRs); (3) Foresight Core (deep-learning model for biomedical concept modelling); and (4) the Foresight web application. We processed the entire free-text portion from three different hospital datasets (King's College Hospital [KCH], South London and Maudsley [SLaM], and the US Medical Information Mart for Intensive Care III [MIMIC-III]), resulting in information from 811 336 patients and covering both physical and mental health institutions. We measured the performance of models using custom metrics derived from precision and recall. 
FINDINGS Foresight achieved a precision@10 (ie, of 10 forecasted candidates, at least one is correct) of 0·68 (SD 0·0027) for the KCH dataset, 0·76 (0·0032) for the SLaM dataset, and 0·88 (0·0018) for the MIMIC-III dataset, for forecasting the next new disorder in a patient timeline. Foresight also achieved a precision@10 value of 0·80 (0·0013) for the KCH dataset, 0·81 (0·0026) for the SLaM dataset, and 0·91 (0·0011) for the MIMIC-III dataset, for forecasting the next new biomedical concept. In addition, Foresight was validated on 34 synthetic patient timelines by five clinicians and achieved a relevancy of 33 (97% [95% CI 91-100]) of 34 for the top forecasted candidate disorder. As a generative model, Foresight can forecast follow-on biomedical concepts for as many steps as required. INTERPRETATION Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials, and clinical research to study the progression of disorders, to simulate interventions and counterfactuals, and for educational purposes. FUNDING National Health Service Artificial Intelligence Laboratory, National Institute for Health and Care Research Biomedical Research Centre, and Health Data Research UK.
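The precision@10 figure, as glossed in the abstract ("of 10 forecasted candidates, at least one is correct"), can be sketched as follows. This is a minimal illustration for the single-target case of forecasting the next new disorder; the function name and the toy concept labels are hypothetical, and the paper's custom metrics may be defined differently in detail.

```python
def precision_at_k(ranked_forecasts, true_next, k=10):
    """Fraction of timelines whose true next concept is among the top-k forecasts.

    ranked_forecasts -- one list of candidate concepts per timeline, best first
    true_next        -- the concept that actually came next in each timeline
    """
    if len(ranked_forecasts) != len(true_next):
        raise ValueError("one forecast list is needed per timeline")
    hits = 0
    for candidates, target in zip(ranked_forecasts, true_next):
        # A timeline counts as a hit if the target appears in the first k candidates.
        if target in candidates[:k]:
            hits += 1
    return hits / len(true_next)


# Hypothetical toy forecasts for three patient timelines.
forecasts = [
    ["diabetes", "hypertension", "asthma"],
    ["asthma", "copd", "anxiety"],
    ["stroke", "gout", "anaemia"],
]
truth = ["hypertension", "asthma", "migraine"]

p_at_10 = precision_at_k(forecasts, truth, k=10)
p_at_1 = precision_at_k(forecasts, truth, k=1)
print(p_at_10, p_at_1)
```

Widening `k` can only keep or raise the score, which is why a top-10 figure is more forgiving than a top-1 figure for the same model.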
11
Hall M, Smith L, Wu J, Hayward C, Batty JA, Lambert PC, Hemingway H, Gale CP. Health outcomes after myocardial infarction: A population study of 56 million people in England. PLoS Med 2024; 21:e1004343. [PMID: 38358949 PMCID: PMC10868847 DOI: 10.1371/journal.pmed.1004343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 01/05/2024] [Indexed: 02/17/2024]
Abstract
BACKGROUND The occurrence of a range of health outcomes following myocardial infarction (MI) is unknown. Therefore, this study aimed to determine the long-term risk of major health outcomes following MI and generate sociodemographic stratified risk charts in order to inform care recommendations in the post-MI period and underpin shared decision making. METHODS AND FINDINGS This nationwide cohort study includes all individuals aged ≥18 years admitted to one of 229 National Health Service (NHS) Trusts in England between 1 January 2008 and 31 January 2017 (final follow-up 27 March 2017). We analysed 11 non-fatal health outcomes (subsequent MI and first hospitalisation for heart failure, atrial fibrillation, cerebrovascular disease, peripheral arterial disease, severe bleeding, renal failure, diabetes mellitus, dementia, depression, and cancer) and all-cause mortality. Of the 55,619,430 population of England, 34,116,257 individuals contributing to 145,912,852 hospitalisations were included (mean age 41.7 years [standard deviation (SD) 26.1]; n = 14,747,198 [44.2%] male). There were 433,361 individuals with MI (mean age 67.4 years [SD 14.4]; n = 283,742 [65.5%] male). Following MI, all-cause mortality was the most frequent event (adjusted cumulative incidence at 9 years 37.8%; 95% confidence interval [CI] [37.6,37.9]), followed by heart failure (29.6%; 95% CI [29.4,29.7]), renal failure (27.2%; 95% CI [27.0,27.4]), atrial fibrillation (22.3%; 95% CI [22.2,22.5]), severe bleeding (19.0%; 95% CI [18.8,19.1]), diabetes (17.0%; 95% CI [16.9,17.1]), cancer (13.5%; 95% CI [13.3,13.6]), cerebrovascular disease (12.5%; 95% CI [12.4,12.7]), depression (8.9%; 95% CI [8.7,9.0]), dementia (7.8%; 95% CI [7.7,7.9]), subsequent MI (7.1%; 95% CI [7.0,7.2]), and peripheral arterial disease (6.5%; 95% CI [6.4,6.6]). Compared with a risk-set matched population of 2,001,310 individuals, first hospitalisation for all non-fatal health outcomes was increased after MI, except for dementia (adjusted hazard ratio [aHR] 1.01; 95% CI [0.99,1.02]; p = 0.468) and cancer (aHR 0.56; 95% CI [0.56,0.57]; p < 0.001). The study includes data from secondary care only; as such, diagnoses made outside of secondary care may have been missed, leading to the potential underestimation of the total burden of disease following MI. CONCLUSIONS In this study, up to a third of patients with MI developed heart failure or renal failure, 7% had another MI, and 38% died within 9 years (compared with 35% deaths among matched individuals). The incidence of all health outcomes, except dementia and cancer, was higher than expected during the normal life course without MI following adjustment for age, sex, year, and socioeconomic deprivation. Efforts targeted to prevent or limit the accrual of chronic, multisystem disease states following MI are needed and should be guided by the demographic-specific risk charts derived in this study.
12
Katsoulis M, Lai AG, Kipourou DK, Gomes M, Banerjee A, Denaxas S, Lumbers RT, Tsilidis K, Kostara M, Belot A, Dale C, Sofat R, Leyrat C, Hemingway H, Diaz-Ordaz K. On the estimation of the effect of weight change on a health outcome using observational data, by utilising the target trial emulation framework. Int J Obes (Lond) 2023; 47:1309-1317. [PMID: 37884665 PMCID: PMC10663146 DOI: 10.1038/s41366-023-01396-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 09/17/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023]
Abstract
BACKGROUND/OBJECTIVES When studying the effect of weight change between two time points on a health outcome using observational data, two main problems arise initially: (i) 'when is time zero?' and (ii) 'which confounders should we account for?' From the baseline date or the 1st follow-up (when the weight change can be measured)? Different methods have previously been used in the literature that carry different sources of bias and hence produce different results. METHODS We utilised the target trial emulation framework and considered weight change as a hypothetical intervention. First, we used a simplified example from a hypothetical randomised trial where no modelling is required. Then we simulated data from an observational study where modelling is needed. We demonstrate the problems of each of these methods and suggest a strategy. INTERVENTIONS Weight loss/gain vs maintenance. RESULTS The recommended method defines time zero at enrolment, but adjustment for confounders (or exclusion of individuals based on levels of confounders) should be performed both at enrolment and the 1st follow-up. CONCLUSIONS The implementation of our suggested method [adjusting for (or excluding based on) confounders measured both at baseline and the 1st follow-up] can help researchers attenuate bias by avoiding some common pitfalls. Other methods that have been widely used in the past to estimate the effect of weight change on a health outcome are more biased. However, two issues remain: (i) the exposure is not well-defined as there are different ways of changing weight (however, we tried to reduce this problem by excluding individuals who develop a chronic disease); and (ii) immortal time bias, which may be small if the time to first follow-up is short.
13
Prugger C, Perier MC, Gonzalez-Izquierdo A, Hemingway H, Denaxas S, Empana JP. Incidence of 12 common cardiovascular diseases and subsequent mortality risk in the general population. Eur J Prev Cardiol 2023; 30:1715-1722. [PMID: 37294923 DOI: 10.1093/eurjpc/zwad192] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/07/2023] [Revised: 05/25/2023] [Accepted: 06/03/2023] [Indexed: 06/11/2023]
Abstract
BACKGROUND Incident events of cardiovascular diseases (CVDs) are heterogeneous and may result in different mortality risks. Such evidence may help inform patient and physician decisions in CVD prevention and risk factor management. AIMS This study aimed to determine the extent to which incident events of common CVDs show heterogeneous associations with subsequent mortality risk in the general population. METHODS AND RESULTS Based on England-wide linked electronic health records, we established a cohort of 1 310 518 people ≥30 years of age initially free of CVD and followed up for non-fatal events of 12 common CVDs and cause-specific mortality. The 12 CVDs were considered as time-varying exposures in Cox's proportional hazards models to estimate hazard rate ratios (HRRs) with 95% confidence intervals (CIs). Over the median follow-up of 4.2 years (2010-16), 81 516 non-fatal CVD events, 10 906 cardiovascular deaths, and 40 843 non-cardiovascular deaths occurred. All 12 CVDs were associated with increased risk of cardiovascular mortality, with HRR (95% CI) ranging from 1.67 (1.47-1.89) for stable angina to 7.85 (6.62-9.31) for haemorrhagic stroke. All 12 CVDs were also associated with increased non-cardiovascular and all-cause mortality risk but to a lesser extent: HRR (95% CI) ranged from 1.10 (1.00-1.22) to 4.55 (4.03-5.13) and from 1.24 (1.13-1.35) to 4.92 (4.44-5.46) for transient ischaemic attack and sudden cardiac arrest, respectively. CONCLUSION Incident events of 12 common CVDs show significant adverse and markedly differential associations with subsequent cardiovascular, non-cardiovascular, and all-cause mortality risk in the general population.
|
14
|
Hingorani AD, Gratton J, Finan C, Schmidt AF, Patel R, Sofat R, Kuan V, Langenberg C, Hemingway H, Morris JK, Wald NJ. Performance of polygenic risk scores in screening, prediction, and risk stratification: secondary analysis of data in the Polygenic Score Catalog. BMJ MEDICINE 2023; 2:e000554. [PMID: 37859783 PMCID: PMC10582890 DOI: 10.1136/bmjmed-2023-000554] [Received: 03/08/2023] [Accepted: 08/31/2023] [Indexed: 10/21/2023]
Abstract
Objective To clarify the performance of polygenic risk scores in population screening, individual risk prediction, and population risk stratification. Design Secondary analysis of data in the Polygenic Score Catalog. Setting Polygenic Score Catalog, April 2022. Secondary analysis of 3915 performance metric estimates for 926 polygenic risk scores for 310 diseases to generate estimates of performance in population screening, individual risk, and population risk stratification. Participants Individuals contributing to the published studies in the Polygenic Score Catalog. Main outcome measures Detection rate for a 5% false positive rate (DR5) and the population odds of becoming affected given a positive result; individual odds of becoming affected for a person with a particular polygenic score; and odds of becoming affected for groups of individuals in different portions of a polygenic risk score distribution. Coronary artery disease and breast cancer were used as illustrative examples. Results For performance in population screening, median DR5 for all polygenic risk scores and all diseases studied was 11% (interquartile range 8-18%). Median DR5 was 12% (9-19%) for polygenic risk scores for coronary artery disease and 10% (9-12%) for breast cancer. The population odds of becoming affected given a positive result were 1:8 for coronary artery disease and 1:21 for breast cancer, with background 10 year odds of 1:19 and 1:41, respectively, which are typical for these diseases at age 50. For individual risk prediction, the corresponding 10 year odds of becoming affected for individuals aged 50 with a polygenic risk score at the 2.5th, 25th, 75th, and 97.5th centiles were 1:54, 1:29, 1:15, and 1:8 for coronary artery disease and 1:91, 1:56, 1:34, and 1:21 for breast cancer.
In terms of population risk stratification, at age 50, the risk of coronary artery disease was divided into five groups, with 10 year odds of 1:41 and 1:11 for the lowest and highest quintile groups, respectively. The 10 year odds was 1:7 for the upper 2.5% of the polygenic risk score distribution for coronary artery disease, a group that contributed 7% of cases. The corresponding estimates for breast cancer were 1:72 and 1:26 for the lowest and highest quintile groups, and 1:19 for the upper 2.5% of the distribution, which contributed 6% of cases. Conclusion Polygenic risk scores performed poorly in population screening, individual risk prediction, and population risk stratification. Strong claims about the effect of polygenic risk scores on healthcare seem to be disproportionate to their performance.
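The screening figures in this abstract can be related to one another with a short calculation. The Python sketch below uses the standard screening identity (odds of being affected given a positive result = likelihood ratio × background odds, with likelihood ratio = detection rate / false positive rate). Whether the authors computed their figures exactly this way is an assumption, and the function name `odds_given_positive` is hypothetical.

```python
def odds_given_positive(detection_rate: float,
                        false_positive_rate: float,
                        background_odds_to_one: float) -> float:
    """Return x such that the odds of being affected given a positive
    result are 1:x, for background odds of 1:background_odds_to_one."""
    likelihood_ratio = detection_rate / false_positive_rate
    return background_odds_to_one / likelihood_ratio


# Median DR5 values and background 10 year odds quoted in the abstract.
cad = odds_given_positive(0.12, 0.05, 19)     # coronary artery disease
breast = odds_given_positive(0.10, 0.05, 41)  # breast cancer

print(f"coronary artery disease: 1:{cad:.1f}")
print(f"breast cancer: 1:{breast:.1f}")
```

With the rounded inputs from the abstract this gives roughly 1:8 for coronary artery disease and about 1:20 to 1:21 for breast cancer, in line with the reported odds; small differences are expected because the published DR5 values are rounded.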
|
15
|
Hartmann S, Yasmeen S, Jacobs BM, Denaxas S, Pirmohamed M, Gamazon ER, Caulfield MJ, Hemingway H, Pietzner M, Langenberg C. ADRA2A and IRX1 are putative risk genes for Raynaud's phenomenon. Nat Commun 2023; 14:6156. [PMID: 37828025 PMCID: PMC10570309 DOI: 10.1038/s41467-023-41876-5] [Received: 01/06/2023] [Accepted: 09/21/2023] [Indexed: 10/14/2023] Open
Abstract
Raynaud's phenomenon (RP) is a common vasospastic disorder that causes severe pain and ulcers, but despite its high reported heritability, no causal genes have been robustly identified. We conducted a genome-wide association study including 5,147 RP cases and 439,294 controls, based on diagnoses from electronic health records, and identified three unreported genomic regions associated with the risk of RP (p < 5 × 10⁻⁸). We prioritized ADRA2A (rs7090046, odds ratio (OR) per allele: 1.26; 95%-CI: 1.20-1.31; p < 9.6 × 10⁻²⁷) and IRX1 (rs12653958, OR: 1.17; 95%-CI: 1.12-1.22, p < 4.8 × 10⁻¹³) as candidate causal genes through integration of gene expression in disease relevant tissues. We further identified a likely causal detrimental effect of low fasting glucose levels on RP risk (rG = -0.21; p-value = 2.3 × 10⁻³), and systematically highlighted drug repurposing opportunities, like the antidepressant mirtazapine. Our results provide the first robust evidence for a strong genetic contribution to RP and highlight a so far underrated role of α2A-adrenoreceptor signalling, encoded at ADRA2A, as a possible mechanism for hypersensitivity to catecholamine-induced vasospasms.
|
16
|
Jordan KP, Rathod-Mistry T, van der Windt DA, Bailey J, Chen Y, Clarson L, Denaxas S, Hayward RA, Hemingway H, Kyriacou T, Mamas MA. Determining cardiovascular risk in patients with unattributed chest pain in UK primary care: an electronic health record study. Eur J Prev Cardiol 2023; 30:1151-1161. [PMID: 36895179 PMCID: PMC10442054 DOI: 10.1093/eurjpc/zwad055] [Received: 09/01/2022] [Revised: 12/06/2022] [Accepted: 02/21/2023] [Indexed: 03/11/2023]
Abstract
AIMS Most adults presenting in primary care with chest pain symptoms will not receive a diagnosis ('unattributed' chest pain) but are at increased risk of cardiovascular events. We aimed to assess, among patients with unattributed chest pain, the risk factors for cardiovascular events and whether those at greatest risk of cardiovascular disease can be identified by an existing general population risk prediction model or by developing a new model. METHODS AND RESULTS The study used UK primary care electronic health records from the Clinical Practice Research Datalink linked to admitted hospitalizations. The study population comprised patients aged 18 years or older with unattributed chest pain recorded in 2002-2018. Cardiovascular risk prediction models were developed with external validation and comparison of performance to QRISK3, a general population risk prediction model. There were 374 917 patients with unattributed chest pain in the development data set. The strongest risk factors for cardiovascular disease included diabetes, atrial fibrillation, and hypertension. Risk was increased in males, patients of Asian ethnicity, those in more deprived areas, obese patients, and smokers. The final developed model had good predictive performance (external validation c-statistic 0.81, calibration slope 1.02). A model using a subset of key risk factors for cardiovascular disease gave nearly identical performance. QRISK3 underestimated cardiovascular risk. CONCLUSION Patients presenting with unattributed chest pain are at increased risk of cardiovascular events. It is feasible to accurately estimate individual risk using routinely recorded information in the primary care record, focusing on a small number of risk factors. Patients at highest risk could be targeted for preventative measures.
|
17
|
Banerjee A, Dashtban A, Chen S, Pasea L, Thygesen JH, Fatemifar G, Tyl B, Dyszynski T, Asselbergs FW, Lund LH, Lumbers T, Denaxas S, Hemingway H. Identifying subtypes of heart failure from three electronic health record sources with machine learning: an external, prognostic, and genetic validation study. Lancet Digit Health 2023; 5:e370-e379. [PMID: 37236697 DOI: 10.1016/s2589-7500(23)00065-1] [Received: 07/07/2022] [Revised: 03/01/2023] [Accepted: 03/16/2023] [Indexed: 05/28/2023]
Abstract
BACKGROUND Machine learning has been used to analyse heart failure subtypes, but not across large, distinct, population-based datasets, across the whole spectrum of causes and presentations, or with clinical and non-clinical validation by different machine learning methods. Using our published framework, we aimed to discover heart failure subtypes and validate them upon population representative data. METHODS In this external, prognostic, and genetic validation study we analysed individuals aged 30 years or older with incident heart failure from two population-based databases in the UK (Clinical Practice Research Datalink [CPRD] and The Health Improvement Network [THIN]) from 1998 to 2018. Pre-heart failure and post-heart failure factors (n=645) included demographic information, history, examination, blood laboratory values, and medications. We identified subtypes using four unsupervised machine learning methods (K-means, hierarchical, K-Medoids, and mixture model clustering) with 87 of 645 factors in each dataset. We evaluated subtypes for (1) external validity (across datasets); (2) prognostic validity (predictive accuracy for 1-year mortality); and (3) genetic validity (UK Biobank), association with polygenic risk score (PRS) for heart failure-related traits (n=11), and single nucleotide polymorphisms (n=12). FINDINGS We included 188 800, 124 262, and 9573 individuals with incident heart failure from CPRD, THIN, and UK Biobank, respectively, between Jan 1, 1998, and Jan 1, 2018. After identifying five clusters, we labelled heart failure subtypes as (1) early onset, (2) late onset, (3) atrial fibrillation related, (4) metabolic, and (5) cardiometabolic. In the external validity analysis, subtypes were similar across datasets (c-statistics: THIN model in CPRD ranged from 0·79 [subtype 3] to 0·94 [subtype 1], and CPRD model in THIN ranged from 0·79 [subtype 1] to 0·92 [subtypes 2 and 5]). 
In the prognostic validity analysis, 1-year all-cause mortality after heart failure diagnosis (subtype 1 0·20 [95% CI 0·14-0·25], subtype 2 0·46 [0·43-0·49], subtype 3 0·61 [0·57-0·64], subtype 4 0·11 [0·07-0·16], and subtype 5 0·37 [0·32-0·41]) differed across subtypes in CPRD and THIN data, as did risk of non-fatal cardiovascular diseases and all-cause hospitalisation. In the genetic validity analysis the atrial fibrillation-related subtype showed associations with the related PRS. Late onset and cardiometabolic subtypes were the most similar and strongly associated with PRS for hypertension, myocardial infarction, and obesity (p<0·0009). We developed a prototype app for routine clinical use, which could enable evaluation of effectiveness and cost-effectiveness. INTERPRETATION Across four methods and three datasets, including genetic data, in the largest study of incident heart failure to date, we identified five machine learning-informed subtypes, which might inform aetiological research, clinical risk prediction, and the design of heart failure trials. FUNDING European Union Innovative Medicines Initiative-2.
|
18
|
Dashtban A, Mizani MA, Pasea L, Denaxas S, Corbett R, Mamza JB, Gao H, Morris T, Hemingway H, Banerjee A. Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals. EBioMedicine 2023; 89:104489. [PMID: 36857859 PMCID: PMC9989643 DOI: 10.1016/j.ebiom.2023.104489] [Received: 11/05/2022] [Revised: 01/31/2023] [Accepted: 02/06/2023] [Indexed: 03/01/2023] Open
Abstract
BACKGROUND Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter). FINDINGS After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (Sensitivity: 0.81-0.98, F1 score:0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. 
The 5-year risk of mortality and admissions in the overall incident CKD population were highest in cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). MEDICATIONS Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with highest medication burden in cardiometabolic and metabolic subtypes, and higher burden in prevalent than incident CKD. INTERPRETATION In the largest CKD study using ML, to-date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to study of aetiology, therapeutics and risk prediction. FUNDING AstraZeneca UK Ltd, Health Data Research UK.
|
19
|
Mueller SH, Lai AG, Valkovskaya M, Michailidou K, Bolla MK, Wang Q, Dennis J, Lush M, Abu-Ful Z, Ahearn TU, Andrulis IL, Anton-Culver H, Antonenkova NN, Arndt V, Aronson KJ, Augustinsson A, Baert T, Freeman LEB, Beckmann MW, Behrens S, Benitez J, Bermisheva M, Blomqvist C, Bogdanova NV, Bojesen SE, Bonanni B, Brenner H, Brucker SY, Buys SS, Castelao JE, Chan TL, Chang-Claude J, Chanock SJ, Choi JY, Chung WK, Colonna SV, Cornelissen S, Couch FJ, Czene K, Daly MB, Devilee P, Dörk T, Dossus L, Dwek M, Eccles DM, Ekici AB, Eliassen AH, Engel C, Evans DG, Fasching PA, Fletcher O, Flyger H, Gago-Dominguez M, Gao YT, García-Closas M, García-Sáenz JA, Genkinger J, Gentry-Maharaj A, Grassmann F, Guénel P, Gündert M, Haeberle L, Hahnen E, Haiman CA, Håkansson N, Hall P, Harkness EF, Harrington PA, Hartikainen JM, Hartman M, Hein A, Ho WK, Hooning MJ, Hoppe R, Hopper JL, Houlston RS, Howell A, Hunter DJ, Huo D, Ito H, Iwasaki M, Jakubowska A, Janni W, John EM, Jones ME, Jung A, Kaaks R, Kang D, Khusnutdinova EK, Kim SW, Kitahara CM, Koutros S, Kraft P, Kristensen VN, Kubelka-Sabit K, Kurian AW, Kwong A, Lacey JV, Lambrechts D, Le Marchand L, Li J, Linet M, Lo WY, Long J, Lophatananon A, Mannermaa A, Manoochehri M, Margolin S, Matsuo K, Mavroudis D, Menon U, Muir K, Murphy RA, Nevanlinna H, Newman WG, Niederacher D, O'Brien KM, Obi N, Offit K, Olopade OI, Olshan AF, Olsson H, Park SK, Patel AV, Patel A, Perou CM, Peto J, Pharoah PDP, Plaseska-Karanfilska D, Presneau N, Rack B, Radice P, Ramachandran D, Rashid MU, Rennert G, Romero A, Ruddy KJ, Ruebner M, Saloustros E, Sandler DP, Sawyer EJ, Schmidt MK, Schmutzler RK, Schneider MO, Scott C, Shah M, Sharma P, Shen CY, Shu XO, Simard J, Surowy H, Tamimi RM, Tapper WJ, Taylor JA, Teo SH, Teras LR, Toland AE, Tollenaar RAEM, Torres D, Torres-Mejía G, Troester MA, Truong T, Vachon CM, Vijai J, Weinberg CR, Wendt C, Winqvist R, Wolk A, Wu AH, Yamaji T, Yang XR, Yu JC, Zheng W, Ziogas A, Ziv E, Dunning AM, Easton DF, Hemingway H, 
Hamann U, Kuchenbaecker KB. Aggregation tests identify new gene associations with breast cancer in populations with diverse ancestry. Genome Med 2023; 15:7. [PMID: 36703164 PMCID: PMC9878779 DOI: 10.1186/s13073-022-01152-5] [Received: 05/05/2022] [Accepted: 12/16/2022] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Low-frequency variants play an important role in breast cancer (BC) susceptibility. Gene-based methods can increase power by combining multiple variants in the same gene and help identify target genes. METHODS We evaluated the potential of gene-based aggregation in the Breast Cancer Association Consortium cohorts including 83,471 cases and 59,199 controls. Low-frequency variants were aggregated for individual genes' coding and regulatory regions. Association results in European ancestry samples were compared to single-marker association results in the same cohort. Gene-based associations were also combined in meta-analysis across individuals with European, Asian, African, and Latin American and Hispanic ancestry. RESULTS In European ancestry samples, 14 genes were significantly associated (q < 0.05) with BC. Of those, two genes, FMNL3 (P = 6.11 × 10⁻⁶) and AC058822.1 (P = 1.47 × 10⁻⁴), represent new associations. High FMNL3 expression has previously been linked to poor prognosis in several other cancers. Meta-analysis of samples with diverse ancestry discovered further associations including established candidate genes ESR1 and CBLB. Furthermore, literature review and database query found further support for a biologically plausible link with cancer for genes CBLB, FMNL3, FGFR2, LSP1, MAP3K1, and SRGAP2C. CONCLUSIONS Using extended gene-based aggregation tests including coding and regulatory variation, we report identification of plausible target genes for previously identified single-marker associations with BC as well as the discovery of novel genes implicated in BC development. Including multi-ancestral cohorts in this study enabled the identification of otherwise missed disease associations such as ESR1 (P = 1.31 × 10⁻⁵), demonstrating the importance of diversifying study cohorts.
|
20
|
Kuan V, Denaxas S, Patalay P, Nitsch D, Mathur R, Gonzalez-Izquierdo A, Sofat R, Partridge L, Roberts A, Wong ICK, Hingorani M, Chaturvedi N, Hemingway H, Hingorani AD. Identifying and visualising multimorbidity and comorbidity patterns in patients in the English National Health Service: a population-based study. Lancet Digit Health 2023; 5:e16-e27. [PMID: 36460578 DOI: 10.1016/s2589-7500(22)00187-x] [Received: 03/01/2022] [Revised: 09/10/2022] [Accepted: 09/19/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND Globally, there is a paucity of multimorbidity and comorbidity data, especially for minority ethnic groups and younger people. We estimated the frequency of common disease combinations and identified non-random disease associations for all ages in a multiethnic population. METHODS In this population-based study, we examined multimorbidity and comorbidity patterns stratified by ethnicity or race, sex, and age for 308 health conditions using electronic health records from individuals included on the Clinical Practice Research Datalink linked with the Hospital Episode Statistics admitted patient care dataset in England. We included individuals who were older than 1 year and who had been registered for at least 1 year in a participating general practice during the study period (between April 1, 2010, and March 31, 2015). We identified the most common combinations of conditions and comorbidities for index conditions. We defined comorbidity as the accumulation of additional conditions to an index condition over an individual's lifetime. We used network analysis to identify conditions that co-occurred more often than expected by chance. We developed online interactive tools to explore multimorbidity and comorbidity patterns overall and by subgroup based on ethnicity, sex, and age. FINDINGS We collected data for 3 872 451 eligible patients, of whom 1 955 700 (50·5%) were women and girls, 1 916 751 (49·5%) were men and boys, 2 666 234 (68·9%) were White, 155 435 (4·0%) were south Asian, and 98 815 (2·6%) were Black. We found that a higher proportion of boys aged 1-9 years (132 506 [47·8%] of 277 158) had two or more diagnosed conditions than did girls in the same age group (106 982 [40·3%] of 265 179), but more women and girls were diagnosed with multimorbidity than were boys aged 10 years and older and men (1 361 232 [80·5%] of 1 690 521 vs 1 161 308 [70·8%] of 1 639 593). 
White individuals (2 097 536 [78·7%] of 2 666 234) were more likely to be diagnosed with two or more conditions than were Black (59 339 [60·1%] of 98 815) or south Asian individuals (93 617 [60·2%] of 155 435). Depression commonly co-occurred with anxiety, migraine, obesity, atopic conditions, deafness, soft-tissue disorders, and gastrointestinal disorders across all subgroups. Heart failure often co-occurred with hypertension, atrial fibrillation, osteoarthritis, stable angina, myocardial infarction, chronic kidney disease, type 2 diabetes, and chronic obstructive pulmonary disease. Spinal fractures were most strongly non-randomly associated with malignancy in Black individuals, but with osteoporosis in White individuals. Hypertension was most strongly associated with kidney disorders in those aged 20-29 years, but with dyslipidaemia, obesity, and type 2 diabetes in individuals aged 40 years and older. Breast cancer was associated with different comorbidities in individuals from different ethnic groups. Asthma was associated with different comorbidities between males and females. Bipolar disorder was associated with different comorbidities in younger age groups compared with older age groups. INTERPRETATION Our findings and interactive online tools are a resource for: patients and their clinicians, to prevent and detect comorbid conditions; research funders and policy makers, to redesign service provision, training priorities, and guideline development; and biomedical researchers and manufacturers of medicines, to provide leads for research into common or sequential pathways of disease and inform the design of clinical trials. FUNDING UK Research and Innovation, Medical Research Council, National Institute for Health and Care Research, Department of Health and Social Care, Wellcome Trust, British Heart Foundation, and The Alan Turing Institute.
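The abstract describes identifying conditions that co-occur more often than expected by chance but does not give the statistic used. As an illustration only, the Python sketch below computes one simple measure, the observed-to-expected co-occurrence ratio under independence. The function name and all counts are hypothetical and are not taken from the study.

```python
def observed_expected_ratio(n_total: int, n_a: int, n_b: int, n_both: int) -> float:
    """Ratio of observed co-occurrence to the count expected if
    conditions A and B occurred independently in n_total people."""
    expected_both = n_a * n_b / n_total
    return n_both / expected_both


# Hypothetical counts, chosen only to illustrate the calculation.
ratio = observed_expected_ratio(n_total=100_000, n_a=10_000, n_b=5_000, n_both=1_500)
print(f"observed/expected = {ratio:.1f}")  # values above 1 suggest non-random co-occurrence
```

A ratio near 1 is what independence would produce; a full analysis would also need a significance test and adjustment for age, sex, and ethnicity, which this sketch omits.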
|
21
|
Mizani MA, Dashtban A, Pasea L, Lai AG, Thygesen J, Tomlinson C, Handy A, Mamza JB, Morris T, Khalid S, Zaccardi F, Macleod MJ, Torabi F, Canoy D, Akbari A, Berry C, Bolton T, Nolan J, Khunti K, Denaxas S, Hemingway H, Sudlow C, Banerjee A. Using national electronic health records for pandemic preparedness: validation of a parsimonious model for predicting excess deaths among those with COVID-19-a data-driven retrospective cohort study. J R Soc Med 2023; 116:10-20. [PMID: 36374585 PMCID: PMC9909113 DOI: 10.1177/01410768221131897] [Received: 06/16/2022] [Accepted: 09/24/2022] [Indexed: 11/16/2022] Open
Abstract
OBJECTIVES To use national, pre- and post-pandemic electronic health records (EHR) to develop and validate a scenario-based model incorporating baseline mortality risk, infection rate (IR) and relative risk (RR) of death for prediction of excess deaths. DESIGN An EHR-based, retrospective cohort study. SETTING Linked EHR in Clinical Practice Research Datalink (CPRD); and linked EHR and COVID-19 data in England provided in NHS Digital Trusted Research Environment (TRE). PARTICIPANTS In the development (CPRD) and validation (TRE) cohorts, we included 3.8 million and 35.1 million individuals aged ≥30 years, respectively. MAIN OUTCOME MEASURES One-year all-cause excess deaths related to COVID-19 from March 2020 to March 2021. RESULTS From 1 March 2020 to 1 March 2021, there were 127,020 observed excess deaths. Observed RR was 4.34% (95% CI, 4.31-4.38) and IR was 6.27% (95% CI, 6.26-6.28). In the validation cohort, predicted one-year excess deaths were 100,338 compared with the observed 127,020 deaths with a ratio of predicted to observed excess deaths of 0.79. CONCLUSIONS We show that a simple, parsimonious model incorporating baseline mortality risk, one-year IR and RR of the pandemic can be used for scenario-based prediction of excess deaths in the early stages of a pandemic. Our analyses show that EHR could inform pandemic planning and surveillance, despite limited use in emergency preparedness to date. Although infection dynamics are important in the prediction of mortality, future models should take greater account of underlying conditions.
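The abstract names the three inputs of the scenario-based model (baseline mortality risk, infection rate, relative risk) but not its exact form. The Python sketch below assumes a simple multiplicative form, excess deaths = population × baseline risk × IR × (RR − 1). This form, the function name, and the scenario inputs are assumptions for illustration, not the authors' confirmed method. The final lines reproduce the predicted-to-observed ratio from the two counts that the abstract does report.

```python
def predicted_excess_deaths(population: float,
                            baseline_risk: float,
                            infection_rate: float,
                            relative_risk: float) -> float:
    """Assumed multiplicative form: deaths beyond baseline among those infected."""
    return population * baseline_risk * infection_rate * (relative_risk - 1)


# Hypothetical scenario: 1 million people, 1% baseline one-year mortality,
# 10% infected, and a doubling of mortality risk among the infected.
scenario = predicted_excess_deaths(1_000_000, 0.01, 0.10, 2.0)
print(f"scenario excess deaths: {scenario:.0f}")

# Counts reported in the abstract for the validation cohort.
predicted, observed = 100_338, 127_020
ratio = predicted / observed
print(f"predicted/observed = {ratio:.2f}")
```

The ratio of the two reported counts rounds to the 0.79 quoted in the abstract.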
|
22
|
Surendran P, Stewart ID, Au Yeung VPW, Pietzner M, Raffler J, Wörheide MA, Li C, Smith RF, Wittemans LBL, Bomba L, Menni C, Zierer J, Rossi N, Sheridan PA, Watkins NA, Mangino M, Hysi PG, Di Angelantonio E, Falchi M, Spector TD, Soranzo N, Michelotti GA, Arlt W, Lotta LA, Denaxas S, Hemingway H, Gamazon ER, Howson JMM, Wood AM, Danesh J, Wareham NJ, Kastenmüller G, Fauman EB, Suhre K, Butterworth AS, Langenberg C. Rare and common genetic determinants of metabolic individuality and their effects on human health. Nat Med 2022; 28:2321-2332. [PMID: 36357675 PMCID: PMC9671801 DOI: 10.1038/s41591-022-02046-0] [Received: 12/10/2021] [Accepted: 09/16/2022] [Indexed: 11/12/2022]
Abstract
Garrod's concept of 'chemical individuality' has contributed to comprehension of the molecular origins of human diseases. Untargeted high-throughput metabolomic technologies provide an in-depth snapshot of human metabolism at scale. We studied the genetic architecture of the human plasma metabolome using 913 metabolites assayed in 19,994 individuals and identified 2,599 variant-metabolite associations (P < 1.25 × 10⁻¹¹) within 330 genomic regions, with rare variants (minor allele frequency ≤ 1%) explaining 9.4% of associations. Jointly modeling metabolites in each region, we identified 423 regional, co-regulated, variant-metabolite clusters called genetically influenced metabotypes. We assigned causal genes for 62.4% of these genetically influenced metabotypes, providing new insights into fundamental metabolite physiology and clinical relevance, including metabolite-guided discovery of potential adverse drug effects (DPYD and SRD5A2). We show strong enrichment of inborn errors of metabolism-causing genes, with examples of metabolite associations and clinical phenotypes of non-pathogenic variant carriers matching characteristics of the inborn errors of metabolism. Systematic, phenotypic follow-up of metabolite-specific genetic scores revealed multiple potential etiological relationships.
|
23
|
Kotecha D, Asselbergs FW, Achenbach S, Anker SD, Atar D, Baigent C, Banerjee A, Beger B, Brobert G, Casadei B, Ceccarelli C, Cowie MR, Crea F, Cronin M, Denaxas S, Derix A, Fitzsimons D, Fredriksson M, Gale CP, Gkoutos GV, Goettsch W, Hemingway H, Ingvar M, Jonas A, Kazmierski R, Løgstrup S, Thomas Lumbers R, Lüscher TF, McGreavy P, Piña IL, Roessig L, Steinbeisser C, Sundgren M, Tyl B, van Thiel G, van Bochove K, Vardas PE, Villanueva T, Vrana M, Weber W, Weidinger F, Windecker S, Wood A, Grobbee DE. CODE-EHR best practice framework for the use of structured electronic healthcare records in clinical research. Eur Heart J 2022; 43:3578-3588. [PMID: 36208161 PMCID: PMC9452067 DOI: 10.1093/eurheartj/ehac426] [Accepted: 06/21/2022] [Indexed: 11/29/2022] Open
Abstract
Big data is central to new developments in global clinical science aiming to improve the lives of patients. Technological advances have led to the routine use of structured electronic healthcare records with the potential to address key gaps in clinical evidence. The covid-19 pandemic has demonstrated the potential of big data and related analytics, but also important pitfalls. Verification, validation, and data privacy, as well as the social mandate to undertake research are key challenges. The European Society of Cardiology and the BigData@Heart consortium have brought together a range of international stakeholders, including patient representatives, clinicians, scientists, regulators, journal editors and industry. We propose the CODE-EHR Minimum Standards Framework as a means to improve the design of studies, enhance transparency and develop a roadmap towards more robust and effective utilisation of healthcare data for research purposes.
|
24
|
Kotecha D, Asselbergs FW, Achenbach S, Anker SD, Atar D, Baigent C, Banerjee A, Beger B, Brobert G, Casadei B, Ceccarelli C, Cowie MR, Crea F, Cronin M, Denaxas S, Derix A, Fitzsimons D, Fredriksson M, Gale CP, Gkoutos GV, Goettsch W, Hemingway H, Ingvar M, Jonas A, Kazmierski R, Løgstrup S, Lumbers RT, Lüscher TF, McGreavy P, Piña IL, Roessig L, Steinbeisser C, Sundgren M, Tyl B, Thiel GV, Bochove KV, Vardas PE, Villanueva T, Vrana M, Weber W, Weidinger F, Windecker S, Wood A, Grobbee DE. CODE-EHR best-practice framework for the use of structured electronic health-care records in clinical research. Lancet Digit Health 2022; 4:e757-e764. [PMID: 36050271 DOI: 10.1016/s2589-7500(22)00151-0] [Received: 07/14/2022] [Accepted: 07/20/2022] [Indexed: 11/16/2022]
Abstract
Big data is important to new developments in global clinical science that aim to improve the lives of patients. Technological advances have led to the regular use of structured electronic health-care records with the potential to address key deficits in clinical evidence that could improve patient care. The COVID-19 pandemic has shown this potential in big data and related analytics but has also revealed important limitations. Data verification, data validation, data privacy, and a mandate from the public to conduct research are important challenges to effective use of routine health-care data. The European Society of Cardiology and the BigData@Heart consortium have brought together a range of international stakeholders, including representation from patients, clinicians, scientists, regulators, journal editors, and industry members. In this Review, we propose the CODE-EHR minimum standards framework to be used by researchers and clinicians to improve the design of studies and enhance transparency of study methods. The CODE-EHR framework aims to develop robust and effective utilisation of health-care data for research purposes.
|
25
|
Benedetto U, Sinha S, Mulla A, Glampson B, Davies J, Panoulas V, Gautama S, Papadimitriou D, Woods K, Elliott P, Hemingway H, Williams B, Asselbergs FW, Melikian N, Krasopoulos G, Sayeed R, Wendler O, Baig K, Chukwuemeka A, Angelini GD, Sterne JAC, Johnson T, Shah AM, Perera D, Patel RS, Kharbanda R, Channon KM, Mayet J, Kaura A. Implications of elevated troponin on time-to-surgery in non-ST elevation myocardial infarction (NIHR Health Informatics Collaborative: TROP-CABG study). Int J Cardiol 2022; 362:14-19. [PMID: 35487318 DOI: 10.1016/j.ijcard.2022.04.067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 04/18/2022] [Accepted: 04/25/2022] [Indexed: 11/05/2022]
Abstract
BACKGROUND: The optimal timing of coronary artery bypass grafting (CABG) in patients with non-ST elevation myocardial infarction (NSTEMI) and the utility of pre-operative troponin levels in decision-making remain unclear. We investigated (a) the association between peak pre-operative troponin and survival post-CABG in a large cohort of NSTEMI patients and (b) the interaction between troponin and time-to-surgery. METHODS AND RESULTS: Our cohort consisted of 1746 patients (1684 NSTEMI; 62 unstable angina) (mean age 69 ± 11 years, 21% female) with recorded troponins who had CABG at five United Kingdom centers between 2010 and 2017. Time-segmented Cox regression was used to investigate the interaction of peak troponin and time-to-surgery on early (within 30 days) and late (beyond 30 days) survival. Average interval from peak troponin to surgery was 9 ± 15 days, with 1466 (84.0%) patients having CABG during the same admission. Sixty patients died within 30 days and another 211 died after a mean follow-up of 4 ± 2 years (30-day survival 0.97 ± 0.004 and 5-year survival 0.83 ± 0.01). Peak troponin was a strong predictor of early survival (adjusted P = 0.002) with a significant interaction with time-to-surgery (P interaction = 0.007). For peak troponin levels <100 times the upper limit of normal, there was no improvement in early survival with longer time-to-surgery. However, in patients with higher troponins, early survival increased progressively with a longer time-to-surgery, until day 10. Peak troponin did not influence survival beyond 30 days (adjusted P = 0.64). CONCLUSIONS: Peak troponin in NSTEMI patients undergoing CABG was a significant predictor of early mortality, strongly influenced the time-to-surgery and may prove to be a clinically useful biomarker in the management of these patients.
|