1
Nielsen RL, Monfeuga T, Kitchen RR, Egerod L, Leal LG, Schreyer ATH, Gade FS, Sun C, Helenius M, Simonsen L, Willert M, Tahrani AA, McVey Z, Gupta R. Data-driven identification of predictive risk biomarkers for subgroups of osteoarthritis using interpretable machine learning. Nat Commun 2024; 15:2817. [PMID: 38561399 PMCID: PMC10985086 DOI: 10.1038/s41467-024-46663-4] [Received: 08/04/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024] [Open access]
Abstract
Osteoarthritis (OA) is increasing in prevalence and has a severe impact on patients' lives. However, our understanding of biomarkers driving OA risk remains limited. We developed a model predicting the five-year risk of OA diagnosis, integrating retrospective clinical, lifestyle and biomarker data from the UK Biobank (19,120 patients with OA, ROC-AUC: 0.72, 95%CI (0.71-0.73)). Higher age, BMI and prescription of non-steroidal anti-inflammatory drugs contributed most to increased OA risk prediction ahead of diagnosis. We identified 14 subgroups of OA risk profiles. These subgroups were validated in an independent set of patients evaluating the 11-year OA risk, with 88% of patients being uniquely assigned to one of the 14 subgroups. Individual OA risk profiles were characterised by personalised biomarkers. Omics integration demonstrated the predictive importance of key OA genes and pathways (e.g., GDF5 and TGF-β signalling) and OA-specific biomarkers (e.g., CRTAC1 and COL9A1). In summary, this work identifies opportunities for personalised OA prevention and insights into its underlying pathogenesis.
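The ROC-AUC reported above can be read as the probability that a randomly chosen patient who goes on to receive an OA diagnosis is scored higher than a randomly chosen patient who does not. A minimal Python sketch of that pairwise definition follows; the function name and the toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC-AUC.

    labels: 1 for cases, 0 for controls (toy data, not study data).
    scores: predicted risk, higher meaning more likely to be a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 of the 4 case/control pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.72 indicates moderate discrimination.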
Affiliation(s)
- Line Egerod
  - Novo Nordisk Research Centre Oxford, Oxford, UK
- Luis G Leal
  - Novo Nordisk Research Centre Oxford, Oxford, UK
- Carol Sun
  - Novo Nordisk Research Centre Oxford, Oxford, UK
- Zahra McVey
  - Novo Nordisk Research Centre Oxford, Oxford, UK
2
Leal LG, David A, Jarvelin MR, Sebert S, Männikkö M, Karhunen V, Seaby E, Hoggart C, Sternberg MJE. Identification of disease-associated loci using machine learning for genotype and network data integration. Bioinformatics 2020; 35:5182-5190. [PMID: 31070705 PMCID: PMC6954643 DOI: 10.1093/bioinformatics/btz310] [Received: 12/22/2018] [Revised: 03/28/2019] [Accepted: 04/25/2019] [Indexed: 01/19/2023] [Open access]
Abstract
Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Novel machine learning algorithms for merging genotype data with other omics data are needed, as they could enhance the prioritization of weak loci.
Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs.
Availability and implementation: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luis G Leal
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Alessia David
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Marjo-Riitta Jarvelin
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
  - Biocenter Oulu, University of Oulu, Oulu 90220, Finland
  - Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK
  - Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
- Sylvain Sebert
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
  - Biocenter Oulu, University of Oulu, Oulu 90220, Finland
- Minna Männikkö
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
- Ville Karhunen
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
  - Biocenter Oulu, University of Oulu, Oulu 90220, Finland
  - Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK
  - Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
- Eleanor Seaby
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Clive Hoggart
  - Department of Medicine, Imperial College London, London W2 1PG, UK
- Michael J E Sternberg
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
3
Leal LG, Hoggart C, Jarvelin MR, Herzig KH, Sternberg MJE, David A. A polygenic biomarker to identify patients with severe hypercholesterolemia of polygenic origin. Mol Genet Genomic Med 2020; 8:e1248. [PMID: 32307928 PMCID: PMC7284038 DOI: 10.1002/mgg3.1248] [Received: 12/04/2019] [Revised: 02/24/2020] [Accepted: 03/02/2020] [Indexed: 12/11/2022] [Open access]
Abstract
Background: Severe hypercholesterolemia (HC, LDL-C > 4.9 mmol/L) affects over 30 million people worldwide. In this study, we validated a new polygenic risk score (PRS) for LDL-C.
Methods: Summary statistics from the Global Lipid Genome Consortium and genotype data from two large populations were used.
Results: A 36-SNP PRS was generated using data for 2,197 white Americans. In a replication cohort of 4,787 Finns, the PRS was strongly associated with the LDL-C trait and explained 8% of its variability (p = 10⁻⁴¹). After risk categorization, the risk of having HC was higher in the high- versus low-risk group (RR = 4.17, p < 1 × 10⁻⁷). Compared to a 12-SNP LDL-C raising score (currently used in the United Kingdom), the PRS explained more LDL-C variability (8% vs. 6%). Among Finns with severe HC, 53% (66/124) versus 44% (55/124) were classified as high risk by the PRS and LDL-C raising score, respectively. Moreover, 54% of individuals with severe HC defined as low risk by the LDL-C raising score were reclassified to intermediate or high risk by the new PRS.
Conclusion: The new PRS has a better predictive role in identifying HC of polygenic origin compared to the currently available method and can better stratify patients into diagnostic and therapeutic algorithms.
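For orientation, a polygenic risk score is conventionally computed as a weighted sum of risk-allele dosages, and a relative risk as a ratio of event proportions between two groups. The Python sketch below illustrates both under that conventional definition; it is not the authors' pipeline, and the function names, SNP weights and group counts are hypothetical. Only the 66/124 and 55/124 proportions come from the abstract.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP).

    weights would normally be per-SNP effect sizes from GWAS summary
    statistics; the values used below are made up for illustration.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def relative_risk(cases_a, total_a, cases_b, total_b):
    """Risk in group A divided by risk in group B."""
    return (cases_a / total_a) / (cases_b / total_b)


# Hypothetical individual genotyped at three SNPs.
score = polygenic_risk_score([0, 1, 2], [0.10, 0.20, 0.30])

# Hypothetical counts: 50/100 affected in group A, 10/100 in group B.
rr = relative_risk(50, 100, 10, 100)

# Proportions reported in the abstract, recomputed as a consistency check.
share_prs = round(100 * 66 / 124)      # high risk by the PRS
share_12_snp = round(100 * 55 / 124)   # high risk by the 12-SNP score
print(score, rr, share_prs, share_12_snp)
```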
Affiliation(s)
- Luis G Leal
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- Clive Hoggart
  - Department of Medicine, Imperial College London, London, United Kingdom
- Marjo-Riitta Jarvelin
  - Faculty of Medicine, Center for Life Course Health Research, University of Oulu, Oulu, Finland
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Unit of Primary Health Care, Oulu University Hospital, Oulu, Finland
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, United Kingdom
  - Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex, United Kingdom
- Karl-Heinz Herzig
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Research Unit of Biomedicine, Oulu University, Oulu, Oulu University Hospital and Medical Research Center Oulu, Oulu, Finland
  - Department of Gastroenterology and Metabolism, Poznan University of Medical Sciences, Poznan, Poland
- Michael J E Sternberg
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- Alessia David
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
4
Alhuzimi E, Leal LG, Sternberg MJE, David A. Properties of human genes guided by their enrichment in rare and common variants. Hum Mutat 2017; 39:365-370. [PMID: 29197136 PMCID: PMC5838408 DOI: 10.1002/humu.23377] [Received: 06/27/2017] [Revised: 11/20/2017] [Accepted: 11/26/2017] [Indexed: 01/01/2023]
Abstract
We analyzed 563,099 common (minor allele frequency, MAF≥0.01) and rare (MAF < 0.01) genetic variants annotated in ExAC and UniProt and 26,884 disease‐causing variants from ClinVar and UniProt occurring in the coding region of 17,975 human protein‐coding genes. Three novel sets of genes were identified: those enriched in rare variants (n = 32 genes), in common variants (n = 282 genes), and in disease‐causing variants (n = 800 genes). Genes enriched in rare variants have far greater similarities in terms of biological and network properties to genes enriched in disease‐causing variants, than to genes enriched in common variants. However, in half of the genes enriched in rare variants (AOC2, MAMDC4, ANKHD1, CDC42BPB, SPAG5, TRRAP, TANC2, IQCH, USP54, SRRM2, DOPEY2, and PITPNM1), no disease‐causing variants have been identified in major, publicly available databases. Thus, genetic variants in these genes are strong candidates for disease and their identification, as part of sequencing studies, should prompt further in vitro analyses.
Affiliation(s)
- Eman Alhuzimi
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Luis G Leal
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Michael J E Sternberg
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Alessia David
  - Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
5
Garcés MF, Vallejo SA, Sanchez E, Palomino-Palomino MA, Leal LG, Ángel-Muller E, Díaz-Cruz LA, Ruíz-Parra AI, González-Clavijo AM, Castaño JP, Abba M, Lacunza E, Diéguez C, Nogueiras R, Caminos JE. Longitudinal analysis of maternal serum Follistatin concentration in normal pregnancy and preeclampsia. Clin Endocrinol (Oxf) 2015; 83:229-35. [PMID: 25565002 DOI: 10.1111/cen.12715] [Received: 11/13/2014] [Revised: 11/30/2014] [Accepted: 12/31/2014] [Indexed: 11/30/2022]
Abstract
OBJECTIVE: Follistatin (FST) is a regulator of the biological activity of activin A (Act A), binding and blocking it, which could contribute to the modulation of its pro-inflammatory activity during pregnancy. We sought to investigate, in this nested case-control study, FST serum levels during normal pregnancy and correlate them with the FST profile in preeclamptic pregnant women, normal pregnant women followed 3 months postpartum and eumenorrheic nonpregnant women throughout the menstrual cycle.
SUBJECTS AND METHODS: Follistatin serum levels determined by ELISA, biochemical and anthropometric variables were measured in normal pregnant (n = 28) and preeclamptic (n = 20) women during three periods of gestation. In addition, FST serum levels were measured in a subset of normal pregnant women (n = 13) followed 3 months postpartum and in eumenorrheic nonpregnant women (n = 20) during the follicular and luteal phases of the menstrual cycle.
RESULTS: Follistatin serum levels in the eumenorrheic nonpregnant and postpartum group were significantly lower when compared to levels throughout gestation (P < 0.01). Serum FST levels increased in each period of pregnancy analysed, being significantly higher towards the end of gestation (P < 0.01). FST levels were lower in late pregnancy in preeclamptic women compared to normal pregnant women (P < 0.05). Finally, FST levels were higher in the luteal phase when compared with the follicular phase of the menstrual cycle (P < 0.05).
CONCLUSIONS: These analyses suggest that changes in FST levels during pregnancy contribute to the control of the Act A system.
Affiliation(s)
- María F Garcés
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Sergio A Vallejo
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Elizabeth Sanchez
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Luis G Leal
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Edith Ángel-Muller
  - Department of Obstetrics and Gynecology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Luz A Díaz-Cruz
  - Department of Obstetrics and Gynecology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Ariel Iván Ruíz-Parra
  - Department of Obstetrics and Gynecology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Justo P Castaño
  - Department of Cell Biology, Physiology and Immunology, University of Córdoba, Reina Sofía University Hospital, Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, Spain
  - CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Spain
- Martin Abba
  - CINIBA, Facultad de Ciencias Médicas, Universidad Nacional de La Plata, La Plata, Argentina
- Ezequiel Lacunza
  - CINIBA, Facultad de Ciencias Médicas, Universidad Nacional de La Plata, La Plata, Argentina
- Carlos Diéguez
  - Department of Physiology (CIMUS), School of Medicine-Instituto de Investigaciones Sanitarias (IDIS), University of Santiago de Compostela, Santiago de Compostela, Spain
  - CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Spain
- Rubén Nogueiras
  - Department of Physiology (CIMUS), School of Medicine-Instituto de Investigaciones Sanitarias (IDIS), University of Santiago de Compostela, Santiago de Compostela, Spain
  - CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Spain
- Jorge E Caminos
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
6
Garcés MF, Sanchez E, Cardona LF, Simanca EL, González I, Leal LG, Mora JA, Bedoya A, Alzate JP, Sánchez ÁY, Eslava-Schmalbach JH, Franco-Vega R, Parra MO, Ruíz-Parra AI, Diéguez C, Nogueiras R, Caminos JE. Maternal Serum Meteorin Levels and the Risk of Preeclampsia. PLoS One 2015; 10:e0131013. [PMID: 26121675 PMCID: PMC4487999 DOI: 10.1371/journal.pone.0131013] [Received: 02/15/2015] [Accepted: 05/26/2015] [Indexed: 12/21/2022] [Open access]
Abstract
BACKGROUND: Meteorin (METRN) is a recently described neurotrophic factor with angiogenic properties. This is a nested case-control study within a longitudinal cohort study that describes the serum profile of METRN during different periods of gestation in healthy and preeclamptic pregnant women. Moreover, we explore the possible application of METRN as a biomarker.
METHODS AND FINDINGS: Serum METRN was measured by ELISA in a longitudinal prospective cohort study in 37 healthy pregnant women, 16 mild preeclamptic women, and 20 healthy non-pregnant women during the menstrual cycle with the aim of assessing serum METRN levels and their correlations with other metabolic parameters. Immunostaining for METRN protein was performed in placenta. A multivariate logistic regression model was proposed and a classifier model was formulated for predicting preeclampsia in early and middle pregnancy. Classification performance was evaluated using measures such as sensitivity, specificity, and the receiver operating characteristic (ROC) curve. In healthy pregnant women, serum METRN levels were significantly elevated in early pregnancy compared to middle and late pregnancy. METRN levels were significantly lower only in early pregnancy in preeclamptic women when compared to healthy pregnant women. Decision trees that did not include METRN levels in the first trimester had a reduced sensitivity of 56% in the detection of preeclamptic women, compared to a sensitivity of 69% when METRN was included.
CONCLUSIONS: The joint measurements of circulating METRN levels in the first trimester and systolic blood pressure and weight in the second trimester significantly increase the probability of predicting preeclampsia.
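Sensitivity and specificity, the measures named above, follow directly from confusion-matrix counts. The Python sketch below shows the arithmetic; the function names and all counts are hypothetical. The abstract reports only the percentages, so the values of 9 and 11 detected cases out of 16 are chosen merely because they round to the reported 56% and 69%, and the control counts are arbitrary.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual cases that the classifier flags as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual non-cases that the classifier leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for 16 preeclamptic women (not reported in the abstract).
without_metrn = sensitivity(true_pos=9, false_neg=7)
with_metrn = sensitivity(true_pos=11, false_neg=5)

# Hypothetical counts for 37 healthy pregnant women.
spec = specificity(true_neg=30, false_pos=7)

print(round(100 * without_metrn), round(100 * with_metrn))  # 56 69
```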
Affiliation(s)
- María F Garcés
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Elizabeth Sanchez
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Luisa F Cardona
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Elkin L Simanca
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Iván González
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Luis G Leal
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- José A Mora
  - Department of Internal Medicine, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Andrés Bedoya
  - Department of Internal Medicine, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Juan P Alzate
  - Institute of Clinical Investigations, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Ángel Y Sánchez
  - Department of Pathology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Javier H Eslava-Schmalbach
  - Institute of Clinical Investigations, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Roberto Franco-Vega
  - Department of Internal Medicine, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Mario O Parra
  - Department of Obstetrics and Gynecology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Ariel I Ruíz-Parra
  - Department of Obstetrics and Gynecology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia
- Carlos Diéguez
  - Department of Physiology (CIMUS), School of Medicine-Instituto de Investigaciones Sanitarias (IDIS), University of Santiago de Compostela, Santiago de Compostela, Spain
  - Biomedical Research Centre in Physiopathology of Obesity and Nutrition (CIBERobn), Instituto de Salud Carlos III, Madrid, Spain
- Rubén Nogueiras
  - Department of Physiology (CIMUS), School of Medicine-Instituto de Investigaciones Sanitarias (IDIS), University of Santiago de Compostela, Santiago de Compostela, Spain
  - Biomedical Research Centre in Physiopathology of Obesity and Nutrition (CIBERobn), Instituto de Salud Carlos III, Madrid, Spain
- Jorge E Caminos
  - Department of Physiology, School of Medicine, Universidad Nacional de Colombia, Bogotá, Colombia