1
Chen T, Zhang H, Mazumder R, Lin X. Fast and scalable ensemble learning method for versatile polygenic risk prediction. Proc Natl Acad Sci U S A 2024; 121:e2403210121. [PMID: 39110727 PMCID: PMC11331062 DOI: 10.1073/pnas.2403210121] [Received: 02/15/2024] [Accepted: 07/11/2024] [Indexed: 08/21/2024]
Abstract
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, including computational burden, limited predictive accuracy, and poor adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, with 15 times faster computation and half the memory than the current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
Affiliation(s)
- Tony Chen
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20814
- Rahul Mazumder
  - Operations Research and Statistics Group, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215
  - Department of Statistics, Harvard University, Cambridge, MA 02138
2
Kaur A, Best NB, Hartwig T, Budka J, Khangura RS, McKenzie S, Aragón-Raygoza A, Strable J, Schulz B, Dilkes BP. A maize semi-dwarf mutant reveals a GRAS transcription factor involved in brassinosteroid signaling. PLANT PHYSIOLOGY 2024; 195:3072-3096. [PMID: 38709680 PMCID: PMC11288745 DOI: 10.1093/plphys/kiae147] [Received: 03/14/2023] [Revised: 01/18/2024] [Accepted: 01/18/2024] [Indexed: 05/08/2024]
Abstract
Brassinosteroids (BR) and gibberellins (GA) regulate plant height and leaf angle in maize (Zea mays). Mutants with defects in BR or GA biosynthesis or signaling identify components of these pathways and enhance our knowledge about plant growth and development. In this study, we characterized three recessive mutant alleles of GRAS transcription factor 42 (gras42) in maize, a GRAS transcription factor gene orthologous to the DWARF AND LOW TILLERING (DLT) gene of rice (Oryza sativa). These maize mutants exhibited semi-dwarf stature, shorter and wider leaves, and more upright leaf angle. Transcriptome analysis revealed a role for GRAS42 as a determinant of BR signaling. Analysis of the expression consequences from loss of GRAS42 in the gras42-mu1021149 mutant indicated a weak loss of BR signaling in the mutant, consistent with its previously demonstrated role in BR signaling in rice. Loss of BR signaling was also evident by the enhancement of weak BR biosynthetic mutant alleles in double mutants of nana plant1-1 and gras42-mu1021149. The gras42-mu1021149 mutant had little effect on GA-regulated gene expression, suggesting that GRAS42 is not a regulator of core GA signaling genes in maize. Single-cell expression data identified gras42 expressed among cells in the G2/M phase of the cell cycle consistent with its previously demonstrated role in cell cycle gene expression in Arabidopsis (Arabidopsis thaliana). Cis-acting natural variation controlling GRAS42 transcript accumulation was identified by expression genome-wide association study (eGWAS) in maize. Our results demonstrate a conserved role for GRAS42/SCARECROW-LIKE 28 (SCL28)/DLT in BR signaling, clarify the role of this gene in GA signaling, and suggest mechanisms of tillering and leaf angle control by BR.
Affiliation(s)
- Amanpreet Kaur
  - Department of Biochemistry, Purdue University, West Lafayette, IN 47907, USA
  - Center for Plant Biology, Purdue University, West Lafayette, IN 47907, USA
- Norman B Best
  - Plant Genetics Research Unit, USDA-ARS, Columbia, MO 65211, USA
- Thomas Hartwig
  - Institute for Molecular Physiology, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Josh Budka
  - Department of Biochemistry, Purdue University, West Lafayette, IN 47907, USA
  - Center for Plant Biology, Purdue University, West Lafayette, IN 47907, USA
- Rajdeep S Khangura
  - Department of Biochemistry, Purdue University, West Lafayette, IN 47907, USA
  - Center for Plant Biology, Purdue University, West Lafayette, IN 47907, USA
- Steven McKenzie
  - Department of Biochemistry, Purdue University, West Lafayette, IN 47907, USA
  - Center for Plant Biology, Purdue University, West Lafayette, IN 47907, USA
- Alejandro Aragón-Raygoza
  - Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, NC 27695, USA
- Josh Strable
  - Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, NC 27695, USA
- Burkhard Schulz
  - Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD 20742, USA
- Brian P Dilkes
  - Department of Biochemistry, Purdue University, West Lafayette, IN 47907, USA
  - Center for Plant Biology, Purdue University, West Lafayette, IN 47907, USA
3
Petersen L, Christiansen G, Chatwin H, Yilmaz Z, Schendel D, Bulik C, Grove J, Brikell I, Semark B, Holde K, Abdulkadir M, Hubel C, Albiñana C, Vilhjálmsson B, Borglum A, Demontis D, Mortensen P. The role of co-occurring conditions and genetics in the associations of eating disorders with attention-deficit/hyperactivity disorder and autism spectrum disorder. RESEARCH SQUARE 2024:rs.3.rs-4236554. [PMID: 39070652 PMCID: PMC11275993 DOI: 10.21203/rs.3.rs-4236554/v1] [Indexed: 07/30/2024]
Abstract
Eating disorders (EDs) commonly co-occur with other psychiatric and neurodevelopmental disorders including attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD); however, the pattern of family history and genetic overlap among them requires clarification. This study investigated the diagnostic, familial, and genetic associations of EDs with ADHD and ASD. The nationwide population-based cohort study included all individuals born in Denmark, 1981-2008, linked to their siblings and cousins. Cox regression was used to estimate associations between EDs and ADHD or ASD, and mediation analysis was used to assess the effects of intermediate mood or anxiety disorders. Polygenic scores (PGSs) were used to investigate the genetic association between anorexia nervosa (AN) and ADHD or ASD. Significantly increased risk for any ED was observed following an ADHD [hazard ratio = 1.97, 95% confidence interval = 1.75-2.22] or ASD diagnosis [2.82, 2.48-3.19]. Mediation analysis suggested that intermediate mood or anxiety disorders could account for 44-100% of the association between ADHD or ASD and ED. Individuals with a full sibling or maternal half-sibling with ASD had increased risk of AN [1.54, 1.33-1.78; 1.45, 1.08-1.94] compared to those with siblings without ASD. A positive association was found between ASD-PGS and AN risk [1.06, 1.02-1.09]. In this study, positive phenotypic associations between EDs and ADHD or ASD, mediation by mood or anxiety disorder, and a genetic association between ASD-PGS and AN were observed. These findings could guide future research in the development of new treatments that can mitigate the development of EDs among individuals with ADHD or ASD.
4
Lassen FH, Venkatesh SS, Baya N, Hill B, Zhou W, Bloemendal A, Neale BM, Kessler BM, Whiffin N, Lindgren CM, Palmer DS. Exome-wide evidence of compound heterozygous effects across common phenotypes in the UK Biobank. CELL GENOMICS 2024; 4:100602. [PMID: 38944039 PMCID: PMC11293579 DOI: 10.1016/j.xgen.2024.100602] [Received: 07/31/2023] [Revised: 03/11/2024] [Accepted: 06/07/2024] [Indexed: 07/01/2024]
Abstract
The phenotypic impact of compound heterozygous (CH) variation has not been investigated at the population scale. We phased rare variants (MAF ∼0.001%) in the UK Biobank (UKBB) exome-sequencing data to characterize recessive effects in 175,587 individuals across 311 common diseases. A total of 6.5% of individuals carry putatively damaging CH variants, 90% of which are only identifiable upon phasing rare variants (MAF < 0.38%). We identify six recessive gene-trait associations (p < 1.68 × 10⁻⁷) after accounting for relatedness, polygenicity, nearby common variants, and rare variant burden. Of these, just one is discovered when considering homozygosity alone. Using longitudinal health records, we additionally identify and replicate a novel association between bi-allelic variation in ATP2C2 and an earlier age at onset of chronic obstructive pulmonary disease (COPD) (p < 3.58 × 10⁻⁸). Genetic phase contributes to disease risk for gene-trait pairs: ATP2C2-COPD (p = 0.000238), FLG-asthma (p = 0.00205), and USH2A-visual impairment (p = 0.0084). We demonstrate the power of phasing large-scale genetic cohorts to discover phenome-wide consequences of compound heterozygosity.
Affiliation(s)
- Frederik H Lassen
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Samvida S Venkatesh
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Nikolas Baya
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Barney Hill
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Wei Zhou
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Alex Bloemendal
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Novo Nordisk Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Benjamin M Neale
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Benedikt M Kessler
  - Target Discovery Institute, Centre for Medicines Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Nicola Whiffin
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cecilia M Lindgren
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Nuffield Department of Population Health, Medical Sciences Division, University of Oxford, Oxford, UK
- Duncan S Palmer
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Nuffield Department of Population Health, Medical Sciences Division, University of Oxford, Oxford, UK
5
Li X, Fernandes BS, Liu A, Lu Y, Chen J, Zhao Z, Dai Y. GRPa-PRS: A risk stratification method to identify genetically-regulated pathways in polygenic diseases. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.06.19.23291621. [PMID: 37425929 PMCID: PMC10327215 DOI: 10.1101/2023.06.19.23291621] [Indexed: 07/11/2023]
Abstract
Background: Polygenic risk scores (PRS) are tools used to evaluate an individual's susceptibility to polygenic diseases based on their genetic profile. A considerable proportion of people carry a high genetic risk but evade the disease. On the other hand, some individuals with a low risk eventually develop the disease. We hypothesized that unknown counterfactors might be involved in reversing the PRS prediction, which might provide new insights into the pathogenesis, prevention, and early intervention of diseases.
Methods: We built a novel computational framework to identify genetically-regulated pathways (GRPas) using PRS-based stratification for each cohort. We curated two AD cohorts with genotyping data; the discovery (disc) and the replication (rep) datasets include 2722 and 2854 individuals, respectively. First, we calculated the optimized PRS model based on the three recent AD GWAS summary statistics for each cohort. Then, we stratified the individuals by their PRS and clinical diagnosis into six biologically meaningful PRS strata, such as AD cases with low/high risk and cognitively normal (CN) with low/high risk. Lastly, we imputed individual genetically-regulated expression (GReX) and identified differential GReX and GRPas between risk strata using gene-set enrichment and variational analyses in two models, with and without APOE effects. An orthogonality test was further conducted to verify that those GRPas are independent of PRS risk. To verify the generalizability to other polygenic diseases, we further applied a default model of GRPa-PRS for schizophrenia (SCZ).
Results: For each stratum, we conducted the same procedures in both the disc and rep datasets for comparison. In AD, we identified several well-known AD-related pathways, including amyloid-beta clearance, tau protein binding, and astrocyte response to oxidative stress. Additionally, we discovered resilience-related GRPas that are orthogonal to AD PRS, such as the calcium signaling pathway and divalent inorganic cation homeostasis. In SCZ, pathways related to mitochondrial function and muscle development were highlighted. Finally, our GRPa-PRS method identified more consistent differential pathways compared to another variant-based pathway PRS method.
Conclusions: We developed a framework, GRPa-PRS, to systematically explore the differential GReX and GRPas among individuals stratified by their estimated PRS. The GReX-level comparison among those strata unveiled new insights into the pathways associated with disease risk and resilience. Our framework is extendable to other polygenic complex diseases.
Affiliation(s)
- Xiaoyang Li
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Brisa S. Fernandes
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Andi Liu
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yimei Lu
  - Nevada Institute of Personalized Medicine, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- Jingchun Chen
  - Nevada Institute of Personalized Medicine, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- Zhongming Zhao
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yulin Dai
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
6
Buss M, Wagner J, Bleckmann E, Wieczorek LL. Popularity at first sight: Dominant behaviours mediate the link between extraversion and popularity in face-to-face and virtual group interactions. BRITISH JOURNAL OF SOCIAL PSYCHOLOGY 2024; 63:1226-1253. [PMID: 38288846 DOI: 10.1111/bjso.12720] [Received: 08/22/2022] [Accepted: 01/07/2023] [Indexed: 07/13/2024]
Abstract
Although there is robust evidence that being more extraverted is related to higher popularity, only a few studies have examined which actual behaviours (e.g., verbal content, body language) might explain this association. The current study examined whether observer-rated dominant behaviours (nonverbal, paraverbal, verbal, and general cues) mediate the relationship between self-rated extraversion and its facets (assertiveness, sociability, and activity) and other-rated popularity in zero-acquaintance settings. In two studies, we analysed data from face-to-face (Study 1, N = 124) and virtual (Study 2, N = 291) group interactions where participants were videotaped while performing a task and subsequently rated each other on popularity. Across studies, extraversion and the facets assertiveness and sociability were consistently associated with higher popularity, while the role of dominant behaviours differed. In Study 1, only two nonverbal behaviours, dominant gestures and upright posture, mediated the association between extraversion and popularity. In Study 2, all four types of behavioural cues mediated the association between extraversion (facets) and popularity. We discuss how these findings provide insights into the mechanisms of attaining popularity at zero acquaintance in diverse social settings.
Affiliation(s)
- Martje Buss
  - Department of Educational Psychology and Personality Development, Institute of Psychology, University of Hamburg, Hamburg, Germany
- Jenny Wagner
  - Department of Educational Psychology and Personality Development, Institute of Psychology, University of Hamburg, Hamburg, Germany
- Eva Bleckmann
  - Department of Educational Psychology and Personality Development, Institute of Psychology, University of Hamburg, Hamburg, Germany
- Larissa L Wieczorek
  - Department of Educational Psychology and Personality Development, Institute of Psychology, University of Hamburg, Hamburg, Germany
7
Ooi E, Xiang R, Chamberlain AJ, Goddard ME. Archetypal clustering reveals physiological mechanisms linking milk yield and fertility in dairy cattle. J Dairy Sci 2024; 107:4726-4742. [PMID: 38369117 DOI: 10.3168/jds.2023-23699] [Received: 05/05/2023] [Accepted: 01/11/2024] [Indexed: 02/20/2024]
Abstract
Fertility in dairy cattle has declined as an unintended consequence of single-trait selection for high milk yield. The unfavorable genetic correlation between milk yield and fertility is now well documented; however, the underlying physiological mechanisms are still uncertain. To understand the relationship between these traits, we developed a method that clusters variants with similar patterns of effects and, after the integration of gene expression data, identifies the genes through which they are likely to act. Biological processes that are enriched in the genes of each cluster were then identified. We identified several clusters with unique patterns of effects. One of the clusters included variants associated with increased milk yield and decreased fertility, where the "archetypal" variant (i.e., the one with the largest effect) was associated with the GC gene, whereas others were associated with TRIM32, LRRK2, and U6-associated snRNA. These genes have been linked to transcription and alternative splicing, suggesting that these processes are likely contributors to the unfavorable relationship between the 2 traits. Another cluster, with archetypal variant near DGAT1 and including variants associated with CDH2, BTRC, SFRP2, ZFHX3, and SLITRK5, appeared to affect milk yield but have little effect on fertility. These genes have been linked to insulin, adipose tissue, and energy metabolism. A third cluster, with archetypal variant near ZNF613 and including variants associated with ROBO1, EFNA5, PALLD, GPC6, and PTPRT, was associated with fertility but not milk yield. These genes have been linked to GnRH neuronal migration, embryonic development, or ovarian function. The use of archetypal clustering to group variants with similar patterns of effects may assist in identifying the biological processes underlying correlated traits. The method is hypothesis generating and requires experimental confirmation. However, we have uncovered several novel mechanisms potentially affecting milk production and fertility such as GnRH neuronal migration. We anticipate our method to be a starting point for experimental research into novel pathways, which have been previously unexplored within the context of dairy production.
Affiliation(s)
- E Ooi
  - Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
- R Xiang
  - Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
- A J Chamberlain
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia
- M E Goddard
  - Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
8
Li JL, McClellan JC, Zhang H, Gao G, Huo D. Multi-tissue transcriptome-wide association studies identified 235 genes for intrinsic subtypes of breast cancer. J Natl Cancer Inst 2024; 116:1105-1115. [PMID: 38400758 PMCID: PMC11223833 DOI: 10.1093/jnci/djae041] [Received: 09/22/2023] [Revised: 01/25/2024] [Accepted: 02/20/2024] [Indexed: 02/26/2024]
Abstract
BACKGROUND: Although genome-wide association studies (GWAS) of breast cancer (BC) identified common variants which differ between intrinsic subtypes, genes through which these variants act to impact BC risk have not been fully established. Transcriptome-wide association studies (TWAS) have identified genes associated with overall BC risk, but subtype-specific differences are largely unknown.
METHODS: We performed two multi-tissue TWAS for each BC intrinsic subtype, including an expression-based approach that collated TWAS signals from expression quantitative trait loci (eQTLs) across multiple tissues and a novel splicing-based approach that collated signals from splicing QTLs (sQTLs) across intron clusters and subsequently across tissues. We used summary statistics for five intrinsic subtypes including Luminal A-like, Luminal B-like, Luminal B/HER2-negative-like, HER2-enriched-like, and triple-negative BC, generated from 106 278 BC cases and 91 477 controls in the Breast Cancer Association Consortium.
RESULTS: Overall, we identified 235 genes in 88 loci that were associated with at least one of the five intrinsic subtypes. Most genes were subtype-specific, and many have not been reported in previous TWAS. We discovered that common variants that modulate expression of CHEK2 confer increased risk to Luminal A-like BC, in contrast to the viewpoint that CHEK2 primarily harbors rare, penetrant mutations. Additionally, our splicing-based TWAS provided population-level support for MDM4 splice variants that increased the risk of triple-negative BC.
CONCLUSION: Our comprehensive, multi-tissue TWAS corroborated previous GWAS loci for overall BC risk and intrinsic subtypes, while underscoring how common variation that impacts expression and splicing of genes in multiple tissue types can be used to further elucidate the etiology of BC.
Affiliation(s)
- James L Li
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Julian C McClellan
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Guimin Gao
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Dezheng Huo
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
  - Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
9
Reynes L, Fouqueau L, Aurelle D, Mauger S, Destombe C, Valero M. Temporal genomics help in deciphering neutral and adaptive patterns in the contemporary evolution of kelp populations. J Evol Biol 2024; 37:677-692. [PMID: 38629140 DOI: 10.1093/jeb/voae048] [Received: 06/19/2023] [Revised: 03/25/2024] [Accepted: 04/15/2024] [Indexed: 06/30/2024]
Abstract
The impact of climate change on populations will be contingent upon their contemporary adaptive evolution. In this study, we investigated the contemporary evolution of 4 populations of the cold-water kelp Laminaria digitata by analyzing their spatial and temporal genomic variations using ddRAD-sequencing. These populations were sampled from the center to the southern margin of its north-eastern Atlantic distribution at 2 time points, spanning at least 2 generations. Through genome scans for local adaptation at a single time point, we identified candidate loci that showed clinal variation correlated with changes in sea surface temperature (SST) along latitudinal gradients. This finding suggests that SST may drive the adaptive response of these kelp populations, although factors such as species' demographic history should also be considered. Additionally, we performed a simulation approach to distinguish the effect of selection from genetic drift in allele frequency changes over time. This enabled the detection of loci in the southernmost population that exhibited temporal differentiation beyond what would be expected from genetic drift alone: these are candidate loci which could have evolved under selection over time. In contrast, we did not detect any outlier locus based on temporal differentiation in the population from the North Sea, which also displayed low and decreasing levels of genetic diversity. The diverse evolutionary scenarios observed among populations can be attributed to variations in the prevalence of selection relative to genetic drift across different environments. Therefore, our study highlights the potential of temporal genomics to offer valuable insights into the contemporary evolution of marine foundation species facing climate change.
Affiliation(s)
- Lauric Reynes
  - IRL 3614, CNRS, Sorbonne Université, Pontificia Universidad Católica de Chile, Universidad Austral de Chile, Station Biologique de Roscoff, Roscoff 29688, France
- Louise Fouqueau
  - IRL 3614, CNRS, Sorbonne Université, Pontificia Universidad Católica de Chile, Universidad Austral de Chile, Station Biologique de Roscoff, Roscoff 29688, France
- Didier Aurelle
  - Aix-Marseille Université, Université de Toulon, CNRS, IRD, MIO, 13288 Marseille, France
  - Institut de Systématique Évolution Biodiversité (ISYEB, UMR 7205), Muséum National d'Histoire Naturelle, CNRS, EPHE, Sorbonne Université, Paris, France
- Stéphane Mauger
  - IRL 3614, CNRS, Sorbonne Université, Pontificia Universidad Católica de Chile, Universidad Austral de Chile, Station Biologique de Roscoff, Roscoff 29688, France
- Christophe Destombe
  - IRL 3614, CNRS, Sorbonne Université, Pontificia Universidad Católica de Chile, Universidad Austral de Chile, Station Biologique de Roscoff, Roscoff 29688, France
- Myriam Valero
  - IRL 3614, CNRS, Sorbonne Université, Pontificia Universidad Católica de Chile, Universidad Austral de Chile, Station Biologique de Roscoff, Roscoff 29688, France
10
Gigase FAJ, Suleri A, Isaevska E, Rommel AS, Boekhorst MGBM, Dmitrichenko O, El Marroun H, Steegers EAP, Hillegers MHJ, Muetzel RL, Lieb W, Cecil CAM, Pop V, Breen M, Bergink V, de Witte LD. Inflammatory markers in pregnancy - surprisingly stable. Mapping trajectories and drivers in four large cohorts. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.19.599718. [PMID: 38948713 PMCID: PMC11213028 DOI: 10.1101/2024.06.19.599718] [Indexed: 07/02/2024]
Abstract
Adaptations of the immune system throughout gestation have been proposed as important mechanisms regulating successful pregnancy. Dysregulation of the maternal immune system has been associated with adverse maternal and fetal outcomes. To translate findings from mechanistic preclinical studies to human pregnancies, studies of serum immune markers are the mainstay. The design and interpretation of human biomarker studies require additional insights into the trajectories and drivers of peripheral immune markers. The current study mapped maternal inflammatory markers (C-reactive protein (CRP), interleukin (IL)-1β, IL-6, IL-17A, IL-23, interferon-γ) during pregnancy and investigated the impact of demographic, environmental and genetic drivers on maternal inflammatory marker levels in four multi-ethnic and socio-economically diverse population-based cohorts with more than 12,000 pregnant participants. Additionally, pregnancy inflammatory markers were compared to pre-pregnancy levels. Cytokines showed a high correlation with each other, but not with CRP. Inflammatory marker levels showed high variability between individuals, yet high concordance within an individual over time during and pre-pregnancy. Pre-pregnancy body mass index (BMI) explained more than 9.6% of the variance in CRP, but less than 1% of the variance in cytokines. The polygenic score of CRP was the best predictor of variance in CRP (>14.1%). Gestational age and previously identified inflammation drivers, including tobacco use and parity, explained less than 1% of variance in both cytokines and CRP. Our findings corroborate differential underlying regulatory mechanisms of CRP and cytokines and are suggestive of an individual inflammatory marker baseline which is, in part, genetically driven. While prior research has mainly focused on immune marker changes throughout pregnancy, our study suggests that this field could benefit from a focus on intra-individual factors, including metabolic and genetic components.
|
11
|
Zhang J, Weissenkampen JD, Kember RL, Grove J, Børglum AD, Robinson EB, Brodkin ES, Almasy L, Bucan M, Sebro R. Phenotypic and ancestry-related assortative mating in autism. Mol Autism 2024; 15:27. [PMID: 38877467 PMCID: PMC11177537 DOI: 10.1186/s13229-024-00605-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 05/30/2024] [Indexed: 06/16/2024] Open
Abstract
BACKGROUND: Positive assortative mating (AM) in several neuropsychiatric traits, including autism, has been noted. However, it is unknown whether the pattern of AM is different in phenotypically defined autism subgroups [e.g., autism with and without intellectual disability (ID)]. It is also unclear what proportion of the phenotypic AM can be explained by the genetic similarity between parents of children with an autism diagnosis, and what the consequences of AM are for the genetic structure of the population.
METHODS: To address these questions, we analyzed two family-based autism collections: the Simons Foundation Powering Autism Research for Knowledge (SPARK) (1575 families) and the Simons Simplex Collection (SSC) (2283 families).
RESULTS: We found a similar degree of phenotypic and ancestry-related AM in parents of children with an autism diagnosis regardless of the presence of ID. We did not find evidence of AM for autism based on autism polygenic scores (PGS) (at a threshold of |r| > 0.1). The adjustment of ancestry-related AM or autism PGS accounted for only 0.3-4% of the fractional change in the estimate of the phenotypic AM. The ancestry-related AM introduced higher long-range linkage disequilibrium (LD) between single nucleotide polymorphisms (SNPs) on different chromosomes that are highly ancestry-informative compared to SNPs that are less ancestry-informative (D² on the order of 1 × 10⁻⁵).
LIMITATIONS: We only analyzed participants of European ancestry, limiting the generalizability of our results to individuals of non-European ancestry. SPARK and SSC were both multicenter studies. Therefore, there could be ancestry-related AM in SPARK and SSC due to geographic stratification. The study participants from each site were unknown, so we were unable to evaluate for geographic stratification.
CONCLUSIONS: This study showed similar patterns of AM in autism with and without ID, and demonstrated that the common genetic influences of autism are likely relevant to both autism groups. The adjustment of ancestry-related AM and autism PGS accounted for < 5% of the fractional change in the estimate of the phenotypic AM. Future studies are needed to evaluate whether the small increase in long-range LD induced by ancestry-related AM has an impact on downstream analysis.
Affiliation(s)
- Jing Zhang
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | | | - Rachel L Kember
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
| | - Jakob Grove
- Center for Genomics and Personalized Medicine, Aarhus University, Aarhus, Denmark
- Department of Biomedicine (Human Genetics) and iSEQ Center, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
| | - Anders D Børglum
- Center for Genomics and Personalized Medicine, Aarhus University, Aarhus, Denmark
- Department of Biomedicine (Human Genetics) and iSEQ Center, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
| | - Elise B Robinson
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
| | - Edward S Brodkin
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
| | - Laura Almasy
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Maja Bucan
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
| | - Ronnie Sebro
- Department of Radiology, Mayo Clinic, Jacksonville, FL, USA.
| |
|
12
|
Piazza GG, Allegrini AG, Eley TC, Epskamp S, Fried E, Isvoranu AM, Roiser JP, Pingault JB. Polygenic Scores and Networks of Psychopathology Symptoms. JAMA Psychiatry 2024:2819863. [PMID: 38865107 PMCID: PMC11170456 DOI: 10.1001/jamapsychiatry.2024.1403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/19/2024] [Indexed: 06/13/2024]
Abstract
Importance: Studies on polygenic risk for psychiatric traits commonly use a disorder-level approach to phenotyping, implicitly considering disorders as homogeneous constructs; however, symptom heterogeneity is ubiquitous, with many possible combinations of symptoms falling under the same disorder umbrella. Focusing on individual symptoms may shed light on the role of polygenic risk in psychopathology.
Objective: To determine whether polygenic scores are associated with all symptoms of psychiatric disorders or with a subset of indicators and whether polygenic scores are associated with comorbid phenotypes via specific sets of relevant symptoms.
Design, Setting, and Participants: Data from 2 population-based cohort studies were used in this cross-sectional study. Data from children in the Avon Longitudinal Study of Parents and Children (ALSPAC) were included in the primary analysis, and data from children in the Twins Early Development Study (TEDS) were included in confirmatory analyses. Data analysis was conducted from October 2021 to January 2024. Pregnant women based in the Southwest of England due to deliver in 1991 to 1992 were recruited in ALSPAC. Twins born in 1994 to 1996 were recruited in TEDS from population-based records. Participants with available genetic data and whose mothers completed the Short Mood and Feelings Questionnaire and the Strengths and Difficulties Questionnaire when children were 11 years of age were included.
Main Outcomes and Measures: Psychopathology-relevant symptoms, such as hyperactivity, prosociality, depression, anxiety, and peer and conduct problems at age 11 years. Psychological networks were constructed including individual symptoms and polygenic scores for depression, anxiety, attention-deficit/hyperactivity disorder (ADHD), body mass index (BMI), and educational attainment in ALSPAC. Following a preregistered confirmatory analysis, network models were cross-validated in TEDS.
Results: Included were 5521 participants from ALSPAC (mean [SD] age, 11.8 [0.14] years; 2777 [50.3%] female) and 4625 participants from TEDS (mean [SD] age, 11.27 [0.69] years; 2460 [53.2%] female). Polygenic scores were preferentially associated with restricted subsets of core symptoms and indirectly associated with other, more distal symptoms of psychopathology (network edges ranged between r = -0.074 and r = 0.073). Psychiatric polygenic scores were associated with specific cross-disorder symptoms, and nonpsychiatric polygenic scores were associated with a variety of indicators across disorders, suggesting a potential contribution of nonpsychiatric traits to comorbidity. For example, the polygenic score for ADHD was associated with a core ADHD symptom, being easily distracted (r = 0.07), and the polygenic score for BMI was associated with symptoms across disorders, including being bullied (r = 0.053) and not thinking things out (r = 0.041).
Conclusions and Relevance: Genetic associations observed at the disorder level may hide symptom-level heterogeneity. A symptom-level approach may enable a better understanding of the role of polygenic risk in shaping psychopathology and comorbidity.
Affiliation(s)
- Giulia G. Piazza
- Department of Clinical, Educational and Health Psychology, University College London, London, United Kingdom
| | - Andrea G. Allegrini
- Department of Clinical, Educational and Health Psychology, University College London, London, United Kingdom
- Social Genetic and Developmental Psychiatry, King’s College London, London, United Kingdom
| | - Thalia C. Eley
- Social Genetic and Developmental Psychiatry, King’s College London, London, United Kingdom
| | - Sacha Epskamp
- Department of Psychology, National University of Singapore, Singapore
| | - Eiko Fried
- Department of Clinical Psychology, Leiden University, Leiden, the Netherlands
| | | | - Jonathan P. Roiser
- Institute of Cognitive Neuroscience, University College London, London, United Kingdom
| | - Jean-Baptiste Pingault
- Department of Clinical, Educational and Health Psychology, University College London, London, United Kingdom
- Social Genetic and Developmental Psychiatry, King’s College London, London, United Kingdom
| |
|
13
|
Zhao H, Guo X, Wang W, Wang Z, Rawson P, Wilbur A, Hare M. Consequences of domestication in eastern oyster: Insights from whole genomic analyses. Evol Appl 2024; 17:e13710. [PMID: 38817396 PMCID: PMC11134191 DOI: 10.1111/eva.13710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 04/02/2024] [Accepted: 05/01/2024] [Indexed: 06/01/2024] Open
Abstract
Selective breeding for production traits has yielded relatively rapid successes with high-fecundity aquaculture species. Discovering the genetic changes associated with selection is an important goal for understanding adaptation and can also facilitate better predictions about the likely fitness of selected strains if they escape aquaculture farms. Here, we hypothesize domestication as a genetic change induced by inadvertent selection in culture. Our premise is that standardized culture protocols generate parallel domestication effects across independent strains. Using eastern oyster as a model and a newly developed 600K SNP array, this study tested for parallel domestication effects in multiple independent selection lines compared with their progenitor wild populations. A single contrast was made between pooled selected strains (1-17 generations in culture) and all wild progenitor samples combined. Population structure analysis indicated rank order levels of differentiation as [wild - wild] < [wild - cultured] < [cultured - cultured]. A genome scan for parallel adaptation to the captive environment applied two methodologically distinct outlier tests to the wild versus selected strain contrast and identified a total of 1174 candidate SNPs. Contrasting wild versus selected strains revealed the early evolutionary consequences of domestication in terms of genomic differentiation, standing genetic diversity, effective population size, relatedness, runs of homozygosity profiles, and genome-wide linkage disequilibrium patterns. Random Forest was used to identify 37 outlier SNPs that had the greatest discriminatory power between bulked wild and selected oysters. The outlier SNPs were in genes enriched for cytoskeletal functions, hinting at possible traits under inadvertent selection during larval culture or pediveliger setting at high density. 
This study documents rapid genomic changes stemming from hatchery-based cultivation of eastern oysters, identifies candidate loci responding to domestication in parallel among independent aquaculture strains, and provides potentially useful genomic resources for monitoring interbreeding between farm and wild oysters.
Affiliation(s)
- Honggang Zhao
- Department of Natural Resources & the Environment, Cornell University, Ithaca, New York, USA
- Present address: Center for Aquaculture Technology, San Diego, California, USA
| | - Ximing Guo
- Haskin Shellfish Research Laboratory, Rutgers University, Port Norris, New Jersey, USA
| | - Wenlu Wang
- Department of Computer Sciences, Texas A&M University-Corpus Christi, Corpus Christi, Texas, USA
| | - Zhenwei Wang
- Haskin Shellfish Research Laboratory, Rutgers University, Port Norris, New Jersey, USA
| | - Paul Rawson
- School of Marine Sciences, University of Maine, Orono, Maine, USA
| | - Ami Wilbur
- Shellfish Research Hatchery, Center for Marine Science, University of North Carolina Wilmington, Wilmington, North Carolina, USA
| | - Matthew Hare
- Department of Natural Resources & the Environment, Cornell University, Ithaca, New York, USA
| |
|
14
|
Benjamin KJM, Chen Q, Eagles NJ, Huuki-Myers LA, Collado-Torres L, Stolz JM, Pertea G, Shin JH, Paquola ACM, Hyde TM, Kleinman JE, Jaffe AE, Han S, Weinberger DR. Analysis of gene expression in the postmortem brain of neurotypical Black Americans reveals contributions of genetic ancestry. Nat Neurosci 2024; 27:1064-1074. [PMID: 38769152 PMCID: PMC11156587 DOI: 10.1038/s41593-024-01636-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/26/2023] [Accepted: 03/29/2024] [Indexed: 05/22/2024]
Abstract
Ancestral differences in genomic variation affect the regulation of gene expression; however, most gene expression studies have been limited to European ancestry samples or adjusted to identify ancestry-independent associations. Here, we instead examined the impact of genetic ancestry on gene expression and DNA methylation in the postmortem brain tissue of admixed Black American neurotypical individuals to identify ancestry-dependent and ancestry-independent contributions. Ancestry-associated differentially expressed genes (DEGs), transcripts and gene networks, while notably not implicating neurons, are enriched for genes related to the immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson disease and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for the heritability of diverse immune-related traits but depletion for psychiatric-related traits. We also compared Black and non-Hispanic white Americans, confirming most ancestry-associated DEGs. Our results delineate the extent to which genetic ancestry affects differences in gene expression in the human brain and the implications for brain illness risk.
Affiliation(s)
- Kynon J M Benjamin
- Lieber Institute for Brain Development, Baltimore, MD, USA.
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
| | - Qiang Chen
- Lieber Institute for Brain Development, Baltimore, MD, USA
| | | | | | - Leonardo Collado-Torres
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Joshua M Stolz
- Lieber Institute for Brain Development, Baltimore, MD, USA
| | - Geo Pertea
- Lieber Institute for Brain Development, Baltimore, MD, USA
| | - Joo Heon Shin
- Lieber Institute for Brain Development, Baltimore, MD, USA
| | - Apuã C M Paquola
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Thomas M Hyde
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Joel E Kleinman
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Andrew E Jaffe
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Neumora Therapeutics, Watertown, MA, USA
| | - Shizhong Han
- Lieber Institute for Brain Development, Baltimore, MD, USA.
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
| | - Daniel R Weinberger
- Lieber Institute for Brain Development, Baltimore, MD, USA.
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
| |
|
15
|
Ohta R, Tanigawa Y, Suzuki Y, Kellis M, Morishita S. A polygenic score method boosted by non-additive models. Nat Commun 2024; 15:4433. [PMID: 38811555 DOI: 10.1038/s41467-024-48654-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 05/08/2024] [Indexed: 05/31/2024] Open
Abstract
Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide its efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.
Affiliation(s)
- Rikifumi Ohta
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
| | - Yosuke Tanigawa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
| |
|
16
|
Wen C, Margolis M, Dai R, Zhang P, Przytycki PF, Vo DD, Bhattacharya A, Matoba N, Tang M, Jiao C, Kim M, Tsai E, Hoh C, Aygün N, Walker RL, Chatzinakos C, Clarke D, Pratt H, Peters MA, Gerstein M, Daskalakis NP, Weng Z, Jaffe AE, Kleinman JE, Hyde TM, Weinberger DR, Bray NJ, Sestan N, Geschwind DH, Roeder K, Gusev A, Pasaniuc B, Stein JL, Love MI, Pollard KS, Liu C, Gandal MJ. Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain. Science 2024; 384:eadh0829. [PMID: 38781368 DOI: 10.1126/science.adh0829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 03/07/2024] [Indexed: 05/25/2024]
Abstract
Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.
Affiliation(s)
- Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Michael Margolis
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
| | - Pan Zhang
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Pawel F Przytycki
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
| | - Daniel D Vo
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Miao Tang
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Chuan Jiao
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, Team Krebs, 75014 Paris, France
| | - Minsoo Kim
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Ellen Tsai
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Celine Hoh
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rebecca L Walker
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Christos Chatzinakos
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- McLean Hospital, Belmont, MA 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Declan Clarke
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Henry Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Mette A Peters
- CNS Data Coordination Group, Sage Bionetworks, Seattle, WA 98109, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
| | - Nikolaos P Daskalakis
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- McLean Hospital, Belmont, MA 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Andrew E Jaffe
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Neumora Therapeutics, Watertown, MA 02472, USA
| | - Joel E Kleinman
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Thomas M Hyde
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Daniel R Weinberger
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Nicholas J Bray
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine, Cardiff CF24 4HQ, UK
| | - Nenad Sestan
- Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT 06520, USA
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT 06520, USA
| | - Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Alexander Gusev
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard Medical School, Boston, MA 02215, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02215, USA
| | - Bogdan Pasaniuc
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| |
|
17
|
Durant PC, Bhasin A, Juenger TE, Heckman RW. Genetically correlated leaf tensile and morphological traits are driven by growing season length in a widespread perennial grass. AMERICAN JOURNAL OF BOTANY 2024; 111:e16349. [PMID: 38783552 DOI: 10.1002/ajb2.16349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 03/19/2024] [Accepted: 03/20/2024] [Indexed: 05/25/2024]
Abstract
PREMISE Leaf tensile resistance, a leaf's ability to withstand pulling forces, is an important determinant of plant ecological strategies. One potential driver of leaf tensile resistance is growing season length. When growing seasons are long, strong leaves, which often require more time and resources to construct than weak leaves, may be more advantageous than when growing seasons are short. Growing season length and other ecological conditions may also impact the morphological traits that underlie leaf tensile resistance. METHODS To understand variation in leaf tensile resistance, we measured size-dependent leaf strength and size-independent leaf toughness in diverse genotypes of the widespread perennial grass Panicum virgatum (switchgrass) in a common garden. We then used quantitative genetic approaches to estimate the heritability of leaf tensile resistance and whether there were genetic correlations between leaf tensile resistance and other morphological traits. RESULTS Leaf tensile resistance was positively associated with aboveground biomass (a proxy for fitness). Moreover, both measures of leaf tensile resistance exhibited high heritability and were positively genetically correlated with leaf lamina thickness and leaf mass per area (LMA). Leaf tensile resistance also increased with the growing season length in the habitat of origin, and this effect was mediated by both LMA and leaf thickness. CONCLUSIONS Differences in growing season length may promote selection for different leaf lifespans and may explain existing variation in leaf tensile resistance in P. virgatum. In addition, the high heritability of leaf tensile resistance suggests that P. virgatum will be able to respond to climate change as growing seasons lengthen.
Affiliation(s)
- P Camilla Durant
- Department of Integrated Biology, University of Texas at Austin, Austin, 78712, TX, USA
| | - Amit Bhasin
- Department of Civil, Architectural and Environmental Engineering, University of Texas at Austin, Austin, 78712, TX, USA
| | - Thomas E Juenger
- Department of Integrated Biology, University of Texas at Austin, Austin, 78712, TX, USA
| | - Robert W Heckman
- Department of Integrated Biology, University of Texas at Austin, Austin, 78712, TX, USA
| |
|
18
|
Zhu X, Yang Y, Lorincz-Comi N, Li G, Bentley AR, de Vries PS, Brown M, Morrison AC, Rotimi CN, Gauderman WJ, Rao DC, Aschard H. An approach to identify gene-environment interactions and reveal new biological insight in complex traits. Nat Commun 2024; 15:3385. [PMID: 38649715 PMCID: PMC11035594 DOI: 10.1038/s41467-024-47806-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 04/10/2024] [Indexed: 04/25/2024] Open
Abstract
There is a long-standing debate about the magnitude of the contribution of gene-environment interactions to phenotypic variations of complex traits owing to the low statistical power and few reported interactions to date. To address this issue, the Gene-Lifestyle Interactions Working Group within the Cohorts for Heart and Aging Research in Genetic Epidemiology Consortium has been spearheading efforts to investigate G × E in large and diverse samples through meta-analysis. Here, we present a powerful new approach to screen for interactions across the genome, an approach that shares substantial similarity to the Mendelian randomization framework. We identify and confirm 5 loci (6 independent signals) that interact with either cigarette smoking or alcohol consumption for serum lipids, and empirically demonstrate that interaction and mediation are the major contributors to genetic effect size heterogeneity across populations. The estimated lower bound of the interaction and environmentally mediated heritability is significant (P < 0.02) for low-density lipoprotein cholesterol and triglycerides in Cross-Population data. Our study improves the understanding of the genetic architecture and environmental contributions to complex traits.
Affiliation(s)
- Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Yihe Yang
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Noah Lorincz-Comi
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Gen Li
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Amy R Bentley
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Paul S de Vries
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Michael Brown
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Charles N Rotimi
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - W James Gauderman
- Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
| | - Dabeeru C Rao
- Center for Biostatistics and Data Science, Institute for Informatics, Data Science and Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
| | - Hugues Aschard
- Institut Pasteur, Université Paris Cité, Department of Computational Biology, F-75015, Paris, France
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| |
|
19
|
Troubat L, Fettahoglu D, Henches L, Aschard H, Julienne H. Multi-trait GWAS for diverse ancestries: mapping the knowledge gap. BMC Genomics 2024; 25:375. [PMID: 38627641 PMCID: PMC11022331 DOI: 10.1186/s12864-024-10293-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 04/09/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND Approximately 95% of samples analyzed in univariate genome-wide association studies (GWAS) are of European ancestry. This bias toward European ancestry populations in association screening also exists for other analyses and methods that are often developed and tested on European ancestry only. However, existing data in non-European populations, which are often of modest sample size, could benefit from innovative approaches as recently illustrated in the context of polygenic risk scores. METHODS Here, we extend and assess the potential limitations and gains of our multi-trait GWAS pipeline, JASS (Joint Analysis of Summary Statistics), for the analysis of non-European ancestries. To this end, we conducted the joint GWAS of 19 hematological traits and glycemic traits across five ancestries (European (EUR), admixed American (AMR), African (AFR), East Asian (EAS), and South-East Asian (SAS)). RESULTS We detected 367 new genome-wide significant associations in non-European populations (15 in Admixed American (AMR), 72 in African (AFR) and 280 in East Asian (EAS)). New associations detected represent 5%, 17% and 13% of associations in the AFR, AMR and EAS populations, respectively. Overall, multi-trait testing increases the replication of European associated loci in non-European ancestry by 15%. Pleiotropic effects were highly similar at significant loci across ancestries (e.g. the mean correlation between multi-trait genetic effects of EUR and EAS ancestries was 0.88). For hematological traits, strong discrepancies in multi-trait genetic effects are tied to known evolutionary divergences: the ARKC1 locus, which is adaptive to overcome P. vivax-induced malaria. CONCLUSIONS Multi-trait GWAS can be a valuable tool to narrow the genetic knowledge gap between European and non-European populations.
Affiliation(s)
- Lucie Troubat
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
| | - Deniz Fettahoglu
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
| | - Léo Henches
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
| | - Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Hanna Julienne
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, F-75015, France
| |
|
20
|
Caggiano C, Morselli M, Qian X, Celona B, Thompson M, Wani S, Tosevska A, Taraszka K, Heuer G, Ngo S, Steyn F, Nestor P, Wallace L, McCombe P, Heggie S, Thorpe K, McElligott C, English G, Henders A, Henderson R, Lomen-Hoerth C, Wray N, McRae A, Pellegrini M, Garton F, Zaitlen N. Tissue informative cell-free DNA methylation sites in amyotrophic lateral sclerosis. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.08.24305503. [PMID: 38645132 PMCID: PMC11030489 DOI: 10.1101/2024.04.08.24305503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Cell-free DNA (cfDNA) is increasingly recognized as a promising biomarker candidate for disease monitoring. However, its utility in neurodegenerative diseases, like amyotrophic lateral sclerosis (ALS), remains underexplored. Existing biomarker discovery approaches are tailored to a specific disease context or are too expensive to be clinically practical. Here, we address these challenges through a new approach combining advances in molecular and computational technologies. First, we develop statistical tools to select tissue-informative DNA methylation sites relevant to a disease process of interest. We then employ a capture protocol to select these sites and perform targeted methylation sequencing. Multi-modal information about the DNA methylation patterns is then utilized in machine learning algorithms trained to predict disease status and disease progression. We applied our method to two independent cohorts of ALS patients and controls (n=192). Overall, we found that the targeted sites accurately predicted ALS status and replicated between cohorts. Additionally, we identified epigenetic features associated with ALS phenotypes, including disease severity. These findings highlight the potential of cfDNA as a non-invasive biomarker for ALS.
Affiliation(s)
- C Caggiano
- Department of Neurology, UCLA, Los Angeles, California
- Institute of Genomic Health, Icahn School of Medicine at Mt Sinai, New York, New York
| | - M Morselli
- Department of Molecular, Cell, and Developmental Biology, UCLA; Los Angeles, California
- Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
| | - X Qian
- Institute for Molecular Biology, University of Queensland, Brisbane, Australia
| | - B Celona
- Cardiovascular Research Institute, UCSF, San Francisco, California
| | - M Thompson
- Department of Neurology, UCLA, Los Angeles, California
- Systems and Synthetic Biology, Centre for Genomic Regulation, Barcelona, Spain
| | - S Wani
- Cardiovascular Research Institute, UCSF, San Francisco, California
| | - A Tosevska
- Department of Molecular, Cell, and Developmental Biology, UCLA; Los Angeles, California
- Department of Internal Medicine III, Division of Rheumatology, Medical University of Vienna, Vienna, Austria
| | - K Taraszka
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - G Heuer
- Computational and Systems Biology Interdepartmental Program, UCLA, Los Angeles, California
| | - S Ngo
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
- Department of Neurology, Royal Brisbane and Women's Hospital, Brisbane, QLD, Australia
| | - F Steyn
- School of Biomedical Sciences, Faculty of Medicine, The University of Queensland, Brisbane, Australia
| | - P Nestor
- Queensland Brain Institute, University of Queensland, Brisbane, Australia
- Mater Public Hospital, Brisbane, Australia
| | - L Wallace
- Institute for Molecular Biology, University of Queensland, Brisbane, Australia
| | - P McCombe
- Department of Neurology, Royal Brisbane and Women's Hospital, Brisbane, QLD, Australia
| | - S Heggie
- Department of Neurology, Royal Brisbane and Women's Hospital, Brisbane, QLD, Australia
| | - K Thorpe
- Department of Neurology, Royal Brisbane and Women's Hospital, Brisbane, QLD, Australia
| | | | - G English
- Institute for Molecular Biology, University of Queensland, Brisbane, Australia
| | - A Henders
- Institute for Molecular Biology, University of Queensland, Brisbane, Australia
| | - R Henderson
- Department of Neurology, Royal Brisbane and Women's Hospital, Brisbane, QLD, Australia
| | - C Lomen-Hoerth
- Department of Neurology, UCSF, San Francisco, California
| | - N Wray
- Institute for Molecular Biology, University of Queensland, Brisbane, Australia
| | - A McRae
- Institute for Molecular Biology, University of Queensland, Brisbane, Australia
| | - M Pellegrini
- Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
| | - F Garton
- Institute for Molecular Biology, University of Queensland, Brisbane, Australia
| | - N Zaitlen
- Department of Neurology, UCLA, Los Angeles, California
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California
| |
|
21
|
Grinde KE, Browning BL, Reiner AP, Thornton TA, Browning SR. Adjusting for principal components can induce spurious associations in genome-wide association studies in admixed populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.02.587682. [PMID: 38617337 PMCID: PMC11014513 DOI: 10.1101/2024.04.02.587682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2024]
Abstract
Principal component analysis (PCA) is widely used to control for population structure in genome-wide association studies (GWAS). Top principal components (PCs) typically reflect population structure, but challenges arise in deciding how many PCs are needed and ensuring that PCs do not capture other artifacts such as regions with atypical linkage disequilibrium (LD). In response to the latter, many groups suggest performing LD pruning or excluding known high LD regions prior to PCA. However, these suggestions are not universally implemented and the implications for GWAS are not fully understood, especially in the context of admixed populations. In this paper, we investigate the impact of pre-processing and the number of PCs included in GWAS models in African American samples from the Women's Health Initiative SNP Health Association Resource and two Trans-Omics for Precision Medicine Whole Genome Sequencing Project contributing studies (Jackson Heart Study and Genetic Epidemiology of Chronic Obstructive Pulmonary Disease Study). In all three samples, we find the first PC is highly correlated with genome-wide ancestry whereas later PCs often capture local genomic features. The pattern of which, and how many, genetic variants are highly correlated with individual PCs differs from what has been observed in prior studies focused on European populations and leads to distinct downstream consequences: adjusting for such PCs yields biased effect size estimates and elevated rates of spurious associations due to the phenomenon of collider bias. Excluding high LD regions identified in previous studies does not resolve these issues. LD pruning proves more effective, but the optimal choice of thresholds varies across datasets. Altogether, our work highlights unique issues that arise when using PCA to control for ancestral heterogeneity in admixed populations and demonstrates the importance of careful pre-processing and diagnostics to ensure that PCs capturing multiple local genomic features are not included in GWAS models.
Affiliation(s)
- Kelsey E. Grinde
- Department of Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, Minnesota, 55105, USA
| | - Brian L. Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, 98195, USA
| | - Alexander P. Reiner
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109, USA
- Department of Epidemiology, University of Washington, Seattle, Washington, 98195, USA
| | - Timothy A. Thornton
- Regeneron Genetics Center, Tarrytown, New York, 10591, USA
- Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
| | - Sharon R. Browning
- Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
| |
|
22
|
Alrfooh A, Casten LG, Richards JG, Wemmie JA, Magnotta VA, Fiedorowicz JG, Michaelson J, Williams AJ, Gaine ME. Investigating the relationship between DNA methylation, genetic variation, and suicide attempt in bipolar disorder. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.03.24305263. [PMID: 38633806 PMCID: PMC11023653 DOI: 10.1101/2024.04.03.24305263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Individuals with bipolar disorder are at increased risk for suicide, and this can be influenced by a range of biological, clinical, and environmental risk factors. Biological components associated with suicide include DNA modifications that lead to changes in gene expression. Common genetic variation and DNA methylation changes are some of the most frequent types of DNA findings associated with an increased risk for suicidal behavior. Importantly, the interplay between genetic predisposition and DNA methylation patterns is becoming more prevalent in genetic studies. We hypothesized that DNA methylation patterns in specific loci already genetically associated with suicide would be altered in individuals with bipolar disorder and a history of suicide attempt. To test this hypothesis, we searched the literature to identify common genetic variants (N=34) previously associated with suicidal thoughts and behaviors in individuals with bipolar disorder. We then created a customized sequencing panel that covered our chosen genomic loci. We profiled DNA methylation patterns from blood samples collected from bipolar disorder participants with suicidal behavior (N=55) and without suicidal behavior (N=51). We identified seven differentially methylated CpG sites and five differentially methylated regions between the two groups. Additionally, we found that DNA methylation changes in MIF and CACNA1C were associated with lethality or number of suicide attempts. Finally, we identified three meQTLs in SIRT1, IMPA2, and INPP1. This study illustrates that DNA methylation is altered in individuals with bipolar disorder and a history of suicide attempts in regions known to harbor suicide-related variants.
|
23
|
Sousa LPB, Pinto LFB, Cruz VAR, Oliveira GA, Rojas de Oliveira H, Chud TS, Pedrosa VB, Miglior F, Schenkel FS, Brito LF. Genome-wide association and functional genomic analyses for various hoof health traits in North American Holstein cattle. J Dairy Sci 2024; 107:2207-2230. [PMID: 37939841 DOI: 10.3168/jds.2023-23806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 10/19/2023] [Indexed: 11/10/2023]
Abstract
Hoof diseases are a major welfare and economic issue in the global dairy cattle production industry, which can be minimized through improved management and breeding practices. Optimal genetic improvement of hoof health could benefit from a deep understanding of the genetic background and biological underpinning of indicators of hoof health. Therefore, the primary objectives of this study were to perform genome-wide association studies, using imputed high-density genetic markers data from North American Holstein cattle, for 8 hoof-related traits: digital dermatitis, sole ulcer, sole hemorrhage, white line lesion, heel horn erosion, interdigital dermatitis, interdigital hyperplasia, and toe ulcer, and a hoof health index. De-regressed estimated breeding values from 25,580 Holstein animals were used as pseudo-phenotypes for the association analyses. The genomic quality control, genotype phasing, and genotype imputation were performed using the PLINK (version 1.9), Eagle (version 2.4.1), and Minimac4 software, respectively. The functional genomic analyses were performed using the GALLO R package and the DAVID platform. We identified 22, 34, 14, 22, 28, 33, 24, 43, and 15 significant markers for digital dermatitis, heel horn erosion, interdigital dermatitis, interdigital hyperplasia, sole hemorrhage, sole ulcer, toe ulcer, white line lesion disease, and the hoof health index, respectively. The significant markers were located across all autosomes, except BTA10, BTA12, BTA20, BTA26, BTA27, and BTA28. Moreover, the genomic regions identified overlap with various previously reported quantitative trait loci for exterior, health, meat and carcass, milk, production, and reproduction traits. The enrichment analyses identified 44 significant gene ontology terms. These enriched genomic regions harbor various candidate genes previously associated with bone development, metabolism, and infectious and immunological diseases. 
These findings indicate that hoof health traits are highly polygenic and influenced by a wide range of biological processes.
Affiliation(s)
- Luis Paulo B Sousa
- Department of Animal Sciences, Federal University of Bahia, Salvador, BA, 40170-110, Brazil
| | - Luis Fernando B Pinto
- Department of Animal Sciences, Federal University of Bahia, Salvador, BA, 40170-110, Brazil
| | - Valdecy A R Cruz
- Department of Animal Sciences, Federal University of Bahia, Salvador, BA, 40170-110, Brazil
| | - Gerson A Oliveira
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada
| | - Hinayah Rojas de Oliveira
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada; Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
| | - Tatiane S Chud
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada; PEAK, Madison, WI 53718
| | - Victor B Pedrosa
- Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
| | - Filippo Miglior
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada; Lactanet Canada, Guelph, ON, N1K 1E5, Canada
| | - Flávio S Schenkel
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada
| | - Luiz F Brito
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada; Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
| |
|
24
|
Timmins IR, Dudbridge F. Bayesian approach to assessing population differences in genetic risk of disease with application to prostate cancer. PLoS Genet 2024; 20:e1011212. [PMID: 38630784 PMCID: PMC11023298 DOI: 10.1371/journal.pgen.1011212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 03/07/2024] [Indexed: 04/19/2024] Open
Abstract
Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for statistical noise in effect estimates (i.e., the GWAS betas) that arise due to the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and genetic distance (FST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under a simplified assumption that all SNPs are independent, which we achieve in practise using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as using a simple t-test, have very high type 1 error rates. When applying our approach to prostate cancer, we demonstrate a higher genetic risk in African Ancestry men, with lower risk in men of European followed by East Asian ancestry.
Affiliation(s)
- Iain R. Timmins
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, United Kingdom
- Statistical Innovation, AstraZeneca, Cambridge, United Kingdom
| | | | - Frank Dudbridge
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
| |
|
25
|
Casten LG, Koomar T, Elsadany M, McKone C, Tysseling B, Sasidharan M, Tomblin JB, Michaelson JJ. Lingo: an automated, web-based deep phenotyping platform for language ability. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.29.24305034. [PMID: 38585791 PMCID: PMC10996758 DOI: 10.1101/2024.03.29.24305034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Background Language and the ability to communicate effectively are key factors in mental health and well-being. Despite this critical importance, research on language is limited by the lack of a scalable phenotyping toolkit. Methods Here, we describe and showcase Lingo - a flexible online battery of language and nonverbal reasoning skills based on seven widely used tasks (COWAT, picture narration, vocal rhythm entrainment, rapid automatized naming, following directions, sentence repetition, and nonverbal reasoning). The current version of Lingo takes approximately 30 minutes to complete, is entirely open source, and allows for a wide variety of performance metrics to be extracted. We asked > 1,300 individuals from multiple samples to complete Lingo, then investigated the validity and utility of the resulting data. Results We conducted an exploratory factor analysis across 14 features derived from the seven assessments, identifying five factors. Four of the five factors showed acceptable test-retest reliability (Pearson's R > 0.7). Factor 2 showed the highest reliability (Pearson's R = 0.95) and loaded primarily on sentence repetition task performance. We validated Lingo with objective measures of language ability by comparing performance to gold-standard assessments: CELF-5 and the VABS-3. Factor 2 was significantly associated with the CELF-5 "core language ability" scale (Pearson's R = 0.77, p-value < 0.05) and the VABS-3 "communication" scale (Pearson's R = 0.74, p-value < 0.05). Factor 2 was positively associated with phenotypic and genetic measures of socioeconomic status. Interestingly, we found the parents of children with language impairments had lower Factor 2 scores (p-value < 0.01). Finally, we found Lingo factor scores were significantly predictive of numerous psychiatric and neurodevelopmental conditions. Conclusions Together, these analyses support Lingo as a powerful platform for scalable deep phenotyping of language and other cognitive abilities. Additionally, exploratory analyses provide supporting evidence for the heritability of language ability and the complex relationship between mental health and language.
Affiliation(s)
- Lucas G. Casten
  - Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA
  - Department of Psychiatry, University of Iowa, Iowa City, IA
- Tanner Koomar
  - Department of Psychiatry, University of Iowa, Iowa City, IA
- Muhammad Elsadany
  - Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA
  - Department of Psychiatry, University of Iowa, Iowa City, IA
- Caleb McKone
  - Department of Psychiatry, University of Iowa, Iowa City, IA
- Ben Tysseling
  - Department of Psychiatry, University of Iowa, Iowa City, IA
- J. Bruce Tomblin
  - Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA
- Jacob J. Michaelson
  - Department of Psychiatry, University of Iowa, Iowa City, IA
  - Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA
  - Iowa Neuroscience Institute, University of Iowa, Iowa City, IA
  - Hawkeye Intellectual and Developmental Disabilities Research Center (Hawk-IDDRC), University of Iowa, Iowa City, IA
26
Thomas TR, Tener AJ, Pearlman AM, Imborek KL, Yang JS, Strang JF, Michaelson JJ. Polygenic Scores Clarify the Relationship Between Mental Health and Gender Diversity. BIOLOGICAL PSYCHIATRY GLOBAL OPEN SCIENCE 2024; 4:100291. [PMID: 38425476 PMCID: PMC10901838 DOI: 10.1016/j.bpsgos.2024.100291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 12/21/2023] [Accepted: 12/26/2023] [Indexed: 03/02/2024] Open
Abstract
Background Gender-diverse individuals are at increased risk for mental health problems, but it is unclear whether this is due to shared environmental or genetic factors. Methods In two SPARK samples, we tested for associations of 16 polygenic scores (PGSs) with quantitative measures of gender diversity and mental health. In study 1, 639 independent adults (59% autistic) reported their mental health with the Adult Self-Report and their gender diversity with the Gender Self-Report (GSR). The GSR has 2 dimensions: binary (degree of identification with the gender opposite that implied by sex designated at birth) and nonbinary (degree of identification with a gender that is neither male nor female). In study 2 (N = 5165), we used a categorical measure of gender identity. Results In study 1, neuropsychiatric PGSs were positively associated with Adult Self-Report scores: externalizing was positively associated with the attention-deficit/hyperactivity disorder PGS (β = 0.10 [0.03-0.17]), and internalizing was positively associated with the PGSs for depression (β = 0.07 [0-0.14]) and neuroticism (β = 0.10 [0.03-0.17]). Interestingly, GSR scores were not significantly associated with any neuropsychiatric PGS. However, GSR nonbinary was positively associated with the cognitive performance PGS (β = 0.11 [0.05-0.18]), with the effect size comparable in magnitude to the associations of the neuropsychiatric PGSs with the Adult Self-Report. Additionally, GSR binary was positively associated with the nonheterosexual sexual behavior PGS (β = 0.07 [0-0.14]). In study 2, the cognitive performance PGS effect replicated; transgender and nonbinary individuals had higher PGSs (t(316) = 4.16). Conclusions We showed that while gender diversity is phenotypically positively associated with mental health problems, the strongest PGS associations with gender diversity were with the cognitive performance PGS, not the neuropsychiatric PGSs.
Affiliation(s)
- Ashton J. Tener
  - Department of Psychiatry, University of Iowa, Iowa City, Iowa
- Ji Seung Yang
  - Department of Human Development and Quantitative Methodology, University of Maryland, College Park, Maryland
- John F. Strang
  - Gender and Autism Program, Center for Neuroscience, Children’s National Hospital, George Washington University School of Medicine, Washington, District of Columbia
- Jacob J. Michaelson
  - Department of Psychiatry, University of Iowa, Iowa City, Iowa
  - Iowa Neuroscience Institute, University of Iowa, Iowa City, Iowa
  - Hawkeye Intellectual and Developmental Disabilities Research Center, University of Iowa, Iowa City, Iowa
27
Kolobkov D, Mishra Sharma S, Medvedev A, Lebedev M, Kosaretskiy E, Vakhitov R. Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project. Front Big Data 2024; 7:1266031. [PMID: 38487517 PMCID: PMC10937521 DOI: 10.3389/fdata.2024.1266031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 01/31/2024] [Indexed: 03/17/2024] Open
Abstract
Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner, thus reducing the risks of data leakage. Although there is increasing utilization of federated learning on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
Affiliation(s)
- Dmitry Kolobkov
  - GENXT, Hinxton, United Kingdom
  - Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Moscow, Russia
- Satyarth Mishra Sharma
  - GENXT, Hinxton, United Kingdom
  - Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia
- Aleksandr Medvedev
  - GENXT, Hinxton, United Kingdom
  - Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia
28
Tian R, Ge T, Kweon H, Rocha DB, Lam M, Liu JZ, Singh K, Levey DF, Gelernter J, Stein MB, Tsai EA, Huang H, Chabris CF, Lencz T, Runz H, Chen CY. Whole-exome sequencing in UK Biobank reveals rare genetic architecture for depression. Nat Commun 2024; 15:1755. [PMID: 38409228 PMCID: PMC10897433 DOI: 10.1038/s41467-024-45774-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 02/02/2024] [Indexed: 02/28/2024] Open
Abstract
Nearly two hundred common-variant depression risk loci have been identified by genome-wide association studies (GWAS). However, the impact of rare coding variants on depression remains poorly understood. Here, we present whole-exome sequencing analyses of depression with seven different definitions based on survey, questionnaire, and electronic health records in 320,356 UK Biobank participants. We showed that the burden of rare damaging coding variants in loss-of-function intolerant genes is significantly associated with risk of depression with various definitions. We compared the rare and common genetic architecture across depression definitions by genetic correlation and showed different genetic relationships between definitions across common and rare variants. In addition, we demonstrated that the effects of rare damaging coding variant burden and polygenic risk score on depression risk are additive. The gene set burden analyses revealed overlapping rare genetic variant components with developmental disorder, autism, and schizophrenia. Our study provides insights into the contribution of rare coding variants, separately and in conjunction with common variants, on depression with various definitions and their genetic relationships with neurodevelopmental disorders.
Affiliation(s)
- Ruoyu Tian
  - Biogen Inc, Cambridge, MA, USA
  - Dewpoint Therapeutics, Boston, MA, USA
- Tian Ge
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hyeokmoon Kweon
  - Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Autism & Developmental Medicine Institute, Geisinger Health System, Lewisburg, PA, USA
- Daniel B Rocha
  - Phenomics Analytics and Clinical Data Core, Geisinger Health System, Danville, PA, USA
- Max Lam
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, USA
  - Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA
  - North Region, Institute of Mental Health, Singapore, Singapore
- Jimmy Z Liu
  - Biogen Inc, Cambridge, MA, USA
  - GlaxoSmithKline, Upper Providence, Philadelphia, PA, USA
- Kritika Singh
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Daniel F Levey
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
  - VA Connecticut Healthcare Center, West Haven, CT, USA
- Joel Gelernter
  - VA Connecticut Healthcare Center, West Haven, CT, USA
  - Departments of Psychiatry, Genetics, and Neuroscience, Yale University School of Medicine, New Haven, CT, USA
- Murray B Stein
  - VA San Diego Healthcare System, San Diego, CA, USA
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
  - Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
- Hailiang Huang
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Christopher F Chabris
  - Autism & Developmental Medicine Institute, Geisinger Health System, Lewisburg, PA, USA
- Todd Lencz
  - Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, USA
  - Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA
  - Departments of Psychiatry and Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
29
Liu D, Billington CJ, Raja N, Wong ZC, Levin MD, Resch W, Alba C, Hupalo DN, Biamino E, Bedeschi MF, Digilio MC, Squeo GM, Villa R, Parrish PCR, Knutsen RH, Osgood S, Freeman JA, Dalgard CL, Merla G, Pober BR, Mervis CB, Roberts AE, Morris CA, Osborne LR, Kozel BA. Matrisome and Immune Pathways Contribute to Extreme Vascular Outcomes in Williams-Beuren Syndrome. J Am Heart Assoc 2024; 13:e031377. [PMID: 38293922 PMCID: PMC11056152 DOI: 10.1161/jaha.123.031377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 11/28/2023] [Indexed: 02/01/2024]
Abstract
BACKGROUND Supravalvar aortic stenosis (SVAS) is a characteristic feature of Williams-Beuren syndrome (WBS). Its severity varies: ~20% of people with Williams-Beuren syndrome have SVAS requiring surgical intervention, whereas ~35% have no appreciable SVAS. The remaining individuals have SVAS of intermediate severity. Little is known about genetic modifiers that contribute to this variability. METHODS AND RESULTS We performed genome sequencing on 473 individuals with Williams-Beuren syndrome and developed strategies for modifier discovery in this rare disease population. Approaches include extreme phenotyping and nonsynonymous variant prioritization, followed by gene set enrichment and pathway-level association tests. We next used GTEx v8 and proteomic data sets to verify expression of candidate modifiers in relevant tissues. Finally, we evaluated overlap between the genes/pathways identified here and those ascertained through larger aortic disease/trait genome-wide association studies. We show that SVAS severity in Williams-Beuren syndrome is associated with increased frequency of common and rarer variants in matrisome and immune pathways. Two implicated matrisome genes (ACAN and LTBP4) were uniquely expressed in the aorta. Many genes in the identified pathways were previously reported in genome-wide association studies for aneurysm, bicuspid aortic valve, or aortic size. CONCLUSIONS Smaller sample sizes in rare disease studies necessitate new approaches to detect modifiers. Our strategies identified variation in matrisome and immune pathways that are associated with SVAS severity. These findings suggest that, like other aortopathies, SVAS may be influenced by the balance of synthesis and degradation of matrisome proteins. Leveraging multiomic data and results from larger aorta-focused genome-wide association studies may accelerate modifier discovery for rare aortopathies like SVAS.
Affiliation(s)
- Delong Liu
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- Charles J. Billington
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
  - Department of Pediatrics, University of Minnesota, Minneapolis, MN
- Neelam Raja
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- Zoe C. Wong
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- Mark D. Levin
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- Wulfgang Resch
  - The High Performance Computing Facility, Center for Information Technology, National Institutes of Health, Bethesda, MD
- Camille Alba
  - Henry M Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD
- Daniel N. Hupalo
  - Henry M Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD
- Gabriella Maria Squeo
  - Laboratory of Regulatory and Functional Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Roberta Villa
  - Fondazione IRCCS Ca Granda Ospedale Maggiore Policlinico Medical Genetic Unit, Milan, Italy
- Pheobe C. R. Parrish
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
  - Department of Genome Sciences, University of Washington, Seattle, WA
- Russell H. Knutsen
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- Sharon Osgood
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- Joy A. Freeman
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- Clifton L. Dalgard
  - Department of Anatomy, Physiology and Genetics, School of Medicine, the Uniformed Services University of the Health Sciences, Bethesda, MD
- Giuseppe Merla
  - Laboratory of Regulatory and Functional Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
- Barbara R. Pober
  - Section of Genetics, Department of Pediatrics, Massachusetts General Hospital, Boston, MA
- Carolyn B. Mervis
  - Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY
- Amy E. Roberts
  - Department of Cardiology and Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Boston, MA
- Colleen A. Morris
  - Department of Pediatrics, Kirk Kerkorian School of Medicine at UNLV, Las Vegas, NV
- Lucy R. Osborne
  - Departments of Medicine and Molecular Genetics, University of Toronto, Canada
- Beth A. Kozel
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
30
Li X, Sham PC, Zhang YD. A Bayesian fine-mapping model using a continuous global-local shrinkage prior with applications in prostate cancer analysis. Am J Hum Genet 2024; 111:213-226. [PMID: 38171363 PMCID: PMC10870138 DOI: 10.1016/j.ajhg.2023.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
The aim of fine mapping is to identify genetic variants causally contributing to complex traits or diseases. Existing fine-mapping methods employ Bayesian discrete mixture priors and depend on a pre-specified maximum number of causal variants, which may lead to sub-optimal solutions. In this work, we propose a Bayesian fine-mapping method called h2-D2, utilizing a continuous global-local shrinkage prior. We also present an approach to define credible sets of causal variants in continuous prior settings. Simulation studies demonstrate that h2-D2 outperforms current state-of-the-art fine-mapping methods such as SuSiE and FINEMAP in accurately identifying causal variants and estimating their effect sizes. We further applied h2-D2 to prostate cancer analysis and discovered some previously unknown causal variants. In addition, we inferred 369 target genes associated with the detected causal variants and several pathways that were significantly over-represented by these genes, shedding light on their potential roles in prostate cancer development and progression.
Affiliation(s)
- Xiang Li
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Pak Chung Sham
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Yan Dora Zhang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
31
Aw AJ, McRae J, Rahmani E, Song YS. Highly parameterized polygenic scores tend to overfit to population stratification via random effects. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.27.577589. [PMID: 38352303 PMCID: PMC10862757 DOI: 10.1101/2024.01.27.577589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Polygenic scores (PGSs), increasingly used in clinical settings, frequently include many genetic variants, with performance typically peaking at thousands of variants. Such highly parameterized PGSs often include variants that do not pass a genome-wide significance threshold. We propose a mathematical perspective that renders the effects of many of these non-significant variants random rather than causal, with the randomness capturing population structure. We devise methods to assess variant effect randomness and population stratification bias. Applying these methods to 141 traits from the UK Biobank, we find that, for many PGSs, the effects of non-significant variants are considerably random, with the extent of randomness associated with the degree of overfitting to population structure of the discovery cohort. Our findings explain why highly parameterized PGSs simultaneously have superior cohort-specific performance and limited generalizability, suggesting the critical need for variant randomness tests in PGS evaluation. Supporting code and a dashboard are available at https://github.com/songlab-cal/StratPGS.
Affiliation(s)
- Alan J. Aw
  - Department of Statistics, University of California, Berkeley
  - Center for Computational Biology, University of California, Berkeley
  - Artificial Intelligence Laboratory, Illumina Inc
- Jeremy McRae
  - Artificial Intelligence Laboratory, Illumina Inc
- Elior Rahmani
  - Department of Computational Medicine, University of California, Los Angeles
- Yun S. Song
  - Department of Statistics, University of California, Berkeley
  - Center for Computational Biology, University of California, Berkeley
  - Computer Science Division, University of California, Berkeley
32
Puritz JB, Guo X, Hare M, He Y, Hillier LW, Jin S, Liu M, Lotterhos KE, Minx P, Modak T, Proestou D, Rice ES, Tomlinson C, Warren WC, Witkop E, Zhao H, Gomez-Chiarri M. A second unveiling: Haplotig masking of the eastern oyster genome improves population-level inference. Mol Ecol Resour 2024; 24:e13801. [PMID: 37186213 DOI: 10.1111/1755-0998.13801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 12/16/2022] [Accepted: 03/20/2023] [Indexed: 05/17/2023]
Abstract
Genome assembly can be challenging for species that are characterized by high amounts of polymorphism, heterozygosity, and large effective population sizes. High levels of heterozygosity can result in genome mis-assemblies and a larger than expected genome size due to the haplotig versions of a single locus being assembled as separate loci. Here, we describe the first chromosome-level genome for the eastern oyster, Crassostrea virginica. Publicly released and annotated in 2017, the assembly has a scaffold N50 of 54 Mb and is over 97.3% complete based on BUSCO analysis. The genome assembly for the eastern oyster is a critical resource for foundational research into molluscan adaptation to a changing environment and for selective breeding for the aquaculture industry. Subsequent resequencing data suggested the presence of haplotigs in the original assembly, and we developed a post hoc method to break up chimeric contigs and mask haplotigs in published heterozygous genomes and evaluated improvements to the accuracy of downstream analysis. Masking haplotigs had a large impact on SNP discovery and estimates of nucleotide diversity and had more subtle and nuanced effects on estimates of heterozygosity, population structure analysis, and outlier detection. We show that haplotig masking can be a powerful tool for improving genomic inference, and we present an open, reproducible resource for the masking of haplotigs in any published genome.
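For readers unfamiliar with the scaffold N50 statistic quoted in the abstract above, the following is a minimal illustrative sketch of how N50 is conventionally defined: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy lengths are assumptions made for illustration; they are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest; N50 is the length of
    the scaffold at which the running total first reaches half of the
    summed assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not the oyster genome.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # total = 300; 80 + 70 = 150 reaches half, so N50 = 70
```

Because haplotigs inflate the total assembly length, masking them would generally be expected to change statistics of this kind; the sketch only shows the definition and makes no claim about the magnitude of that effect.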
Affiliation(s)
- Jonathan B Puritz
  - Department of Biological Sciences, University of Rhode Island, Kingston, Rhode Island, USA
- Ximing Guo
  - Haskin Shellfish Research Laboratory, Department of Marine and Coastal Sciences, Rutgers University, Port Norris, New Jersey, USA
- Matthew Hare
  - Department of Natural Resources and the Environment, Cornell University, Ithaca, New York, USA
- Yan He
  - Haskin Shellfish Research Laboratory, Department of Marine and Coastal Sciences, Rutgers University, Port Norris, New Jersey, USA
- LaDeana W Hillier
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Shubo Jin
  - Haskin Shellfish Research Laboratory, Department of Marine and Coastal Sciences, Rutgers University, Port Norris, New Jersey, USA
- Ming Liu
  - Haskin Shellfish Research Laboratory, Department of Marine and Coastal Sciences, Rutgers University, Port Norris, New Jersey, USA
- Katie E Lotterhos
  - Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, Massachusetts, USA
- Pat Minx
  - Donald Danforth Plant Science Center, Olivette, Missouri, USA
- Tejashree Modak
  - Department of Cell and Molecular Biology, University of Rhode Island, Kingston, Rhode Island, USA
- Dina Proestou
  - USDA Agricultural Research Service, National Cold Water Marine Aquaculture Center, Kingston, Rhode Island, USA
- Edward S Rice
  - Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Chad Tomlinson
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, USA
- Wesley C Warren
  - Departments of Animal Sciences and Surgery, Institute of Informatics and Data Sciences, Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Erin Witkop
  - Department of Fisheries, Animal and Veterinary Sciences, University of Rhode Island, Kingston, Rhode Island, USA
- Honggang Zhao
  - Department of Natural Resources and the Environment, Cornell University, Ithaca, New York, USA
- Marta Gomez-Chiarri
  - Department of Fisheries, Animal and Veterinary Sciences, University of Rhode Island, Kingston, Rhode Island, USA
33
Cohen NM, Lifshitz A, Jaschek R, Rinott E, Balicer R, Shlush LI, Barbash GI, Tanay A. Longitudinal machine learning uncouples healthy aging factors from chronic disease risks. NATURE AGING 2024; 4:129-144. [PMID: 38062254 DOI: 10.1038/s43587-023-00536-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 11/02/2023] [Indexed: 01/21/2024]
Abstract
To understand human longevity, inherent aging processes must be distinguished from known etiologies leading to age-related chronic diseases. Such deconvolution is difficult to achieve because it requires tracking patients throughout their entire lives. Here, we used machine learning to infer health trajectories over the entire adulthood age range using extrapolation from electronic medical records with partial longitudinal coverage. Using this approach, our model tracked the state of patients who were healthy and free from known chronic disease risk and distinguished individuals with higher or lower longevity potential using a multivariate score. We showed that the model and the markers it uses performed consistently on data from Israeli, British and US populations. For example, mildly low neutrophil counts and alkaline phosphatase levels serve as early indicators of healthy aging that are independent of risk for major chronic diseases. We characterize the heritability and genetic associations of our longevity score and demonstrate at least 1 year of extended lifespan for parents of high-scoring patients compared to matched controls. Longitudinal modeling of healthy individuals is thereby established as a tool for understanding healthy aging and longevity.
Affiliation(s)
- Netta Mendelson Cohen
  - Department of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Aviezer Lifshitz
  - Department of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Rami Jaschek
  - Department of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Ehud Rinott
  - Department of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Ran Balicer
  - Clalit Research Institute, Ramat Gan, Israel
- Liran I Shlush
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Gabriel I Barbash
  - Department of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Amos Tanay
  - Department of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
34
Carrasco-Zanini J, Pietzner M, Wheeler E, Kerrison ND, Langenberg C, Wareham NJ. Multi-omic prediction of incident type 2 diabetes. Diabetologia 2024; 67:102-112. [PMID: 37889320 PMCID: PMC10709231 DOI: 10.1007/s00125-023-06027-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 08/30/2023] [Indexed: 10/28/2023]
Abstract
AIMS/HYPOTHESIS The identification of people who are at high risk of developing type 2 diabetes is a key part of population-level prevention strategies. Previous studies have evaluated the predictive utility of omics measurements, such as metabolites, proteins or polygenic scores, but have considered these separately. The improvement that combined omics biomarkers can provide over and above current clinical standard models is unclear. The aim of this study was to test the predictive performance of genome, proteome, metabolome and clinical biomarkers when added to established clinical prediction models for type 2 diabetes. METHODS We developed sparse interpretable prediction models in a prospective, nested type 2 diabetes case-cohort study (N=1105, incident type 2 diabetes cases=375) with 10,792 person-years of follow-up, selecting from 5759 features across the genome, proteome, metabolome and clinical biomarkers using least absolute shrinkage and selection operator (LASSO) regression. We compared the predictive performance of omics-derived predictors with a clinical model including the variables from the Cambridge Diabetes Risk Score and HbA1c. RESULTS Among single omics prediction models that did not include clinical risk factors, the top ten proteins alone achieved the highest performance (concordance index [C index]=0.82 [95% CI 0.75, 0.88]), suggesting the proteome as the most informative single omic layer in the absence of clinical information. However, the largest improvement in prediction of type 2 diabetes incidence over and above the clinical model was achieved by the top ten features across several omic layers (C index=0.87 [95% CI 0.82, 0.92], Δ C index=0.05, p=0.045). 
This improvement by the top ten omic features was also evident in individuals with HbA1c <42 mmol/mol (6.0%), the threshold for prediabetes (C index=0.84 [95% CI 0.77, 0.90], Δ C index=0.07, p=0.03), the group in whom prediction would be most useful since they are not targeted for preventative interventions by current clinical guidelines. In this subgroup, the type 2 diabetes polygenic risk score was the major contributor to the improvement in prediction, and achieved a comparable improvement in performance when added onto the clinical model alone (C index=0.83 [95% CI 0.75, 0.90], Δ C index=0.06, p=0.002). However, compared with those with prediabetes, individuals at high polygenic risk in this group had only around half the absolute risk for type 2 diabetes over a 20-year period. CONCLUSIONS/INTERPRETATION Omic approaches provided marginal improvements in prediction of incident type 2 diabetes. However, while a polygenic risk score does improve prediction in people with an HbA1c in the normoglycaemic range, the group in whom prediction would be most useful, even individuals with a high polygenic burden in that subgroup had a low absolute type 2 diabetes risk. This suggests a limited feasibility of implementing targeted population-based genetic screening for preventative interventions.
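The concordance index (C index) reported throughout the abstract above can be illustrated with a short sketch. This is a plain, unweighted Harrell-style C index for right-censored follow-up data, written only to show what the statistic measures. It is an assumption that this simple form is a useful illustration: the study used a case-cohort design, which may require a weighted estimator, and none of the names or toy values below come from the paper.

```python
def concordance_index(times, events, scores):
    """Unweighted Harrell-style concordance index.

    times  : follow-up time for each person
    events : 1 if the person developed the outcome, 0 if censored
    scores : predicted risk (higher score = expected earlier event)

    A pair (i, j) is comparable when person i had the event and did so
    before person j's follow-up time. The pair is concordant when the
    model gave person i the higher risk score; ties count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four people, one of them censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(toy_times, toy_events, toy_scores))  # 4 of 5 comparable pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a Δ C index is the gain in the fraction of correctly ordered comparable pairs.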
Affiliation(s)
- Julia Carrasco-Zanini
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Institute of Metabolic Science, Cambridge, UK
- Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
- Maik Pietzner
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Institute of Metabolic Science, Cambridge, UK
- Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
- Eleanor Wheeler
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Institute of Metabolic Science, Cambridge, UK
- Nicola D Kerrison
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Institute of Metabolic Science, Cambridge, UK
- Claudia Langenberg
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Institute of Metabolic Science, Cambridge, UK.
- Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany.
- Precision Healthcare University Research Institute, Queen Mary University of London, London, UK.
- Nicholas J Wareham
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Institute of Metabolic Science, Cambridge, UK.
35
Leary JR, Bacher R. Interpretable trajectory inference with single-cell Linear Adaptive Negative-binomial Expression (scLANE) testing. bioRxiv 2023:2023.12.19.572477. [PMID: 38187622 PMCID: PMC10769309 DOI: 10.1101/2023.12.19.572477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
The rapid proliferation of trajectory inference methods for single-cell RNA-seq data has allowed researchers to investigate complex biological processes by examining underlying gene expression dynamics. After estimating a latent cell ordering, statistical models are used to determine which genes exhibit changes in expression that are significantly associated with progression through the biological trajectory. While a few techniques for performing trajectory differential expression exist, most rely on the flexibility of generalized additive models in order to account for the inherent nonlinearity of changes in gene expression. As such, the results can be difficult to interpret, and biological conclusions often rest on subjective visual inspections of the most dynamic genes. To address this challenge, we propose scLANE testing, which is built around an interpretable generalized linear model and handles nonlinearity with basis splines chosen empirically for each gene. In addition, extensions to estimating equations and mixed models allow for reliable trajectory testing under complex experimental designs. After validating the accuracy of scLANE under several different simulation scenarios, we apply it to a set of diverse biological datasets and display its ability to provide novel biological information when used downstream of both pseudotime and RNA velocity estimation methods.
Affiliation(s)
- Jack R. Leary
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
- Rhonda Bacher
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
36
Privé F, Albiñana C, Arbel J, Pasaniuc B, Vilhjálmsson BJ. Inferring disease architecture and predictive ability with LDpred2-auto. Am J Hum Genet 2023; 110:2042-2055. [PMID: 37944514 PMCID: PMC10716363 DOI: 10.1016/j.ajhg.2023.10.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/15/2023] [Accepted: 10/17/2023] [Indexed: 11/12/2023] Open
Abstract
LDpred2 is a widely used Bayesian method for building polygenic scores (PGSs). LDpred2-auto can infer the two parameters from the LDpred model, the SNP heritability h² and polygenicity p, so that it does not require an additional validation dataset to choose best-performing parameters. The main aim of this paper is to properly validate the use of LDpred2-auto for inferring multiple genetic parameters. Here, we present a new version of LDpred2-auto that adds an optional third parameter α to its model, for modeling negative selection. We then validate the inference of these three parameters (or two, when using the previous model). We also show that LDpred2-auto provides per-variant probabilities of being causal that are well calibrated and can therefore be used for fine-mapping purposes. We also introduce a formula to infer the out-of-sample predictive performance r² of the resulting PGS directly from the Gibbs sampler of LDpred2-auto. Finally, we extend the set of HapMap3 variants recommended to use with LDpred2 with 37% more variants to improve the coverage of this set, and we show that this new set of variants captures 12% more heritability and provides 6% more predictive performance, on average, in UK Biobank analyses.
Affiliation(s)
- Florian Privé
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark.
- Clara Albiñana
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Julyan Arbel
- University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Bjarni J Vilhjálmsson
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark; Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
37
Bhattacharya A, Vo DD, Jops C, Kim M, Wen C, Hervoso JL, Pasaniuc B, Gandal MJ. Isoform-level transcriptome-wide association uncovers genetic risk mechanisms for neuropsychiatric disorders in the human brain. Nat Genet 2023; 55:2117-2128. [PMID: 38036788 PMCID: PMC10703692 DOI: 10.1038/s41588-023-01560-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/09/2022] [Accepted: 10/05/2023] [Indexed: 12/02/2023]
Abstract
Methods integrating genetics with transcriptomic reference panels prioritize risk genes and mechanisms at only a fraction of trait-associated genetic loci, due in part to an overreliance on total gene expression as a molecular outcome measure. This challenge is particularly relevant for the brain, in which extensive splicing generates multiple distinct transcript-isoforms per gene. Due to complex correlation structures, isoform-level modeling from cis-window variants requires methodological innovation. Here we introduce isoTWAS, a multivariate, stepwise framework integrating genetics, isoform-level expression and phenotypic associations. Compared to gene-level methods, isoTWAS improves both isoform and gene expression prediction, yielding more testable genes, and increased power for discovery of trait associations within genome-wide association study loci across 15 neuropsychiatric traits. We illustrate multiple isoTWAS associations undetectable at the gene-level, prioritizing isoforms of AKT3, CUL3 and HSPD1 in schizophrenia and PCLO with multiple disorders. Results highlight the importance of incorporating isoform-level resolution within integrative approaches to increase discovery of trait associations, especially for brain-relevant traits.
Affiliation(s)
- Arjun Bhattacharya
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Institute for Data Science in Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA.
- Daniel D Vo
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Connor Jops
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Minsoo Kim
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Cindy Wen
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Jonatan L Hervoso
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Michael J Gandal
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA.
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
38
Harmata GIS, Barsotti EJ, Casten LG, Fiedorowicz JG, Williams A, Shaffer JJ, Richards JG, Sathyaputri L, Schmitz SL, Christensen GE, Long JD, Gaine ME, Xu J, Michaelson JJ, Wemmie JA, Magnotta VA. Cerebellar morphological differences and associations with extrinsic factors in bipolar disorder type I. J Affect Disord 2023; 340:269-279. [PMID: 37562560 PMCID: PMC10529949 DOI: 10.1016/j.jad.2023.08.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 07/18/2023] [Accepted: 08/03/2023] [Indexed: 08/12/2023]
Abstract
BACKGROUND The neural underpinnings of bipolar disorder (BD) remain poorly understood. The cerebellum is ideally positioned to modulate emotional regulation circuitry yet has been understudied in BD. The literature suggests differences in cerebellar activity and metabolism in BD; however, findings on structural differences remain contradictory. Potential reasons include combining BD subtypes, small sample sizes, and potential moderators such as genetics, adverse childhood experiences (ACEs), and pharmacotherapy. METHODS We collected 3 T MRI scans from participants with (N = 131) and without (N = 81) BD type I, as well as blood and questionnaires. We assessed differences in cerebellar volumes and explored potentially influential factors. RESULTS The cerebellar cortex was smaller bilaterally in participants with BD. Polygenic propensity score did not predict any cerebellar volumes, suggesting that non-genetic factors may have greater influence on the cerebellar volume difference we observed in BD. Proportionate cerebellar white matter volumes appeared larger with more ACEs, but this may result from reduced ICV. Time from onset and symptom burden were not associated with cerebellar volumes. Finally, taking sedatives was associated with larger cerebellar white matter and non-significantly larger cortical volume. LIMITATIONS This study was cross-sectional, limiting interpretation of possible mechanisms. Most of our participants were White, which could limit the generalizability. Additionally, we did not account for potential polypharmacy interactions. CONCLUSIONS These findings suggest that external factors, such as sedatives and childhood experiences, may influence cerebellum structure in BD and may mask underlying differences. Accounting for such variables may be critical for consistent findings in future studies.
Affiliation(s)
- Gail I S Harmata
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States; Department of Radiology, The University of Iowa, United States
- Ercole John Barsotti
- Department of Psychiatry, The University of Iowa, United States; Department of Epidemiology, The University of Iowa, United States
- Lucas G Casten
- Department of Psychiatry, The University of Iowa, United States; Interdisciplinary Graduate Program in Genetics, The University of Iowa, United States
- Jess G Fiedorowicz
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States; Department of Psychiatry, University of Ottawa, Canada
- Aislinn Williams
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States
- Joseph J Shaffer
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States; Department of Radiology, The University of Iowa, United States; Department of Biosciences, Kansas City University, United States
- Gary E Christensen
- Department of Electrical and Computer Engineering, The University of Iowa, United States; Department of Radiation Oncology, The University of Iowa, United States
- Jeffrey D Long
- Department of Psychiatry, The University of Iowa, United States; Department of Biostatistics, The University of Iowa, United States
- Marie E Gaine
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States; Department of Pharmaceutical Sciences and Experimental Therapeutics (PSET), College of Pharmacy, The University of Iowa, United States
- Jia Xu
- Department of Radiology, The University of Iowa, United States
- Jake J Michaelson
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States; Interdisciplinary Graduate Program in Genetics, The University of Iowa, United States
- John A Wemmie
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States; Department of Molecular Physiology and Biophysics, The University of Iowa, United States; Department of Neurosurgery, The University of Iowa, United States; Veterans Affairs Medical Center, Iowa City, United States
- Vincent A Magnotta
- Department of Psychiatry, The University of Iowa, United States; Iowa Neuroscience Institute, The University of Iowa, United States; Department of Radiology, The University of Iowa, United States; Department of Biomedical Engineering, The University of Iowa, United States.
39
Bjornsdottir G, Chalmer MA, Stefansdottir L, Skuladottir AT, Einarsson G, Andresdottir M, Beyter D, Ferkingstad E, Gretarsdottir S, Halldorsson BV, Halldorsson GH, Helgadottir A, Helgason H, Hjorleifsson Eldjarn G, Jonasdottir A, Jonasdottir A, Jonsdottir I, Knowlton KU, Nadauld LD, Lund SH, Magnusson OT, Melsted P, Moore KHS, Oddsson A, Olason PI, Sigurdsson A, Stefansson OA, Saemundsdottir J, Sveinbjornsson G, Tragante V, Unnsteinsdottir U, Walters GB, Zink F, Rødevand L, Andreassen OA, Igland J, Lie RT, Haavik J, Banasik K, Brunak S, Didriksen M, T Bruun M, Erikstrup C, Kogelman LJA, Nielsen KR, Sørensen E, Pedersen OB, Ullum H, Masson G, Thorsteinsdottir U, Olesen J, Ludvigsson P, Thorarensen O, Bjornsdottir A, Sigurdardottir GR, Sveinsson OA, Ostrowski SR, Holm H, Gudbjartsson DF, Thorleifsson G, Sulem P, Stefansson H, Thorgeirsson TE, Hansen TF, Stefansson K. Rare variants with large effects provide functional insights into the pathology of migraine subtypes, with and without aura. Nat Genet 2023; 55:1843-1853. [PMID: 37884687 PMCID: PMC10632135 DOI: 10.1038/s41588-023-01538-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 09/18/2023] [Indexed: 10/28/2023]
Abstract
Migraine is a complex neurovascular disease with a range of severity and symptoms, yet mostly studied as one phenotype in genome-wide association studies (GWAS). Here we combine large GWAS datasets from six European populations to study the main migraine subtypes, migraine with aura (MA) and migraine without aura (MO). We identified four new MA-associated variants (in PRRT2, PALMD, ABO and LRRK2) and classified 13 MO-associated variants. Rare variants with large effects highlight three genes. A rare frameshift variant in brain-expressed PRRT2 confers large risk of MA and epilepsy, but not MO. A burden test of rare loss-of-function variants in SCN11A, encoding a neuron-expressed sodium channel with a key role in pain sensation, shows strong protection against migraine. Finally, a rare variant with cis-regulatory effects on KCNK5 confers large protection against migraine and brain aneurysms. Our findings offer new insights with therapeutic potential into the complex biology of migraine and its subtypes.
Affiliation(s)
- Mona A Chalmer
- Danish Headache Center, Department of Neurology, Copenhagen University Hospital, Rigshospitalet-Glostrup, Copenhagen, Denmark
- Bjarni V Halldorsson
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- Reykjavik University, School of Technology, Reykjavik, Iceland
- Gisli H Halldorsson
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Hannes Helgason
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Ingileif Jonsdottir
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Sigrun H Lund
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- Faculty of Physical Sciences, School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Pall Melsted
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Linn Rødevand
- NORMENT, Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Ole A Andreassen
- NORMENT, Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Jannicke Igland
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Department of Health and Social Science, Centre for Evidence-Based Practice, Western Norway University of Applied Science, Bergen, Norway
- Rolv T Lie
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Jan Haavik
- Department of Biomedicine, University of Bergen, Bergen, Norway
- Division of Psychiatry, Haukeland University Hospital, Bergen, Norway
- Karina Banasik
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Maria Didriksen
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Mie T Bruun
- Department of Clinical Immunology, Odense University Hospital, Odense, Denmark
- Christian Erikstrup
- Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine Health, Aarhus University, Aarhus, Denmark
- Lisette J A Kogelman
- Danish Headache Center, Department of Neurology, Copenhagen University Hospital, Rigshospitalet-Glostrup, Copenhagen, Denmark
- Kaspar R Nielsen
- Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark
- Department of Clinical Medicine, Aalborg University, Aalborg, Denmark
- Erik Sørensen
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Ole B Pedersen
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Unnur Thorsteinsdottir
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Jes Olesen
- Danish Headache Center, Department of Neurology, Copenhagen University Hospital, Rigshospitalet-Glostrup, Copenhagen, Denmark
- Petur Ludvigsson
- Department of Pediatrics, Landspitali University Hospital, Reykjavik, Iceland
- Olafur Thorarensen
- Department of Pediatrics, Landspitali University Hospital, Reykjavik, Iceland
- Olafur A Sveinsson
- Laeknasetrid Clinic, Reykjavik, Iceland
- Department of Neurology, Landspitali University Hospital, Reykjavik, Iceland
- Sisse R Ostrowski
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Hilma Holm
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- Daniel F Gudbjartsson
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Thomas F Hansen
- Danish Headache Center, Department of Neurology, Copenhagen University Hospital, Rigshospitalet-Glostrup, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kari Stefansson
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland.
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland.
40
Jeng XJ, Hu Y, Venkat V, Lu TP, Tzeng JY. Transfer learning with false negative control improves polygenic risk prediction. PLoS Genet 2023; 19:e1010597. [PMID: 38011285 PMCID: PMC10723713 DOI: 10.1371/journal.pgen.1010597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2023] [Revised: 12/15/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023] Open
Abstract
A polygenic risk score (PRS) is a quantity that aggregates the effects of variants across the genome and estimates an individual's genetic predisposition for a given trait. PRS analysis typically requires two input data sets: base data for effect size estimation and target data for individual-level prediction. Given the availability of large-scale base data, it has become more common for the ancestral backgrounds of the base and target data not to match perfectly. In this paper, we treat the GWAS summary information obtained in the base data as knowledge learned from a pre-trained model, and adopt a transfer learning framework to effectively leverage the knowledge learned from the base data, which may or may not share the ancestral background of the target samples, to build prediction models for target individuals. Our proposed transfer learning framework consists of two main steps: (1) conducting false negative control (FNC) marginal screening to extract useful knowledge from the base data; and (2) performing joint model training to integrate the knowledge extracted from base data with the target training data for accurate trans-data prediction. This new approach can significantly enhance the computational and statistical efficiency of joint-model training, alleviate over-fitting, and facilitate more accurate trans-data prediction whether the heterogeneity level between the target and base data sets is low or high.
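To make the definition of a PRS in this abstract concrete, the following minimal sketch computes a score as a weighted sum of allele dosages after a screening step. All names and numbers are hypothetical, and the fixed p-value cutoff is only a stand-in for illustration: it is not the false negative control (FNC) screening procedure the authors propose, and the joint model training step is not shown.

```python
def screen_variants(p_values, threshold):
    """Indices of variants whose base-data p-value is at or below a cutoff.

    Assumption: a plain fixed cutoff, used here only as a placeholder for
    a marginal screening step. It is NOT the paper's FNC procedure.
    """
    return [i for i, p in enumerate(p_values) if p <= threshold]


def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1 or 2) for each individual."""
    scores = []
    for row in dosages:
        if len(row) != len(effect_sizes):
            raise ValueError("each dosage row needs one value per variant")
        scores.append(sum(d * b for d, b in zip(row, effect_sizes)))
    return scores


# Toy data (made up): 2 individuals x 3 variants.
dosages = [[0, 1, 2],
           [2, 0, 1]]
betas = [0.10, -0.20, 0.05]      # base-data effect size estimates
p_values = [1e-8, 0.3, 1e-4]     # base-data marginal p-values

keep = screen_variants(p_values, threshold=1e-3)
kept_dosages = [[row[i] for i in keep] for row in dosages]
kept_betas = [betas[i] for i in keep]
scores = polygenic_risk_score(kept_dosages, kept_betas)
```

In this toy run the second variant is screened out, so each score depends only on the first and third variants.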
Affiliation(s)
- Xinge Jessie Jeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Yifei Hu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Vaishnavi Venkat
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Tzu-Pin Lu
- Institute of Health Data Analytics and Statistics, National Taiwan University, Taipei, Taiwan
- Department of Public Health, National Taiwan University, Taipei, Taiwan
- Jung-Ying Tzeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Institute of Health Data Analytics and Statistics, National Taiwan University, Taipei, Taiwan
41
Zhu X, Yang Y, Lorincz-Comi N, Li G, Bentley A, de Vries PS, Brown M, Morrison AC, Rotimi C, James Gauderman W, Rao DC, Aschard H. A new Approach to Identify Gene-Environment Interactions and Reveal New Biological Insight in Complex traits. Research Square 2023:rs.3.rs-3338723. [PMID: 37886448 PMCID: PMC10602131 DOI: 10.21203/rs.3.rs-3338723/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
There is a long-standing debate about the magnitude of the contribution of gene-environment interactions to phenotypic variations of complex traits owing to the low statistical power and few reported interactions to date. To address this issue, the CHARGE Gene-Lifestyle Interactions Working Group has been spearheading efforts to investigate G × E in large and diverse samples through meta-analysis. Here, we present a powerful new approach to screen for interactions across the genome, an approach that shares substantial similarity to the Mendelian randomization framework. We identified and confirmed 5 loci (6 independent signals) interacting with either cigarette smoking or alcohol consumption for serum lipids, and empirically demonstrated that interaction and mediation are the major contributors to genetic effect size heterogeneity across populations. The estimated lower bound of the interaction and environmentally mediated contribution ranges from 1.76% to 14.05% of SNP heritability of serum lipids in Cross-Population data. Our study improves the understanding of the genetic architecture and environmental contributions to complex traits.
Affiliation(s)
- Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Yihe Yang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Noah Lorincz-Comi
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Gen Li
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Amy Bentley
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- Paul S de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Michael Brown
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Charles Rotimi
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- W. James Gauderman
- Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- DC Rao
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
- Hugues Aschard
- Institut Pasteur, Université Paris Cité, Department of Computational Biology, F-75015 Paris, France
42
Thomas M, Su YR, Rosenthal EA, Sakoda LC, Schmit SL, Timofeeva MN, Chen Z, Fernandez-Rozadilla C, Law PJ, Murphy N, Carreras-Torres R, Diez-Obrero V, van Duijnhoven FJB, Jiang S, Shin A, Wolk A, Phipps AI, Burnett-Hartman A, Gsur A, Chan AT, Zauber AG, Wu AH, Lindblom A, Um CY, Tangen CM, Gignoux C, Newton C, Haiman CA, Qu C, Bishop DT, Buchanan DD, Crosslin DR, Conti DV, Kim DH, Hauser E, White E, Siegel E, Schumacher FR, Rennert G, Giles GG, Hampel H, Brenner H, Oze I, Oh JH, Lee JK, Schneider JL, Chang-Claude J, Kim J, Huyghe JR, Zheng J, Hampe J, Greenson J, Hopper JL, Palmer JR, Visvanathan K, Matsuo K, Matsuda K, Jung KJ, Li L, Le Marchand L, Vodickova L, Bujanda L, Gunter MJ, Matejcic M, Jenkins MA, Slattery ML, D'Amato M, Wang M, Hoffmeister M, Woods MO, Kim M, Song M, Iwasaki M, Du M, Udaltsova N, Sawada N, Vodicka P, Campbell PT, Newcomb PA, Cai Q, Pearlman R, Pai RK, Schoen RE, Steinfelder RS, Haile RW, Vandenputtelaar R, Prentice RL, Küry S, Castellví-Bel S, Tsugane S, Berndt SI, Lee SC, Brezina S, Weinstein SJ, Chanock SJ, Jee SH, Kweon SS, Vadaparampil S, Harrison TA, Yamaji T, Keku TO, Vymetalkova V, Arndt V, Jia WH, Shu XO, Lin Y, Ahn YO, Stadler ZK, Van Guelpen B, Ulrich CM, Platz EA, Potter JD, Li CI, Meester R, Moreno V, Figueiredo JC, Casey G, Lansdorp Vogelaar I, Dunlop MG, Gruber SB, Hayes RB, Pharoah PDP, Houlston RS, Jarvik GP, Tomlinson IP, Zheng W, Corley DA, Peters U, Hsu L. Combining Asian and European genome-wide association studies of colorectal cancer improves risk prediction across racial and ethnic populations. Nat Commun 2023; 14:6147. [PMID: 37783704 PMCID: PMC10545678 DOI: 10.1038/s41467-023-41819-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 09/19/2023] [Indexed: 10/04/2023] Open
Abstract
Polygenic risk scores (PRS) have great potential to guide precision colorectal cancer (CRC) prevention by identifying those at higher risk to undertake targeted screening. However, current PRS using European ancestry data have sub-optimal performance in non-European ancestry populations, limiting their utility among these populations. Towards addressing this deficiency, we expand PRS development for CRC by incorporating Asian ancestry data (21,731 cases; 47,444 controls) into European ancestry training datasets (78,473 cases; 107,143 controls). The AUC estimates (95% CI) of PRS are 0.63 (0.62-0.64), 0.59 (0.57-0.61), 0.62 (0.60-0.63), and 0.65 (0.63-0.66) in independent datasets including 1,681-3,651 cases and 8,696-115,105 controls of Asian, Black/African American, Latinx/Hispanic, and non-Hispanic White, respectively. They are significantly better than the European-centric PRS in all four major US racial and ethnic groups (p-values < 0.05). Further inclusion of non-European ancestry populations, especially Black/African American and Latinx/Hispanic, is needed to improve the risk prediction and enhance equity in applying PRS in clinical practice.
Affiliation(s)
- Minta Thomas
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Yu-Ru Su
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, USA
| | - Elisabeth A Rosenthal
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, 98195, USA
| | - Lori C Sakoda
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Stephanie L Schmit
- Genomic Medicine Institute, Cleveland Clinic, Cleveland, OH, USA
- Population and Cancer Prevention Program, Case Comprehensive Cancer Center, Cleveland, USA
| | - Maria N Timofeeva
- Danish Institute for Advanced Study (DIAS), Epidemiology, Biostatistics and Biodemography, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Colon Cancer Genetics Group, Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
| | - Zhishan Chen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Ceres Fernandez-Rozadilla
- Instituto de Investigacion Sanitaria de Santiago (IDIS), Choupana sn, 15706, Santiago de Compostela, Spain
- Edinburgh Cancer Research Centre, Institute of Genomics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Philip J Law
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
| | - Neil Murphy
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - Robert Carreras-Torres
- Digestive Diseases and Microbiota Group, Girona Biomedical Research Institute (IDIBGI), Salt, 17190, Girona, Spain
| | - Virginia Diez-Obrero
- Unit of Biomarkers and Susceptibility, Oncology Data Analytics Program, Catalan Institute of Oncology, Barcelona, 08908, Spain
- Colorectal Cancer Group, ONCOBELL Program, Bellvitge Biomedical Research Institute, Barcelona, 08908, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, 08908, Spain
| | | | - Shangqing Jiang
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Aesun Shin
- Department of Preventive Medicine, Seoul National University College of Medicine, Seoul National University Cancer Research Institute, Seoul, South Korea
| | - Alicja Wolk
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Amanda I Phipps
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | | | - Andrea Gsur
- Center for Cancer Research, Medical University Vienna, Vienna, Austria
| | - Andrew T Chan
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
| | - Ann G Zauber
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Anna H Wu
- University of Southern California, Preventative Medicine, Los Angeles, CA, USA
| | - Annika Lindblom
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
| | - Caroline Y Um
- Department of Population Science, American Cancer Society, Atlanta, GA, USA
| | - Catherine M Tangen
- SWOG Statistical Center, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Chris Gignoux
- Colorado Center for Personalized Medicine, University of Colorado - Anschutz Medical Campus, Aurora, CO, USA
| | - Christina Newton
- Department of Population Science, American Cancer Society, Atlanta, GA, USA
| | - Christopher A Haiman
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Conghui Qu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - D Timothy Bishop
- Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
| | - Daniel D Buchanan
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, VIC, 3000, Australia
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Parkville, VIC, 3000, Australia
- Genomic Medicine and Family Cancer Clinic, The Royal Melbourne Hospital, Parkville, VIC, 3000, Australia
| | - David R Crosslin
- Department of Bioinformatics and Medical Education, University of Washington Medical Center, Seattle, WA, 98195, USA
| | - David V Conti
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Dong-Hyun Kim
- Department of Social and Preventive Medicine, Hallym University College of Medicine, Okcheon-dong, South Korea
| | - Elizabeth Hauser
- VA Cooperative Studies Program Epidemiology Center, Durham Veterans Affairs Health Care System, Durham, NC, USA
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
| | - Emily White
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Department of Epidemiology, University of Washington School of Public Health, Seattle, WA, USA
| | - Erin Siegel
- Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Fredrick R Schumacher
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
| | - Gad Rennert
- Department of Community Medicine and Epidemiology, Lady Davis Carmel Medical Center, Haifa, Israel
- Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
| | - Graham G Giles
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, VIC, Australia
| | - Heather Hampel
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
| | - Isao Oze
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan
| | - Jae Hwan Oh
- Research Institute and Hospital, National Cancer Center, Goyang, South Korea
| | - Jeffrey K Lee
- Department of Gastroenterology, Kaiser Permanente San Francisco Medical Center, San Francisco, CA, USA
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48104, USA
| | | | - Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- University Medical Centre Hamburg-Eppendorf, University Cancer Centre Hamburg (UCCH), Hamburg, Germany
| | - Jeongseon Kim
- Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Gyeonggi-do, South Korea
| | - Jeroen R Huyghe
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Jiayin Zheng
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Jochen Hampe
- Department of Medicine I, University Hospital Dresden, Technische Universität Dresden (TU Dresden), Dresden, Germany
| | - Joel Greenson
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48104, USA
| | - John L Hopper
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, Australia
- Department of Epidemiology, School of Public Health and Institute of Health and Environment, Seoul National University, Seoul, South Korea
| | - Julie R Palmer
- Slone Epidemiology Center, School of Medicine, Boston University, Boston, MA, USA
| | - Kala Visvanathan
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Keitaro Matsuo
- Division of Molecular and Clinical Epidemiology, Aichi Cancer Center Research Institute, Nagoya, Japan
| | - Koichi Matsuda
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Keum Ji Jung
- Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
| | - Li Li
- Department of Family Medicine, University of Virginia, Charlottesville, VA, USA
| | | | - Ludmila Vodickova
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, Prague, Czech Republic
- Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, Prague, Czech Republic
- Faculty of Medicine and Biomedical Center in Pilsen, Charles University, Pilsen, Czech Republic
| | - Luis Bujanda
- Department of Gastroenterology, Biodonostia Health Research Institute, Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Universidad del País Vasco (UPV/EHU), San Sebastián, Spain
| | - Marc J Gunter
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France
| | | | - Mark A Jenkins
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Parkville, VIC, 3000, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, Australia
| | - Martha L Slattery
- Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
| | - Mauro D'Amato
- Department of Medicine and Surgery, LUM University, Casamassima, Italy
- Gastrointestinal Genetics Lab, CIC bioGUNE-BRTA, Derio, Spain
| | - Meilin Wang
- Department of Environmental Genomics, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Michael O Woods
- Memorial University of Newfoundland, Discipline of Genetics, St. John's, Canada
| | - Michelle Kim
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Mingyang Song
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Departments of Epidemiology and Nutrition, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Motoki Iwasaki
- Division of Epidemiology, National Cancer Center Institute for Cancer Control, National Cancer Center, Tokyo, Japan
- Division of Cohort Research, National Cancer Center Institute for Cancer Control, National Cancer Center, Tokyo, Japan
| | - Mulong Du
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Natalia Udaltsova
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Norie Sawada
- Division of Cohort Research, National Cancer Center Institute for Cancer Control, National Cancer Center, Tokyo, Japan
| | - Pavel Vodicka
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, Prague, Czech Republic
- Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, Prague, Czech Republic
| | - Peter T Campbell
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Polly A Newcomb
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Qiuyin Cai
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Rachel Pearlman
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Rish K Pai
- Department of Laboratory Medicine and Pathology, Mayo Clinic Arizona, Scottsdale, AZ, USA
| | - Robert E Schoen
- Department of Medicine and Epidemiology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
| | - Robert S Steinfelder
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Robert W Haile
- Samuel Oschin Comprehensive Cancer Institute, CEDARS-SINAI, Los Angeles, CA, USA
| | - Rosita Vandenputtelaar
- Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Ross L Prentice
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Sébastien Küry
- Nantes Université, CHU Nantes, Service de Génétique Médicale, F-44000, Nantes, France
| | - Sergi Castellví-Bel
- Gastroenterology Department, Hospital Clínic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), University of Barcelona, Barcelona, Spain
| | - Shoichiro Tsugane
- Division of Cohort Research, National Cancer Center Institute for Cancer Control, National Cancer Center, Tokyo, Japan
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Soo Chin Lee
- National University Cancer Institute, Singapore, Singapore
| | - Stefanie Brezina
- Center for Cancer Research, Medical University Vienna, Vienna, Austria
| | - Stephanie J Weinstein
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sun Ha Jee
- Department of Epidemiology and Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
| | - Sun-Seog Kweon
- Department of Preventive Medicine, Chonnam National University Medical School, Gwangju, Korea
- Jeonnam Regional Cancer Center, Chonnam National University Hwasun Hospital, Hwasun, Korea
| | - Susan Vadaparampil
- Departments of Epidemiology and Nutrition, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Tabitha A Harrison
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Taiki Yamaji
- Division of Epidemiology, National Cancer Center Institute for Cancer Control, National Cancer Center, Tokyo, Japan
| | - Temitope O Keku
- Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, NC, USA
| | - Veronika Vymetalkova
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, Prague, Czech Republic
- Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, Prague, Czech Republic
| | - Volker Arndt
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Wei-Hua Jia
- State Key Laboratory of Oncology in South China, Cancer Center, Sun Yat-sen University, Guangzhou, China
| | - Xiao-Ou Shu
- Vanderbilt University Medical Center, Nashville, TN, USA
| | - Yi Lin
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Yoon-Ok Ahn
- Department of Preventive Medicine, Seoul National University College of Medicine, Seoul National University Cancer Research Institute, Seoul, South Korea
| | - Zsofia K Stadler
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Bethany Van Guelpen
- Department of Radiation Sciences, Oncology Unit, Umeå University, Umeå, Sweden
- Wallenberg Centre for Molecular Medicine, Umeå University, Umeå, Sweden
| | - Cornelia M Ulrich
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
| | - Elizabeth A Platz
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - John D Potter
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Christopher I Li
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Reinier Meester
- Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology-IDIBELL, L'Hospitalet de Llobregat, Barcelona, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- ONCOBELL Program, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain
| | - Jane C Figueiredo
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Medicine, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Graham Casey
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Iris Lansdorp Vogelaar
- Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Malcolm G Dunlop
- Colon Cancer Genetics Group, Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
| | - Stephen B Gruber
- Department of Medical Oncology & Therapeutics Research, City of Hope National Medical Center, Duarte, CA, USA
| | - Richard B Hayes
- Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York, NY, USA
| | - Paul D P Pharoah
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Richard S Houlston
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
| | - Gail P Jarvik
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, 98195, USA
| | - Ian P Tomlinson
- Edinburgh Cancer Research Centre, Institute of Genomics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Douglas A Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Department of Gastroenterology, Kaiser Permanente Medical Center, San Francisco, CA, USA
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| |
|
43
|
Lee DSM, Cardone KM, Zhang DY, Abramowitz S, DePaolo JS, Aragam KG, Biddinger K, Conery M, Dilitikas O, Hoffman-Andrews L, Judy RL, Khan A, Kulo I, Puckelwartz MJ, Reza N, Satterfield BA, Singhal P, Arany ZP, Cappola TP, Carruth E, Day SM, Do R, Haggarty CM, Joseph J, McNally E, Nadkarni G, Owens AT, Rader DJ, Ritchie MD, Sun Y, Voight BF, Levin MG, Damrauer SM. Common- and rare-variant genetic architecture of heart failure across the allele frequency spectrum. medRxiv 2023:2023.07.16.23292724. [PMID: 37503172 PMCID: PMC10371173 DOI: 10.1101/2023.07.16.23292724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Heart failure (HF) is a complex trait, influenced by environmental and genetic factors, that affects over 30 million individuals worldwide. Historically, the genetics of HF have been studied in Mendelian forms of disease, where rare genetic variants have been linked to familial cardiomyopathies. More recently, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with risk of HF. However, the relative importance of genetic variants across the allele-frequency spectrum remains incompletely characterized. Here, we report the results of common- and rare-variant association studies of all-cause heart failure, applying recently developed methods to quantify the heritability of HF attributable to different classes of genetic variation. We combine GWAS data across multiple populations including 207,346 individuals with HF and 2,151,210 without, identifying 176 risk loci at genome-wide significance (p < 5×10⁻⁸). Signals at newly identified common-variant loci include coding variants in Mendelian cardiomyopathy genes (MYBPC3, BAG3), as well as regulators of lipoprotein (LPL) and glucose metabolism (GIPR, GLP1R), and are enriched in cardiac, muscle, nerve, and vascular tissues, as well as myocyte and adipocyte cell types. Gene burden studies across three biobanks (PMBB, UKB, AOU) including 27,208 individuals with HF and 349,126 without uncover exome-wide significant (p < 3.15×10⁻⁶) associations for HF and rare predicted loss-of-function (pLoF) variants in TTN, MYBPC3, FLNC, and BAG3. Total burden heritability of rare coding variants (2.2%, 95% CI 0.99-3.5%) is highly concentrated in a small set of Mendelian cardiomyopathy genes, and is lower than heritability attributable to common variants (4.3%, 95% CI 3.9-4.7%), which is more diffusely spread throughout the genome.
Finally, we demonstrate that common-variant background, in the form of a polygenic risk score (PRS), significantly modifies the risk of HF among carriers of pathogenic truncating variants in the Mendelian cardiomyopathy gene TTN. These findings suggest a significant polygenic component to HF exists that is not captured by current clinical genetic testing.
Affiliation(s)
- David S M Lee
- Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
| | - Katie M Cardone
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - David Y Zhang
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Sarah Abramowitz
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - John S DePaolo
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Krishna G Aragam
- Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Kiran Biddinger
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Mitchell Conery
- Genomics and Computational Biology Graduate Group, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Ozan Dilitikas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
| | - Lily Hoffman-Andrews
- Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Renae L Judy
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Atlas Khan
- Division of Nephrology, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY
| | - Iftikhar Kulo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
| | - Megan J Puckelwartz
- Department of Pharmacology, Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
| | - Nosheen Reza
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
| | | | - Pankhuri Singhal
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Zoltan P Arany
- Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Thomas P Cappola
- Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Eric Carruth
- Department of Translational Data Science and Informatics, Geisinger, Danville, PA
| | - Sharlene M Day
- Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Ron Do
- The Charles Bronfman Institute for Personalized Medicine, Mount Sinai Icahn School of Medicine, New York, NY
- Biome Phenomics Center, Mount Sinai Icahn School of Medicine, New York, NY
- Department of Genetics and Genomic Sciences, Mount Sinai Icahn School of Medicine, New York, NY
| | | | - Jacob Joseph
- Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Elizabeth McNally
- Center for Genetic Medicine, Bluhm Cardiovascular Institute, Northwestern University Feinberg School of Medicine, Chicago, IL
| | - Girish Nadkarni
- Division of Nephrology, Department of Medicine, Mount Sinai Icahn School of Medicine, New York, NY
| | - Anjali T Owens
- Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Daniel J Rader
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Division of Translational Medicine and Human Genetics, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Yan Sun
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA
| | - Benjamin F Voight
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA
| | - Michael G Levin
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA
| | - Scott M Damrauer
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA
| |
|
44
|
Chen T, Zhang H, Mazumder R, Lin X. Ensembled best subset selection using summary statistics for polygenic risk prediction. bioRxiv 2023:2023.09.25.559307. [PMID: 37886515 PMCID: PMC10602024 DOI: 10.1101/2023.09.25.559307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L0L2 penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
|
45
|
Pedersen EM, Agerbo E, Plana-Ripoll O, Steinbach J, Krebs MD, Hougaard DM, Werge T, Nordentoft M, Børglum AD, Musliner KL, Ganna A, Schork AJ, Mortensen PB, McGrath JJ, Privé F, Vilhjálmsson BJ. ADuLT: An efficient and robust time-to-event GWAS. Nat Commun 2023; 14:5553. [PMID: 37689771 PMCID: PMC10492844 DOI: 10.1038/s41467-023-41210-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/31/2022] [Accepted: 08/28/2023] [Indexed: 09/11/2023]
Abstract
Proportional hazards models have been proposed to analyse time-to-event phenotypes in genome-wide association studies (GWAS). However, little is known about the ability of proportional hazards models to identify genetic associations under different generative models and when ascertainment is present. Here we propose the age-dependent liability threshold (ADuLT) model as an alternative to a Cox regression-based GWAS, here represented by SPACox. We compare ADuLT, SPACox, and standard case-control GWAS in simulations under two generative models and with varying degrees of ascertainment, as well as in the iPSYCH cohort. We find Cox regression GWAS to be underpowered when cases are strongly ascertained (cases are oversampled by a factor of 5), regardless of the generative model used. ADuLT is robust to ascertainment in all simulated scenarios. We then analyse four psychiatric disorders in iPSYCH (ADHD, autism, depression, and schizophrenia), all with strong case ascertainment. Across these psychiatric disorders, ADuLT identifies 20 independent genome-wide significant associations, case-control GWAS finds 17, and SPACox finds 8, which is consistent with the simulation results. As more genetic data are being linked to electronic health records, robust GWAS methods that can make use of age-of-onset information will help increase power in analyses for common health outcomes.
Affiliation(s)
- Emil M Pedersen
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Esben Agerbo
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Centre for Integrated Register-based Research at Aarhus University, Aarhus, Denmark
- Oleguer Plana-Ripoll
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Department of Clinical Epidemiology, Aarhus University and Aarhus University Hospital, Aarhus, Denmark
- Jette Steinbach
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Morten D Krebs
  - Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
- David M Hougaard
  - Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Thomas Werge
  - Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
  - Department of Clinical Sciences, Copenhagen University, Copenhagen, Denmark
  - Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Science, Copenhagen University, Copenhagen, Denmark
- Merete Nordentoft
  - Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - CORE - Copenhagen Centre for Research in Mental Health, Mental Health Center-Copenhagen, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
- Anders D Børglum
  - Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Department of Biomedicine and iSEQ Centre, Aarhus University, Aarhus, Denmark
  - Center for Genomics and Personalized Medicine, CGPM, Aarhus University, Aarhus, Denmark
- Katherine L Musliner
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Department of Affective Disorders, Aarhus University Hospital-Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Andrea Ganna
  - Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Andrew J Schork
  - Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
  - Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Science, Copenhagen University, Copenhagen, Denmark
  - Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA
- Preben B Mortensen
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- John J McGrath
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Queensland Brain Institute, University of Queensland, St Lucia, QLD, Australia
  - Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- Florian Privé
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Bjarni J Vilhjálmsson
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  - Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, the Broad Institute of MIT and Harvard, Massachusetts, USA
|
46
|
Li Z, Meisner J, Albrechtsen A. Fast and accurate out-of-core PCA framework for large scale biobank data. Genome Res 2023; 33:1599-1608. [PMID: 37620119 PMCID: PMC10620046 DOI: 10.1101/gr.277525.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 08/18/2023] [Indexed: 08/26/2023]
Abstract
Principal component analysis (PCA) is widely used in statistics, machine learning, and genomics for dimensionality reduction and uncovering low-dimensional latent structure. To address the challenges posed by ever-growing data size, fast and memory-efficient PCA methods have gained prominence. In this paper, we propose a novel randomized singular value decomposition (RSVD) algorithm implemented in PCAone, featuring a window-based optimization scheme that accelerates convergence while improving accuracy. Additionally, PCAone incorporates out-of-core and multithreaded implementations of the existing Implicitly Restarted Arnoldi Method (IRAM) and RSVD. Through comprehensive evaluations using multiple large-scale real-world data sets in different fields, we show the advantage of PCAone over existing methods. The new algorithm achieves significantly faster computation time while maintaining accuracy comparable to the slower IRAM method. Notably, our analyses of the UK Biobank, comprising around 0.5 million individuals and 6.1 million common single nucleotide polymorphisms, show that PCAone accurately computes the top 40 principal components within 9 h. This analysis effectively captures population structure, signals of selection, structural variants, and low recombination regions, using <20 GB of memory and 20 CPU threads. Furthermore, when applied to single-cell RNA sequencing data featuring 1.3 million cells, PCAone accurately captures the top 40 principal components in 49 min. This performance represents a 10-fold improvement over state-of-the-art tools.
Affiliation(s)
- Zilong Li
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 København, Denmark
- Jonas Meisner
  - Biological and Precision Psychiatry, Mental Health Centre Copenhagen, Copenhagen University Hospital, 2100 København, Denmark
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 København, Denmark
- Anders Albrechtsen
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 København, Denmark
|
47
|
Sun T, Ding Y. Neural network on interval-censored data with application to the prediction of Alzheimer's disease. Biometrics 2023; 79:2677-2690. [PMID: 35960189 PMCID: PMC10177011 DOI: 10.1111/biom.13734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2021] [Accepted: 08/01/2022] [Indexed: 11/28/2022]
Abstract
Alzheimer's disease (AD) is a progressive and polygenic disorder that affects millions of individuals each year. Because few effective treatments for AD exist, it is highly desirable to develop an accurate model that predicts the full disease progression profile from an individual's genetic characteristics, for early prevention and clinical management. This work uses data from all four phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, including 1740 individuals with 8 million genetic variants. We tackle several challenges in these data, which are characterized by large-scale genetic data, interval-censored outcomes due to intermittent assessments, and left truncation in one study phase (ADNIGO). Specifically, we first develop a semiparametric transformation model on interval-censored and left-truncated data and estimate parameters through a sieve approach. Then we propose a computationally efficient generalized score test to identify variants associated with AD progression. Next, we implement a novel neural network on interval-censored data (NN-IC) to construct a prediction model using top variants identified from the genome-wide test. Comprehensive simulation studies show that the NN-IC outperforms several existing methods in terms of prediction accuracy. Finally, we apply the NN-IC to the full ADNI data and successfully identify subgroups with differential progression risk profiles. Data used in the preparation of this article were obtained from the ADNI database.
Affiliation(s)
- Tao Sun
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Ying Ding
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
|
48
|
Sofer T, Kurniansyah N, Granot-Hershkovitz E, Goodman MO, Tarraf W, Broce I, Lipton RB, Daviglus M, Lamar M, Wassertheil-Smoller S, Cai J, DeCarli CS, Gonzalez HM, Fornage M. A polygenic risk score for Alzheimer's disease constructed using APOE-region variants has stronger association than APOE alleles with mild cognitive impairment in Hispanic/Latino adults in the U.S. Alzheimers Res Ther 2023; 15:146. [PMID: 37649099 PMCID: PMC10469805 DOI: 10.1186/s13195-023-01298-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 08/24/2023] [Indexed: 09/01/2023]
Abstract
INTRODUCTION: Polygenic Risk Scores (PRSs) are summaries of genetic risk alleles for an outcome. METHODS: We used summary statistics from five GWASs of AD to construct PRSs in 4,189 diverse Hispanics/Latinos (mean age 63 years) from the Study of Latinos-Investigation of Neurocognitive Aging (SOL-INCA). We assessed the PRS associations with MCI in the combined set of people and in diverse subgroups, and when including and excluding the APOE gene region. We also assessed PRS associations with MCI in an independent dataset from the Mass General Brigham Biobank. RESULTS: A simple sum of 5 PRSs ("PRSsum"), each constructed based on a different AD GWAS, was associated with MCI (OR = 1.28, 95% CI [1.14, 1.41]) in a model adjusted for counts of the APOE-ε2 and APOE-ε4 alleles. Associations of single-GWAS PRSs were weaker. When removing SNPs from the APOE region from the PRSs, the association of PRSsum with MCI was weaker (OR = 1.17, 95% CI [1.04, 1.31] with adjustment for APOE alleles). In all association analyses, APOE-ε2 and APOE-ε4 alleles were not associated with MCI. DISCUSSION: A sum of AD PRSs is associated with MCI in Hispanic/Latino older adults. Despite no association of APOE-ε2 and APOE-ε4 alleles with MCI, the association of the AD PRS with MCI is stronger when including the APOE region. Thus, APOE variants other than the classic APOE alleles may be important predictors of MCI in Hispanic/Latino adults.
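The "simple sum of 5 PRSs" described in the abstract can be illustrated with a small sketch. It assumes each single-GWAS PRS is standardized to mean 0 and SD 1 before summing, which the abstract does not state. The function names `standardize` and `prs_sum` and the example labels are hypothetical and are not taken from the paper or its software.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Return z-scores (mean 0, population SD 1) for one PRS across individuals."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("PRS has zero variance and cannot be standardized")
    return [(s - mu) / sd for s in scores]


def prs_sum(prs_by_gwas):
    """Sum standardized PRSs per individual.

    prs_by_gwas maps a GWAS label to a list of scores; every list must hold
    the same individuals in the same order. Standardizing before summing is
    an assumption of this sketch, not a detail given in the abstract.
    """
    columns = [standardize(scores) for scores in prs_by_gwas.values()]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("need at least one PRS, all of equal length")
    # zip(*columns) regroups the values so each tuple belongs to one individual.
    return [sum(values) for values in zip(*columns)]


# Two hypothetical single-GWAS PRSs for three individuals.
example = {"gwas_a": [1.0, 2.0, 3.0], "gwas_b": [10.0, 20.0, 30.0]}
combined = prs_sum(example)
```

Standardizing first keeps a PRS on a larger numeric scale from dominating the sum; with raw scores, `gwas_b` above would outweigh `gwas_a` tenfold.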
Affiliation(s)
- Tamar Sofer
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - CardioVascular Institute, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Nuzulul Kurniansyah
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Einat Granot-Hershkovitz
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Matthew O Goodman
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Wassim Tarraf
  - Institute of Gerontology, Wayne State University, Detroit, MI, USA
- Iris Broce
  - Department of Neurosciences, University of California San Diego, San Diego, CA, USA
- Martha Daviglus
  - Department of Medicine, Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
- Melissa Lamar
  - Department of Medicine, Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
  - Rush Alzheimer's Disease Research Center, Rush University Medical Center, Chicago, IL, USA
- Sylvia Wassertheil-Smoller
  - Department of Epidemiology & Population Health, Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles S DeCarli
  - Department of Neurology, University of California at Davis, Sacramento, CA, USA
- Hector M Gonzalez
  - Department of Neurosciences, University of California San Diego, San Diego, CA, USA
  - Shiley-Marcos Alzheimer's Disease Center, University of California San Diego, La Jolla, CA, USA
- Myriam Fornage
  - Institute of Molecular Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
49
|
Albiñana C, Zhu Z, Schork AJ, Ingason A, Aschard H, Brikell I, Bulik CM, Petersen LV, Agerbo E, Grove J, Nordentoft M, Hougaard DM, Werge T, Børglum AD, Mortensen PB, McGrath JJ, Neale BM, Privé F, Vilhjálmsson BJ. Multi-PGS enhances polygenic prediction by combining 937 polygenic scores. Nat Commun 2023; 14:4702. [PMID: 37543680 PMCID: PMC10404269 DOI: 10.1038/s41467-023-40330-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/11/2023] [Accepted: 07/21/2023] [Indexed: 08/07/2023] Open
Abstract
The predictive performance of polygenic scores (PGS) is largely dependent on the number of samples available to train the PGS. Increasing the sample size for a specific phenotype is expensive and takes time, but this sample size can be effectively increased by using genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) with no need to individually select the most relevant ones. In this study, the multi-PGS framework increases prediction accuracy over single PGS for all included psychiatric disorders and other available outcomes, with prediction R² increases of up to 9-fold for attention-deficit/hyperactivity disorder compared to a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions. We benchmark the multi-PGS framework against other methods and highlight its potential application to new emerging biobanks.
Affiliation(s)
- Clara Albiñana
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Zhihong Zhu
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Andrew J Schork
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
  - The Translational Genomics Research Institute, Phoenix, AZ, USA
- Andrés Ingason
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, Université de Paris, 25-28 Rue du Dr Roux, 75015, Paris, France
- Isabell Brikell
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Cynthia M Bulik
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
  - Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
- Liselotte V Petersen
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Esben Agerbo
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Jakob Grove
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
  - Center for Genomics and Personalized Medicine, Aarhus University, 8000, Aarhus C, Denmark
  - Bioinformatics Research Centre, Aarhus University, 8000, Aarhus C, Denmark
- Merete Nordentoft
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Copenhagen Research Centre on Mental Health (CORE), University of Copenhagen, Copenhagen, Denmark
- David M Hougaard
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, 2300, Copenhagen S, Denmark
- Thomas Werge
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
  - Lundbeck Foundation Centre for GeoGenetics, GLOBE Institute, University of Copenhagen, 1350, Copenhagen K, Denmark
- Anders D Børglum
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
  - Center for Genomics and Personalized Medicine, Aarhus University, 8000, Aarhus C, Denmark
- Preben Bo Mortensen
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- John J McGrath
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
  - Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, QLD, 4076, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Benjamin M Neale
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Florian Privé
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Bjarni J Vilhjálmsson
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
  - National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
  - Bioinformatics Research Centre, Aarhus University, 8000, Aarhus C, Denmark
  - Novo Nordisk Foundation Center for Genomic Mechanisms, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
50
|
Wong CK, Dite GS, Spaeth E, Murphy NM, Allman R. Melanoma risk prediction based on a polygenic risk score and clinical risk factors. Melanoma Res 2023; 33:293-299. [PMID: 37096571 PMCID: PMC10309112 DOI: 10.1097/cmr.0000000000000896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 03/30/2023] [Indexed: 04/26/2023]
Abstract
Melanoma is one of the most commonly diagnosed cancers in the Western world: third in Australia, fifth in the USA and sixth in the European Union. Predicting an individual's personal risk of developing melanoma may aid them in undertaking effective risk reduction measures. The objective of this study was to use the UK Biobank to predict the 10-year risk of melanoma using a newly developed polygenic risk score (PRS) and an existing clinical risk model. We developed the PRS using a matched case-control training dataset (N = 16,434) in which age and sex were controlled by design. The combined risk score was developed using a cohort development dataset (N = 54,799) and its performance was tested using a cohort testing dataset (N = 54,798). Our PRS comprises 68 single-nucleotide polymorphisms and had an area under the receiver operating characteristic curve of 0.639 [95% confidence interval (CI) = 0.618-0.661]. In the cohort testing data, the hazard ratio per SD of the combined risk score was 1.332 (95% CI = 1.263-1.406). Harrell's C-index was 0.685 (95% CI = 0.654-0.715). Overall, the standardized incidence ratio was 1.193 (95% CI = 1.067-1.335). By combining a PRS and a clinical risk score, we have developed a risk prediction model that performs well in terms of discrimination and calibration. At an individual level, information on the 10-year risk of melanoma can motivate people to take risk-reduction action. At the population level, risk stratification can allow more effective population-level screening strategies to be implemented.
Affiliation(s)
- Erika Spaeth
  - Phenogen Sciences Inc., Charlotte, North Carolina, USA
|