1
Khan A, Kiryluk K. Polygenic scores and their applications in kidney disease. Nat Rev Nephrol 2024:10.1038/s41581-024-00886-2. [PMID: 39271761 DOI: 10.1038/s41581-024-00886-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/06/2024] [Indexed: 09/15/2024]
Abstract
Genome-wide association studies (GWAS) have uncovered thousands of risk variants that individually have small effects on the risk of human diseases, including chronic kidney disease, type 2 diabetes, heart diseases and inflammatory disorders, but cumulatively explain a substantial fraction of disease risk, underscoring the complexity and pervasive polygenicity of common disorders. This complexity poses unique challenges to the clinical translation of GWAS findings. Polygenic scores combine small effects of individual GWAS risk variants across the genome to improve personalized risk prediction. Several polygenic scores have now been developed that exhibit sufficiently large effects to be considered clinically actionable. However, their clinical use is limited by their partial transferability across ancestries and a lack of validated models that combine polygenic, monogenic, family history and clinical risk factors. Moreover, prospective studies are still needed to demonstrate the clinical utility and cost-effectiveness of polygenic scores in clinical practice. Here, we discuss evolving methods for developing polygenic scores, best practices for validating and reporting their performance, and the study designs that will empower their clinical implementation. We specifically focus on the polygenic scores relevant to nephrology and other chronic, complex diseases and review their key limitations, necessary refinements and potential clinical applications.
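The abstract describes a polygenic score as a combination of many small per-variant effects. A minimal Python sketch of that idea follows, assuming the common additive form (a weighted sum of risk-allele dosages). The function names, dosages and weights are hypothetical illustrations and are not taken from the cited review.

```python
def polygenic_score(dosages, weights):
    """Additive score: sum over variants of (risk-allele dosage x effect weight).

    dosages: allele counts per variant (0, 1 or 2), hypothetical input.
    weights: per-variant effect sizes, e.g. GWAS log odds ratios (assumed).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores relative to a reference cohort."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    if variance == 0:
        raise ValueError("scores have zero variance")
    sd = variance ** 0.5
    return [(s - mean) / sd for s in scores]


# Toy cohort of three individuals genotyped at three variants.
toy_weights = [0.1, 0.2, 0.3]
toy_cohort = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
raw_scores = [polygenic_score(person, toy_weights) for person in toy_cohort]
z_scores = standardize(raw_scores)
```

Standardizing against a reference cohort is one common way to report such scores; the transferability problem noted in the abstract arises partly because weights and reference distributions estimated in one ancestry group may not carry over to another.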
Affiliation(s)
- Atlas Khan
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Krzysztof Kiryluk
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
2
Chang D, Gupta VK, Hur B, Cobo-López S, Cunningham KY, Han NS, Lee I, Kronzer VL, Teigen LM, Karnatovskaia LV, Longbrake EE, Davis JM, Nelson H, Sung J. Gut Microbiome Wellness Index 2 enhances health status prediction from gut microbiome taxonomic profiles. Nat Commun 2024; 15:7447. [PMID: 39198444 PMCID: PMC11358288 DOI: 10.1038/s41467-024-51651-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 08/09/2024] [Indexed: 09/01/2024]
Abstract
Recent advancements in translational gut microbiome research have revealed its crucial role in shaping predictive healthcare applications. Herein, we introduce the Gut Microbiome Wellness Index 2 (GMWI2), an enhanced version of our original GMWI prototype, designed as a standardized disease-agnostic health status indicator based on gut microbiome taxonomic profiles. Our analysis involves pooling existing 8069 stool shotgun metagenomes from 54 published studies across a global demographic landscape (spanning 26 countries and six continents) to identify gut taxonomic signals linked to disease presence or absence. GMWI2 achieves a cross-validation balanced accuracy of 80% in distinguishing healthy (no disease) from non-healthy (diseased) individuals and surpasses 90% accuracy for samples with higher confidence (i.e., outside the "reject option"). This performance exceeds that of the original GMWI model and traditional species-level α-diversity indices, indicating a more robust gut microbiome signature for differentiating between healthy and non-healthy phenotypes across multiple diseases. When assessed through inter-study validation and external validation cohorts, GMWI2 maintains an average accuracy of nearly 75%. Furthermore, by reevaluating previously published datasets, GMWI2 offers new insights into the effects of diet, antibiotic exposure, and fecal microbiota transplantation on gut health. Available as an open-source command-line tool, GMWI2 represents a timely, pivotal resource for evaluating health using an individual's unique gut microbial composition.
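The abstract reports balanced accuracy both overall and for samples outside a "reject option". The Python sketch below illustrates how such a metric can be computed. It assumes, as a simplification, that a positive score predicts "healthy" and that samples whose absolute score falls below a cutoff are left unclassified; the function name, cutoff and toy data are hypothetical and do not reproduce the GMWI2 implementation.

```python
def balanced_accuracy(is_healthy, scores, reject_below=0.0):
    """Mean of sensitivity and specificity over retained samples.

    is_healthy: true labels (True = healthy, False = non-healthy).
    scores: index values; score > 0 is read as "healthy" (assumption).
    reject_below: samples with |score| < reject_below are skipped.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(is_healthy, scores):
        if abs(score) < reject_below:
            continue  # inside the reject region: no call is made
        predicted_healthy = score > 0
        if label and predicted_healthy:
            tp += 1
        elif label:
            fn += 1
        elif predicted_healthy:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one retained sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


toy_labels = [True, True, False, False]
toy_scores = [2.0, -0.1, -1.5, 0.2]
overall = balanced_accuracy(toy_labels, toy_scores)
confident_only = balanced_accuracy(toy_labels, toy_scores, reject_below=1.0)
```

In this toy data the two low-magnitude scores are the misclassified ones, so discarding them raises the metric, which mirrors the pattern the abstract describes for higher-confidence samples.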
Affiliation(s)
- Daniel Chang
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Vinod K Gupta
  - Microbiomics Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Benjamin Hur
  - Microbiomics Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Sergio Cobo-López
  - Viral Information Institute, San Diego State University, San Diego, CA, USA
- Kevin Y Cunningham
  - Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN, USA
- Nam Soo Han
  - Brain Korea 21 Center for Bio-Health Industry, Department of Food Science and Biotechnology, Chungbuk National University, Cheongju, South Korea
- Insuk Lee
  - Department of Biotechnology, Yonsei University, Seoul, South Korea
- Vanessa L Kronzer
  - Division of Rheumatology, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Levi M Teigen
  - Department of Food Science and Nutrition, University of Minnesota, St. Paul, MN, USA
- John M Davis
  - Division of Rheumatology, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Heidi Nelson
  - Emeritus, Department of Surgery, Mayo Clinic, Rochester, MN, USA
- Jaeyun Sung
  - Microbiomics Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
  - Division of Rheumatology, Department of Medicine, Mayo Clinic, Rochester, MN, USA
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
3
Liao H, Xue H, Pan W. Inferring causal direction between two traits using R² with application to transcriptome-wide association studies. Am J Hum Genet 2024; 111:1782-1795. [PMID: 39053457 PMCID: PMC11339628 DOI: 10.1016/j.ajhg.2024.06.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 06/17/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024]
Abstract
In Mendelian randomization, two single SNP-trait correlation-based methods have been developed to infer the causal direction between an exposure (e.g., a gene) and an outcome (e.g., a trait), called MR Steiger's method and its recent extension called Causal Direction-Ratio (CD-Ratio). Here we propose an approach based on R², the coefficient of determination, to combine information from multiple (possibly correlated) SNPs to simultaneously infer the presence and direction of a causal relationship between an exposure and an outcome. Our proposed method generalizes Steiger's method from using a single SNP to multiple SNPs as IVs. It is especially useful in transcriptome-wide association studies (TWASs) (and similar applications) with typically small sample sizes for gene expression (or another molecular trait) data, providing a more flexible and powerful approach to inferring causal directions. It can be applied to GWAS summary data with a reference panel. We also discuss the influence of invalid IVs and introduce a new approach called R2S to select and remove invalid IVs (if any) to enhance the robustness. We compared the performance of the proposed method with existing methods in simulations to demonstrate its advantages. We applied the methods to identify causal genes for high/low-density lipoprotein cholesterol (HDL/LDL) using the individual-level GTEx gene expression data and UK Biobank GWAS data. The proposed method was able to confirm some well-known causal genes while identifying some novel ones. Additionally, we illustrated an application of the proposed method to GWAS summary to infer causal relationships between HDL/LDL and stroke/coronary artery disease (CAD).
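The single-SNP intuition behind correlation-based direction inference can be sketched in Python as follows. This is a deliberately simplified illustration: it only compares the squared SNP-exposure and SNP-outcome correlations and omits the formal hypothesis test used by Steiger's method as well as the multi-SNP R² extension proposed in the paper. All names and data are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def naive_direction(snp, exposure, outcome):
    """Compare variance explained by the SNP in each trait (no test)."""
    r2_exposure = pearson_r(snp, exposure) ** 2
    r2_outcome = pearson_r(snp, outcome) ** 2
    if r2_exposure > r2_outcome:
        return "exposure -> outcome"
    if r2_outcome > r2_exposure:
        return "outcome -> exposure"
    return "undetermined"


# Toy data: the SNP tracks the exposure closely and the outcome loosely.
toy_snp = [0, 1, 2, 0, 1, 2]
toy_exposure = [0.1, 1.0, 2.1, -0.1, 1.1, 1.9]
toy_outcome = [0.5, 0.2, 1.5, -0.4, 1.6, 0.9]
call = naive_direction(toy_snp, toy_exposure, toy_outcome)
```

The reasoning is that a valid instrument acts on the outcome only through the exposure, so it should explain more variance in the exposure than in the outcome; a real analysis would also need to account for sampling error and invalid instruments, which the abstract addresses.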
Affiliation(s)
- Huiling Liao
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Haoran Xue
  - Department of Biostatistics, City University of Hong Kong, Kowloon, Hong Kong
- Wei Pan
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN, USA
4
Zhao B, Zheng S, Zhu H. On blockwise and reference panel-based estimators for genetic data prediction in high dimensions. Ann Stat 2024; 52:948-965. [PMID: 39281348 PMCID: PMC11391480 DOI: 10.1214/24-aos2378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/18/2024]
Abstract
Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrix. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.
Affiliation(s)
- Bingxin Zhao
  - Department of Statistics and Data Science, University of Pennsylvania
- Shurong Zheng
  - School of Mathematics and Statistics, Northeast Normal University
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill
5
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. Biometrics 2024; 80:ujad039. [PMID: 38470257 PMCID: PMC10928990 DOI: 10.1093/biomtc/ujad039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 11/24/2023] [Accepted: 01/04/2024] [Indexed: 03/13/2024]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show how our method only requires access to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available.
Affiliation(s)
- Chunlin Li
  - Department of Statistics, Iowa State University, Ames, IA 50011, United States
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455, United States
- Wei Pan
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
6
Cao R, Olawsky E, McFowland E, Marcotte E, Spector L, Yang T. Subset scanning for multi-trait analysis using GWAS summary statistics. Bioinformatics 2024; 40:btad777. [PMID: 38191683 PMCID: PMC11087659 DOI: 10.1093/bioinformatics/btad777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/23/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024]
Abstract
MOTIVATION: Multi-trait analysis has been shown to have greater statistical power than single-trait analysis. Most of the existing multi-trait analysis methods only work with a limited number of traits and usually prioritize high statistical power over identifying relevant traits, which heavily rely on domain knowledge.
RESULTS: To handle diseases and traits with obscure etiology, we developed TraitScan, a powerful and fast algorithm that identifies potential pleiotropic traits from a moderate or large number of traits (e.g. dozens to thousands) and tests the association between one genetic variant and the selected traits. TraitScan can handle either individual-level or summary-level GWAS data. We evaluated TraitScan using extensive simulations and found that it outperformed existing methods in terms of both testing power and trait selection when sparsity was low or modest. We then applied it to search for traits associated with Ewing Sarcoma, a rare bone tumor with peak onset in adolescence, among 754 traits in UK Biobank. Our analysis revealed a few promising traits worthy of further investigation, highlighting the use of TraitScan for more effective multi-trait analysis as biobanks emerge. We also extended TraitScan to search and test association with a polygenic risk score and genetically imputed gene expression.
AVAILABILITY AND IMPLEMENTATION: Our algorithm is implemented in an R package "TraitScan" available at https://github.com/RuiCao34/TraitScan.
Affiliation(s)
- Rui Cao
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Evan Olawsky
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Edward McFowland
  - Technology and Operations Management, Harvard Business School, Harvard University, Boston, MA 02163, United States
- Erin Marcotte
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
- Logan Spector
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
7
Privé F, Albiñana C, Arbel J, Pasaniuc B, Vilhjálmsson BJ. Inferring disease architecture and predictive ability with LDpred2-auto. Am J Hum Genet 2023; 110:2042-2055. [PMID: 37944514 PMCID: PMC10716363 DOI: 10.1016/j.ajhg.2023.10.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/15/2023] [Accepted: 10/17/2023] [Indexed: 11/12/2023]
Abstract
LDpred2 is a widely used Bayesian method for building polygenic scores (PGSs). LDpred2-auto can infer the two parameters from the LDpred model, the SNP heritability h² and polygenicity p, so that it does not require an additional validation dataset to choose best-performing parameters. The main aim of this paper is to properly validate the use of LDpred2-auto for inferring multiple genetic parameters. Here, we present a new version of LDpred2-auto that adds an optional third parameter α to its model, for modeling negative selection. We then validate the inference of these three parameters (or two, when using the previous model). We also show that LDpred2-auto provides per-variant probabilities of being causal that are well calibrated and can therefore be used for fine-mapping purposes. We also introduce a formula to infer the out-of-sample predictive performance r² of the resulting PGS directly from the Gibbs sampler of LDpred2-auto. Finally, we extend the set of HapMap3 variants recommended to use with LDpred2 with 37% more variants to improve the coverage of this set, and we show that this new set of variants captures 12% more heritability and provides 6% more predictive performance, on average, in UK Biobank analyses.
Affiliation(s)
- Florian Privé
  - National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Clara Albiñana
  - National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Julyan Arbel
  - University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Bjarni J Vilhjálmsson
  - National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  - Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
8
Chen S, Lin Z, Shen X, Li L, Pan W. Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data. Genet Epidemiol 2023; 47:585-599. [PMID: 37573486 PMCID: PMC10840616 DOI: 10.1002/gepi.22535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 06/19/2023] [Accepted: 08/01/2023] [Indexed: 08/14/2023]
Abstract
We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.
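The abstract builds on two-stage least squares (2SLS). As background, here is a minimal Python sketch of plain 2SLS with a single instrument, written under the textbook assumption that the instrument is valid. It does not implement the stepwise invalid-IV selection or the network-level SEM described in the paper, and every name and number in it is hypothetical.

```python
def ols_fit(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(instrument, exposure, outcome):
    """Causal effect estimate of exposure on outcome via one instrument."""
    # Stage 1: predict the exposure from the instrument.
    intercept, slope = ols_fit(instrument, exposure)
    fitted_exposure = [intercept + slope * z for z in instrument]
    # Stage 2: regress the outcome on the predicted exposure.
    _, effect = ols_fit(fitted_exposure, outcome)
    return effect


toy_instrument = [0, 1, 2, 3]
toy_exposure = [1, 2, 6, 7]
toy_outcome = [2, 5, 11, 16]
estimate = two_stage_least_squares(toy_instrument, toy_exposure, toy_outcome)
```

With one instrument the result equals the ratio of the instrument-outcome covariance to the instrument-exposure covariance, which is why an invalid instrument (one that affects the outcome through another path) biases the estimate directly.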
Affiliation(s)
- Siyi Chen
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455
- Zhaotong Lin
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455
- Ling Li
  - Department of Experimental and Clinical Pharmacology, College of Pharmacy, University of Minnesota, Minneapolis, MN 55455
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455
9
Chen J, Gatev E, Everson T, Conneely KN, Koen N, Epstein MP, Kobor MS, Zar HJ, Stein DJ, Hüls A. Pruning and thresholding approach for methylation risk scores in multi-ancestry populations. Epigenetics 2023; 18:2187172. [PMID: 36908043 PMCID: PMC10026878 DOI: 10.1080/15592294.2023.2187172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
Recent efforts have focused on developing methylation risk scores (MRS), a weighted sum of the individual's DNA methylation (DNAm) values of pre-selected CpG sites. Most of the current MRS approaches that utilize epigenome-wide association studies (EWAS) summary statistics only include genome-wide significant CpG sites and do not consider co-methylation. New methods that relax the p-value threshold to include more CpG sites and account for the inter-correlation of DNAm might improve the predictive performance of MRS. We paired informed co-methylation pruning with p-value thresholding to generate pruning and thresholding (P+T) MRS and evaluated its performance among multi-ancestry populations. Through simulation studies and real data analyses, we demonstrated that pruning provides an improvement over simple thresholding methods for prediction of phenotypes. We demonstrated that European-derived summary statistics can be used to develop P+T MRS among other populations such as African populations. However, the prediction accuracy of P+T MRS may differ across multi-ancestry populations due to environmental/cultural/social differences.
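The pruning-and-thresholding idea in the abstract can be sketched in Python as follows. The sketch assumes a simple greedy rule: keep CpG sites below a p-value cutoff, then drop any site whose squared correlation with an already-kept, more significant site reaches a cutoff. This greedy rule stands in for the paper's "informed co-methylation pruning", whose details are not given here; the function names, cutoffs and toy values are hypothetical.

```python
def prune_and_threshold(pvalues, correlation, p_cutoff, r2_cutoff):
    """Return indices of CpG sites kept after thresholding and pruning.

    pvalues: per-site EWAS p-values.
    correlation: square matrix of pairwise DNAm correlations between sites.
    """
    # Thresholding: keep sites below the p-value cutoff, most significant first.
    candidates = sorted(
        (i for i, p in enumerate(pvalues) if p < p_cutoff),
        key=lambda i: pvalues[i],
    )
    kept = []
    for i in candidates:
        # Pruning: skip a site that is co-methylated with a site already kept.
        if all(correlation[i][j] ** 2 < r2_cutoff for j in kept):
            kept.append(i)
    return kept


def methylation_risk_score(dnam, weights, kept):
    """Weighted sum of DNAm values over the selected CpG sites."""
    return sum(dnam[i] * weights[i] for i in kept)


toy_pvalues = [1e-8, 1e-6, 0.2, 1e-4]
toy_correlation = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
selected = prune_and_threshold(toy_pvalues, toy_correlation, 1e-3, 0.5)
score = methylation_risk_score(
    [0.5, 0.6, 0.7, 0.2], [1.0, 2.0, 3.0, -1.0], selected
)
```

Here site 1 passes the p-value cutoff but is removed because it is strongly correlated with the more significant site 0, while site 2 fails the cutoff outright; only sites 0 and 3 contribute to the score.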
Affiliation(s)
- Junyu Chen
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA USA
- Evan Gatev
  - Institute of Molecular Biology "Acad. Roumen Tsanev", Sofia, Bulgaria
  - Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Todd Everson
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA USA
  - Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Karen N Conneely
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA USA
- Nastassja Koen
  - Neuroscience Institute, University of Cape Town, Cape Town, South Africa
  - Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
  - South African Medical Research Council (SAMRC) Unit on Risk and Resilience in Mental Disorders, University of Cape Town, Cape Town, South Africa
- Michael P Epstein
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA USA
- Michael S Kobor
  - Department of Medical Genetics, University of British Columbia, Vancouver, Canada
  - BC Children's Hospital Research Institute, Vancouver, Canada
  - Centre for Molecular Medicine and Therapeutics, Vancouver, Canada
- Heather J Zar
  - Department of Pediatrics and Child Health, Red Cross War Memorial Children's Hospital, University of Cape Town, Cape Town, South Africa
  - South African Medical Research Council (SAMRC) Unit on Child and Adolescent Health, University of Cape Town, Cape Town, South Africa
- Dan J Stein
  - Neuroscience Institute, University of Cape Town, Cape Town, South Africa
  - Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
  - South African Medical Research Council (SAMRC) Unit on Risk and Resilience in Mental Disorders, University of Cape Town, Cape Town, South Africa
- Anke Hüls
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA USA
  - Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia
10
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. bioRxiv 2023:2023.02.10.528092. [PMID: 38045347 PMCID: PMC10690198 DOI: 10.1101/2023.02.10.528092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show how our method only requires access to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available at https://github.com/chunlinli/sumdag.
Affiliation(s)
- Chunlin Li
  - Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
- Wei Pan
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
11
Chang D, Gupta VK, Hur B, Cobo-López S, Cunningham KY, Han NS, Lee I, Kronzer VL, Teigen LM, Karnatovskaia LV, Longbrake EE, Davis JM, Nelson H, Sung J. Gut Microbiome Wellness Index 2 for Enhanced Health Status Prediction from Gut Microbiome Taxonomic Profiles. bioRxiv 2023:2023.09.30.560294. [PMID: 37873265 PMCID: PMC10592848 DOI: 10.1101/2023.09.30.560294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Recent advancements in human gut microbiome research have revealed its crucial role in shaping innovative predictive healthcare applications. We introduce Gut Microbiome Wellness Index 2 (GMWI2), an advanced iteration of our original GMWI prototype, designed as a robust, disease-agnostic health status indicator based on gut microbiome taxonomic profiles. Our analysis involved pooling existing 8069 stool shotgun metagenome data across a global demographic landscape to effectively capture biological signals linking gut taxonomies to health. GMWI2 achieves a cross-validation balanced accuracy of 80% in distinguishing healthy (no disease) from non-healthy (diseased) individuals and surpasses 90% accuracy for samples with higher confidence (i.e., outside the "reject option"). The enhanced classification accuracy of GMWI2 outperforms both the original GMWI model and traditional species-level α-diversity indices, suggesting a more reliable tool for differentiating between healthy and non-healthy phenotypes using gut microbiome data. Furthermore, by reevaluating and reinterpreting previously published data, GMWI2 provides fresh insights into the established understanding of how diet, antibiotic exposure, and fecal microbiota transplantation influence gut health. Looking ahead, GMWI2 represents a timely pivotal tool for evaluating health based on an individual's unique gut microbial composition, paving the way for the early screening of adverse gut health shifts. GMWI2 is offered as an open-source command-line tool, ensuring it is both accessible to and adaptable for researchers interested in the translational applications of human gut microbiome science.
Affiliation(s)
- Daniel Chang
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Vinod K Gupta
  - Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
  - Division of Surgery Research, Department of Surgery, Mayo Clinic, Rochester, MN 55905, USA
- Benjamin Hur
  - Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
  - Division of Surgery Research, Department of Surgery, Mayo Clinic, Rochester, MN 55905, USA
- Sergio Cobo-López
  - Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
- Kevin Y Cunningham
  - Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN 55455, USA
- Nam Soo Han
  - Brain Korea 21 Center for Bio-Health Industry, Department of Food Science and Biotechnology, Chungbuk National University, Cheongju, South Korea
- Insuk Lee
  - Department of Biotechnology, Yonsei University, Seoul 03722, South Korea
- Vanessa L Kronzer
  - Division of Rheumatology, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Levi M Teigen
  - Department of Food Science and Nutrition, University of Minnesota, St. Paul, MN 55108, USA
- Erin E Longbrake
  - Department of Neurology, Yale University, New Haven, CT 06510, USA
- John M Davis
  - Division of Rheumatology, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Heidi Nelson
  - Emeritus, Department of Surgery, Mayo Clinic, Rochester, MN 55905, USA
- Jaeyun Sung
  - Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
  - Division of Surgery Research, Department of Surgery, Mayo Clinic, Rochester, MN 55905, USA
  - Division of Rheumatology, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
12
Gyawali PK, Le Guen Y, Liu X, Belloy ME, Tang H, Zou J, He Z. Improving genetic risk prediction across diverse population by disentangling ancestry representations. Commun Biol 2023; 6:964. [PMID: 37736834 PMCID: PMC10517023 DOI: 10.1038/s42003-023-05352-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/20/2022] [Accepted: 09/12/2023] [Indexed: 09/23/2023]
Abstract
Risk prediction models using genetic data have seen increasing traction in genomics. However, most polygenic risk models were developed using data from participants with similar (mostly European) ancestry. This can lead to biases in the risk predictors, resulting in poor generalization when applied to minority populations and admixed individuals such as African Americans. To address this issue, which arises largely because the prediction models are biased by the underlying population structure, we propose a deep-learning framework that leverages data from diverse populations and disentangles ancestry from the phenotype-relevant information in its representation. The ancestry-disentangled representation can be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer's disease genetics. Compared with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, including admixed individuals, without needing self-reported ancestry information.
Affiliation(s)
- Prashnna K Gyawali
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Yann Le Guen
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Institut du Cerveau-Paris Brain Institute-ICM, Paris, France
- Xiaoxia Liu
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Michael E Belloy
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Hua Tang
  - Department of Genetics, Stanford University, Stanford, CA, USA
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Quantitative Sciences Unit, Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA
13
Shieh Y, Roger J, Yau C, Wolf DM, Hirst GL, Swigart LB, Huntsman S, Hu D, Nierenberg JL, Middha P, Heise RS, Shi Y, Kachuri L, Zhu Q, Yao S, Ambrosone CB, Kwan ML, Caan BJ, Witte JS, Kushi LH, 't Veer LV, Esserman LJ, Ziv E. Development and testing of a polygenic risk score for breast cancer aggressiveness. NPJ Precis Oncol 2023; 7:42. [PMID: 37188791 PMCID: PMC10185660 DOI: 10.1038/s41698-023-00382-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 04/28/2023] [Indexed: 05/17/2023]
Abstract
Aggressive breast cancers portend a poor prognosis, but current polygenic risk scores (PRSs) for breast cancer do not reliably predict aggressive cancers. Aggressiveness can be effectively recapitulated using tumor gene expression profiling. Thus, we sought to develop a PRS for the risk of recurrence score weighted on proliferation (ROR-P), an established prognostic signature. Using 2363 breast cancers with tumor gene expression data and single nucleotide polymorphism (SNP) genotypes, we examined the associations between ROR-P and known breast cancer susceptibility SNPs using linear regression models. We constructed PRSs based on varying p-value thresholds and selected the optimal PRS based on model r² in 5-fold cross-validation. We then used Cox proportional hazards regression to test the ROR-P PRS's association with breast cancer-specific survival in two independent cohorts totaling 10,196 breast cancers and 785 events. In meta-analysis of these cohorts, higher ROR-P PRS was associated with worse survival, HR per SD = 1.13 (95% CI 1.06-1.21, p = 4.0 × 10⁻⁴). The ROR-P PRS had a similar magnitude of effect on survival as a comparator PRS for estrogen receptor (ER)-negative versus positive cancer risk (PRS_ER-/ER+). Furthermore, its effect was minimally attenuated when adjusted for PRS_ER-/ER+, suggesting that the ROR-P PRS provides additional prognostic information beyond ER status. In summary, we used integrated analysis of germline SNP and tumor gene expression data to construct a PRS associated with aggressive tumor biology and worse survival. These findings could potentially enhance risk stratification for breast cancer screening and prevention.
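The abstract above mentions building PRSs at varying p-value thresholds and choosing the one with the best model r². A minimal sketch of that general idea follows, assuming per-SNP effect sizes and p-values are already available. All names (`thresholded_prs`, `r_squared`, `best_threshold`) and the toy numbers are hypothetical, and the 5-fold cross-validation described in the abstract is omitted for brevity, so this is not the authors' pipeline.

```python
def thresholded_prs(dosages, betas, pvalues, p_threshold):
    """Score each person as the beta-weighted sum of allele dosages,
    using only SNPs whose association p-value is <= p_threshold."""
    keep = [j for j, p in enumerate(pvalues) if p <= p_threshold]
    return [sum(betas[j] * person[j] for j in keep) for person in dosages]


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def best_threshold(dosages, betas, pvalues, phenotype, thresholds):
    """Return the candidate threshold whose score has the highest r²
    against the phenotype (no cross-validation in this sketch)."""
    return max(
        thresholds,
        key=lambda t: r_squared(
            thresholded_prs(dosages, betas, pvalues, t), phenotype
        ),
    )


# Hypothetical toy data: 6 people x 3 SNPs, dosages coded 0/1/2.
toy_dosages = [[0, 2, 1], [1, 0, 2], [2, 1, 0], [1, 2, 2], [0, 0, 1], [2, 2, 0]]
toy_betas = [0.5, -0.2, 0.1]
toy_pvalues = [1e-6, 0.03, 0.4]
toy_phenotype = [0.0, 0.5, 1.0, 0.5, 0.0, 1.0]

print(best_threshold(toy_dosages, toy_betas, toy_pvalues, toy_phenotype,
                     [1e-5, 0.05, 1.0]))
```

In practice the threshold would be chosen on held-out folds rather than on the same data used to score it, which is what the cross-validation step in the abstract guards against.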
Affiliation(s)
- Yiwey Shieh
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jacquelyn Roger
  - PhD Program in Biological and Medical Informatics, University of California, San Francisco, San Francisco, CA, USA
- Christina Yau
  - Department of Surgery, University of California, San Francisco, San Francisco, CA, USA
- Denise M Wolf
  - Department of Laboratory Medicine, University of California, San Francisco, San Francisco, CA, USA
- Gillian L Hirst
  - Department of Surgery, University of California, San Francisco, San Francisco, CA, USA
- Lamorna Brown Swigart
  - Department of Laboratory Medicine, University of California, San Francisco, San Francisco, CA, USA
- Scott Huntsman
  - Division of General Internal Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Donglei Hu
  - Division of General Internal Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Jovia L Nierenberg
  - Division of General Internal Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Pooja Middha
  - Division of General Internal Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Rachel S Heise
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yushu Shi
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Linda Kachuri
  - Department of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
- Qianqian Zhu
  - Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, USA
- Song Yao
  - Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Christine B Ambrosone
  - Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Marilyn L Kwan
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Bette J Caan
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- John S Witte
  - Department of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
- Lawrence H Kushi
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Laura van 't Veer
  - Department of Laboratory Medicine, University of California, San Francisco, San Francisco, CA, USA
- Laura J Esserman
  - Department of Surgery, University of California, San Francisco, San Francisco, CA, USA
- Elad Ziv
  - Division of General Internal Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
14
Spanbauer C, Pan W. Sparse prediction informed by genetic annotations using the logit normal prior for Bayesian regression tree ensembles. Genet Epidemiol 2023; 47:26-44. [PMID: 36349692 PMCID: PMC9892284 DOI: 10.1002/gepi.22505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/08/2022] [Accepted: 09/21/2022] [Indexed: 11/11/2022]
Abstract
Using high-dimensional genetic variants such as single nucleotide polymorphisms (SNPs) to predict complex diseases and traits has important applications in basic research and clinical settings. For example, predicting gene expression is a necessary first step to identify (putative) causal genes in transcriptome-wide association studies. Due to weak signals, high dimensionality, and linkage disequilibrium (correlation) among SNPs, building such a prediction model is challenging. However, functional annotations at the SNP level (e.g., epigenomic data across multiple cell or tissue types) are available and could be used to inform predictor importance and aid in outcome prediction. Existing approaches to incorporate annotations have been based mainly on (generalized) linear models. Bayesian additive regression trees (BART), in contrast, is a reliable method to obtain high-quality nonlinear out-of-sample predictions without overfitting. Unfortunately, the default prior from BART may be too inflexible to handle sparse situations where the number of predictors approaches or surpasses the number of observations. Motivated by our real data application, this article proposes an alternative prior based on the logit normal distribution because it provides a framework that is adaptive to sparsity and can model informative functional annotations. It also provides a framework to incorporate prior information about between-SNP correlations. Computational details for carrying out inference are presented along with the results from a simulation study and a genome-wide prediction analysis of the Alzheimer's Disease Neuroimaging Initiative data.
Affiliation(s)
- Charles Spanbauer
  - Division of Biostatistics, University of Minnesota, MN, USA (corresponding author)
- Wei Pan
  - Division of Biostatistics, University of Minnesota, MN, USA
- The Alzheimer’s Disease Neuroimaging Initiative
  - Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
15
Wang C, Zhang J, Veldsman WP, Zhou X, Zhang L. A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants. Brief Bioinform 2023; 24:6965909. [PMID: 36585786 DOI: 10.1093/bib/bbac552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 11/04/2022] [Accepted: 11/14/2022] [Indexed: 01/01/2023]
Abstract
Quantifying an individual's risk for common diseases is an important goal of precision health. The polygenic risk score (PRS), which aggregates multiple risk alleles of candidate diseases, has emerged as a standard approach for identifying high-risk individuals. Although several studies have benchmarked PRS calculation tools and assessed their potential to guide future clinical applications, some issues remain to be investigated, such as the lack of (i) varied simulated data with different genetic effects; (ii) evaluation of machine learning models and (iii) evaluation on multi-ancestry studies. In this study, we systematically validated and compared 13 statistical methods, 5 machine learning models and 2 ensemble models using simulated data with additive and genetic interaction models, 22 common diseases with internal training sets, 4 common diseases with external summary statistics and 3 common diseases for trans-ancestry studies in UK Biobank. The statistical methods performed better on simulated data from additive models, whereas machine learning models had an edge on data that include genetic interactions. Ensemble models, which integrate various statistical methods, were generally the best choice. LDpred2 outperformed the other standalone tools, whereas PRS-CS, lassosum and DBSLMM showed comparable performance. We also found that disease heritability strongly affected the predictive performance of all methods. Both the number and effect sizes of risk SNPs are important, and sample size strongly influences the performance of all methods. For the trans-ancestry studies, we found that the performance of most methods worsened when training and testing sets were from different populations.
Affiliation(s)
- Chonghao Wang
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
- Jing Zhang
  - Eye Institute and Department of Ophthalmology, NHC Key Laboratory of Myopia (Fudan University), Eye & ENT Hospital, Fudan University, Shanghai, China
- Xin Zhou
  - Department of Biomedical Engineering, Vanderbilt University, Vanderbilt Place Nashville, 37235, TN, USA
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
  - Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
16
Maj C, Staerk C, Borisov O, Klinkhammer H, Wai Yeung M, Krawitz P, Mayr A. Statistical learning for sparser fine-mapped polygenic models: The prediction of LDL-cholesterol. Genet Epidemiol 2022; 46:589-603. [PMID: 35938382 DOI: 10.1002/gepi.22495] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/24/2021] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 11/10/2022]
Abstract
Polygenic risk scores quantify the individual genetic predisposition regarding a particular trait. We propose and illustrate the application of existing statistical learning methods to derive sparser models for genome-wide data with a polygenic signal. Our approach is based on three consecutive steps. First, potentially informative loci are identified by a marginal screening approach. Then, fine-mapping is independently applied for blocks of variants in linkage disequilibrium, where informative variants are retrieved by using variable selection methods including boosting with probing and stochastic searches with the Adaptive Subspace method. Finally, joint prediction models with the selected variants are derived using statistical boosting. In contrast to alternative approaches relying on univariate summary statistics from genome-wide association studies, our three-step approach enables the selection and fitting of multivariable regression models on large-scale genotype data. Based on UK Biobank data, we develop prediction models for LDL-cholesterol as a continuous trait. Additionally, we consider a recent scalable algorithm for the Lasso. Results show that statistical learning approaches based on fine-mapping of genetic signals result in a competitive prediction performance compared to classical polygenic risk approaches, while yielding sparser risk models.
Affiliation(s)
- Carlo Maj
  - Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
  - Centre for Human Genetics, University of Marburg, Marburg, Germany
- Christian Staerk
  - Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
- Oleg Borisov
  - Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Hannah Klinkhammer
  - Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
  - Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
- Ming Wai Yeung
  - Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
  - Department of Cardiology, University of Groningen, Groningen, The Netherlands
- Peter Krawitz
  - Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Andreas Mayr
  - Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
17
Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:bbac043. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
18
Song S, Hou L, Liu JS. A data-adaptive Bayesian regression approach for polygenic risk prediction. Bioinformatics 2022; 38:1938-1946. [PMID: 35020805 PMCID: PMC8963326 DOI: 10.1093/bioinformatics/btac024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 12/21/2021] [Accepted: 01/09/2022] [Indexed: 02/05/2023]
Abstract
MOTIVATION: Polygenic risk score (PRS) has been widely exploited for genetic risk prediction due to its accuracy and conceptual simplicity. We introduce a unified Bayesian regression framework, NeuPred, for PRS construction, which accommodates varying genetic architectures and improves overall prediction accuracy for complex diseases by allowing for a wide class of prior choices. To take full advantage of the framework, we propose a summary-statistics-based cross-validation strategy to automatically select suitable chromosome-level priors, which demonstrates a striking variability of the prior preference of each chromosome, for the same complex disease, and further significantly improves the prediction accuracy. RESULTS: Simulation studies and real data applications with seven disease datasets from the Wellcome Trust Case Control Consortium cohort and eight groups of large-scale genome-wide association studies demonstrate that NeuPred achieves substantial and consistent improvements in terms of predictive r² over existing methods. In addition, NeuPred has similar or advantageous computational efficiency compared with the state-of-the-art Bayesian methods. AVAILABILITY AND IMPLEMENTATION: The R package implementing NeuPred is available at https://github.com/shuangsong0110/NeuPred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuang Song
  - Center for Statistical Science, Tsinghua University, Beijing 100084, China, School of Life Sciences, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Lin Hou
  - To whom correspondence should be addressed
- Jun S Liu
  - To whom correspondence should be addressed
19
Bae YE, Wu L, Wu C. InTACT: An adaptive and powerful framework for joint-tissue transcriptome-wide association studies. Genet Epidemiol 2021; 45:848-859. [PMID: 34255882 PMCID: PMC8604767 DOI: 10.1002/gepi.22425] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/14/2021] [Revised: 06/22/2021] [Accepted: 06/24/2021] [Indexed: 11/05/2022]
Abstract
Transcriptome-wide association studies (TWAS) that integrate transcriptomic reference data and genome-wide association studies (GWAS) have successfully enhanced the discovery of candidate genes for many complex traits. However, existing methods may suffer from substantial power loss because they fail to effectively consider that expression of many genes tends to be consistent across tissues. Here we propose a computationally efficient testing method, referred to as Integrative Test for Associations via Cauchy Transformation (InTACT), that effectively combines information across multiple tissues and thus improves the power of identifying associated genes. Through simulation studies, we show that InTACT maintains high power while properly controlling Type 1 error rates. We applied InTACT to the largest GWAS of Alzheimer's disease (AD) to date and identified 227 genome-wide significant genes, of which 130 were not identified by benchmark methods, TWAS and MultiXcan. Importantly, InTACT identified five novel loci for AD. We implemented InTACT in publicly available software, "InTACT."
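The abstract names a Cauchy transformation as the device for combining evidence across tissues but does not spell it out. As a rough illustration only, the sketch below implements the generic Cauchy combination rule for p-values: each p-value is mapped to a standard Cauchy variate, the variates are averaged with weights, and the average is mapped back to a p-value. The function name `cauchy_combination`, the equal default weights and the example p-values are assumptions for this sketch; InTACT's actual statistic and weighting may differ.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the generic Cauchy combination rule.

    Each p-value p is mapped to tan((0.5 - p) * pi), the weighted mean of
    those values is taken, and the result is mapped back to a p-value.
    Inputs must lie strictly between 0 and 1.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must be strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights (assumption)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")

    total = sum(weights)
    stat = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    ) / total
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical per-tissue p-values for one gene.
per_tissue = [0.002, 0.04, 0.30, 0.75]
print(cauchy_combination(per_tissue))
```

A convenient property of this rule is that combining identical p-values returns that same p-value, which gives a quick sanity check on an implementation.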
Affiliation(s)
- Ye Eun Bae
  - Department of Statistics, Florida State University
- Lang Wu
  - Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa
- Chong Wu
  - Department of Statistics, Florida State University
20
Abstract
Over the past decade, substantial progress has been made in the discovery of alleles contributing to the risk of coronary artery disease. In addition to providing causal insights into disease, these endeavours have yielded and enabled the refinement of polygenic risk scores. These scores can be used to predict incident coronary artery disease in multiple cohorts and indicate the clinical response to some preventive therapies in post hoc analyses of clinical trials. These observations and the widespread ability to calculate polygenic risk scores from direct-to-consumer and health-care-associated biobanks have raised many questions about responsible clinical adoption. In this Review, we describe technical and downstream considerations for the derivation and validation of polygenic risk scores and current evidence for their efficacy and safety. We discuss the implementation of these scores in clinical medicine for uses including risk prediction and screening algorithms for coronary artery disease, prioritization of patient subgroups that are likely to derive benefit from treatment, and efficient prospective clinical trial designs.
21
Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet 2021; 37:995-1011. [PMID: 34243982 PMCID: PMC8511058 DOI: 10.1016/j.tig.2021.06.004] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 03/04/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 01/03/2023]
Abstract
Accurate genetic prediction of complex traits can facilitate disease screening, improve early intervention, and aid in the development of personalized medicine. Genetic prediction of complex traits requires the development of statistical methods that can properly model polygenic architecture and construct a polygenic score (PGS). We present a comprehensive review of 46 methods for PGS construction. We connect the majority of these methods through a multiple linear regression framework which can be instrumental for understanding their prediction performance for traits with distinct genetic architectures. We discuss the practical considerations of PGS analysis as well as challenges and future directions of PGS method development. We hope our review serves as a useful reference both for statistical geneticists who develop PGS methods and for data analysts who perform PGS analysis.
Affiliation(s)
- Ying Ma
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA