1
Koh H. A general kernel machine regression framework using principal component analysis for jointly testing main and interaction effects: Applications to human microbiome studies. NAR Genom Bioinform 2024; 6:lqae148. [PMID: 39534501 PMCID: PMC11555437 DOI: 10.1093/nargab/lqae148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Revised: 09/27/2024] [Accepted: 10/18/2024] [Indexed: 11/16/2024] Open
Abstract
The effect of a treatment on a health or disease response can be modified by genetic or microbial variants; that is, there can be interaction effects between the variants and the treatment. To discover genetic or microbial biomarkers with adequate power, it is crucial to incorporate such interaction effects in addition to the main effects. However, existing kernel machine regression methods of this kind cannot be used when a kernel is available but its underlying real variants are unknown. To address this limitation, I introduce a general kernel machine regression framework using principal component analysis for jointly testing main and interaction effects. It begins by extracting principal components from an input kernel through the singular value decomposition. It then employs the principal components as surrogate variants to construct three endogenous kernels: one for the main effects, one for the interaction effects, and one for both. Hence, it works with a kernel as an input without knowing its underlying real variants, and it robustly detects the main effects, the interaction effects, or both. I also introduce its omnibus testing extension to multiple input kernels, named OmniK, and demonstrate its use for human microbiome studies.
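The kernel construction described in this abstract can be illustrated with a minimal sketch. It assumes a symmetric positive semi-definite input kernel (so the eigendecomposition coincides with the singular value decomposition) and linear kernels built from the surrogate variants. The function name `surrogate_kernels`, the tolerance, and the choice of summing the main and interaction kernels to form the joint kernel are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def surrogate_kernels(K, treatment, rel_tol=1e-10):
    """Hypothetical helper: derive surrogate variants from a kernel and
    build main, interaction, and joint kernels from them."""
    K = np.asarray(K, dtype=float)
    t = np.asarray(treatment, dtype=float)

    # For a symmetric PSD kernel, eigh gives the same factors as the SVD.
    eigvals, eigvecs = np.linalg.eigh((K + K.T) / 2.0)

    # Keep components with non-negligible eigenvalues.
    keep = eigvals > rel_tol * eigvals.max()
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

    # Principal components scaled so that Z @ Z.T reproduces K.
    Z = eigvecs * np.sqrt(eigvals)

    # Surrogate-by-treatment products stand in for interaction terms.
    ZT = Z * t[:, None]

    K_main = Z @ Z.T          # main effects
    K_int = ZT @ ZT.T         # interaction effects
    K_joint = K_main + K_int  # both (assumed additive combination)
    return Z, K_main, K_int, K_joint
```

Under these assumptions the interaction kernel equals the element-wise product of the input kernel with the outer product of the treatment vector, so the underlying variants are never needed.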
Affiliation(s)
- Hyunwook Koh
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon 21985, South Korea
2
Seffernick AE, Cao X, Cheng C, Yang W, Autry RJ, Yang JJ, Pui CH, Teachey DT, Lamba JK, Mullighan CG, Pounds SB. Bootstrap Evaluation of Association Matrices (BEAM) for Integrating Multiple Omics Profiles with Multiple Outcomes. bioRxiv 2024:2024.07.31.605805. [PMID: 39131398 PMCID: PMC11312528 DOI: 10.1101/2024.07.31.605805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Motivation: Large datasets containing multiple clinical and omics measurements for each subject motivate the development of new statistical methods to integrate these data to advance scientific discovery. Model: We propose bootstrap evaluation of association matrices (BEAM), which integrates multiple omics profiles with multiple clinical endpoints. BEAM associates a set of omic features with clinical endpoints via regression models and then uses bootstrap resampling to determine the statistical significance of the set. Unlike existing methods, BEAM accommodates an arbitrary number of omic profiles and endpoints. Results: In simulations, BEAM performed similarly to the theoretically best simple test and outperformed other integrated analysis methods. In an example pediatric leukemia application, BEAM identified several genes with biological relevance established by a CRISPR assay that had been missed by univariate screens and other integrated analysis methods. Thus, BEAM is a powerful, flexible, and robust tool to identify genes for further laboratory and/or clinical research evaluation. Availability: Source code, documentation, and a vignette for BEAM are available on GitHub at: https://github.com/annaSeffernick/BEAMR. The R package is available from CRAN at: https://cran.r-project.org/package=BEAMR. Contact: Stanley.Pounds@stjude.org. Supplementary Information: Supplementary data are available at the journal's website.
Affiliation(s)
- Anna Eames Seffernick
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Xueyuan Cao
  - Department of Health Promotion and Disease Prevention, University of Tennessee Health Science Center, Memphis, TN, USA
- Cheng Cheng
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Wenjian Yang
  - Department of Pharmacy & Pharmaceutical Services, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - Hematological Malignancies Program, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Robert J. Autry
  - Hopp Children's Cancer Center Heidelberg (KiTZ), Heidelberg, Germany
  - Division of Pediatric Neurooncology, German Consortium for Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jun J. Yang
  - Department of Pharmacy & Pharmaceutical Services, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - Hematological Malignancies Program, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Ching-Hon Pui
  - Hematological Malignancies Program, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - Department of Pathology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- David T. Teachey
  - Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
  - Department of Pediatrics and the Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jatinder K. Lamba
  - Department of Pharmacotherapy and Translational Research, University of Florida College of Pharmacy, Gainesville, FL, USA
- Charles G. Mullighan
  - Hematological Malignancies Program, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - Department of Pathology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Stanley B. Pounds
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN, USA
3
Liu M, Su YR, Liu Y, Hsu L, He Q. Structured testing of genetic association with mixed clinical outcomes. Genet Epidemiol 2024; 48:226-237. [PMID: 38606632 PMCID: PMC11470132 DOI: 10.1002/gepi.22560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 02/15/2024] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
Genetic factors play a fundamental role in disease development. Studying the genetic association with clinical outcomes is critical for understanding disease biology and devising novel treatment targets. However, the frequencies of genetic variations are often low, making it difficult to examine the variants one by one. Moreover, the clinical outcomes are complex, including patients' survival time and other binary or continuous outcomes such as recurrences and lymph node count, and how to effectively analyze genetic association with these outcomes remains unclear. In this article, we propose a structured test statistic for testing genetic association with mixed types of survival, binary, and continuous outcomes. The structured testing incorporates known biological information about the variants while allowing for their heterogeneous effects, and it is a powerful strategy for analyzing infrequent genetic factors. Simulation studies show that the proposed test statistic maintains the correct type I error rate and is highly effective in detecting significant genetic variants. We applied our approach to a uterine corpus endometrial carcinoma study and identified several genetic pathways associated with the clinical outcomes.
Affiliation(s)
- Meiling Liu
  - Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Yu-Ru Su
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Yang Liu
  - Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, USA
- Li Hsu
  - Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Qianchuan He
  - Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
4
Grunin M, de Jong S, Palmer EL, Jin B, Rinker D, Moth C, Capra A, Haines JL, Bush WS, den Hollander AI. Spatial Distribution of Missense Variants within Complement Proteins Associates with Age Related Macular Degeneration. medRxiv 2023:2023.08.28.23294686. [PMID: 37693462 PMCID: PMC10491280 DOI: 10.1101/2023.08.28.23294686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Purpose: Genetic variants in complement genes are associated with age-related macular degeneration (AMD). However, many rare variants identified in these genes are of unknown significance, and their impact on protein function and structure is still unknown. We set out to address this issue by developing an analytical pipeline that evaluates the spatial placement of these variants and their impact on protein structure, and by applying it to the International AMD Genomics Consortium (IAMDGC) dataset (16,144 AMD cases, 17,832 controls). Methods: The IAMDGC dataset was imputed using the Haplotype Reference Consortium (HRC), yielding over 30% more imputed variants than the original 1000 Genomes imputation. Variants were extracted for the CFH, CFI, CFB, C9, and C3 genes and filtered for missense variants in solved protein structures. We evaluated these variants with respect to their placement in the three-dimensional structure of the protein (i.e., spatial proximity in the protein) and their AMD association. We applied several pipelines to (a) calculate spatial proximity to known AMD variants versus gnomAD variants, (b) assess a variant's likelihood of causing protein destabilization via the predicted free energy change (ddG) calculated using Rosetta, and (c) perform whole-gene-based tests of statistical association. Gene-based testing using seqMeta was performed using (a) all variants, (b) variants near known AMD variants, or (c) variants with a ddG >|2|. Further, we applied a structural kernel adaptation of SKAT testing (POKEMON) to confirm the association of spatial distributions of missense variants with AMD. Finally, we used logistic regression on known AMD variants in CFI to identify variants leading to a >50% reduction in protein expression in known AMD patient carriers of CFI variants compared with wild type (as determined by in vitro experiments), to determine the pipeline's robustness in identifying AMD-relevant variants.
These results were compared with functional impact scores, i.e., CADD values > 10, which indicate whether a variant may have a large functional impact genome-wide, to determine whether our metrics have better discriminative power than existing variant assessment methods. Once our pipeline had been validated, we performed a priori selection of variants using this pipeline and tested AMD patient cell lines from the EUGENDA cohort that carried the selected variants (n=34). We investigated complement pathway protein expression in vitro, looking at multiple components of the complement factor pathway in patient carriers of bioinformatically identified variants. Results: Multiple variants with a ddG >|2| were found in each complement gene investigated. Gene-based tests using known and novel missense variants identified significant associations of the C3, C9, CFB, and CFH genes with AMD risk after controlling for age and sex (P = 3.22×10⁻⁵, 7.58×10⁻⁶, 2.1×10⁻³, and 1.2×10⁻³¹, respectively). ddG filtering and SKAT-O tests indicate that missense variants predicted to destabilize the protein, in both CFI and CFH, are associated with AMD (P = 0.05 for CFH and 0.01 for CFI, at a significance threshold of 0.05). Our structural kernel approach identified spatial associations for AMD risk within the protein structures of C3, C9, CFB, CFH, and CFI at a nominal p-value of 0.05. Both ddG and CADD scores were predictive of reduced CFI protein expression, with ROC curve analyses indicating that ddG is the better predictor (AUCs of 0.76 and 0.69, respectively). A priori in vitro analysis of variants in all complement factor genes indicated that several variants identified by the bioinformatics programs PathProx/POKEMON in our pipeline caused a significant change in complement protein expression (P=0.04) in actual patient carriers of those variants, as measured by ELISA testing of proteins in the complement factor pathway; these variants were previously unknown to contribute to AMD pathogenesis.
Conclusion: We demonstrate for the first time that missense variants in complement genes cluster together spatially and are associated with AMD case/control status. Using this method, we can identify CFI and CFH variants of previously unknown significance that are predicted to destabilize the proteins. These variants, both in and outside spatial clusters, can predict in vitro tested CFI protein expression changes, and we hypothesize the same is true for CFH. A priori identification of variants that impact gene expression allows for classification of variants previously classified as variants of uncertain significance (VUS). Further investigation is needed to validate the models for additional variants and to apply them to all AMD-associated genes.
5
Fu B, Pazokitoroudi A, Sudarshan M, Liu Z, Subramanian L, Sankararaman S. Fast kernel-based association testing of non-linear genetic effects for biobank-scale data. Nat Commun 2023; 14:4936. [PMID: 37582955 PMCID: PMC10427662 DOI: 10.1038/s41467-023-40346-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 07/18/2023] [Indexed: 08/17/2023] Open
Abstract
Our knowledge of non-linear genetic effects on complex traits remains limited, in part due to the modest power to detect such effects. While kernel-based tests offer a versatile approach to test for non-linear relationships between sets of genetic variants and traits, current approaches cannot be applied to Biobank-scale datasets containing hundreds of thousands of individuals. We propose FastKAST, a kernel-based approach that can test for non-linear effects of a set of variants on a quantitative trait. FastKAST provides calibrated hypothesis tests while enabling analysis of Biobank-scale datasets with hundreds of thousands of unrelated individuals from a homogeneous population. We apply FastKAST to 53 quantitative traits measured across ≈300K unrelated white British individuals in the UK Biobank to detect sets of variants with non-linear effects at genome-wide significance.
Affiliation(s)
- Boyang Fu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Mukund Sudarshan
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Zhengtong Liu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Lakshminarayanan Subramanian
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
  - Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- Sriram Sankararaman
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
6
Sun R, Zhu L, Li Y, Yasui Y, Robison L. Inference for set-based effects in genetic association studies with interval-censored outcomes. Biometrics 2023; 79:1573-1585. [PMID: 35165890 PMCID: PMC9375811 DOI: 10.1111/biom.13636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 11/28/2022]
Abstract
The rapid acceleration of genetic data collection in biomedical settings has recently resulted in the rise of genetic compendiums filled with rich longitudinal disease data. One common feature of these data sets is their plethora of interval-censored outcomes. However, very few tools are available for the analysis of genetic data sets with interval-censored outcomes, and in particular, there is a lack of methodology available for set-based inference. Set-based inference is used to associate a gene, biological pathway, or other genetic construct with outcomes and is one of the most popular strategies in genetics research. This work develops three such tests for interval-censored settings beginning with a variance components test for interval-censored outcomes, the interval-censored sequence kernel association test (ICSKAT). We also provide the interval-censored version of the Burden test, and then we integrate ICSKAT and Burden to construct the interval censored sequence kernel association test-optimal (ICSKATO) combination. These tests unlock set-based analysis of interval-censored data sets with analogs of three highly popular set-based tools commonly applied to continuous and binary outcomes. Simulation studies illustrate the advantages of the developed methods over ad hoc alternatives, including protection of the type I error rate at very low levels and increased power. The proposed approaches are applied to the investigation that motivated this study, an examination of the genes associated with bone mineral density deficiency and fracture risk.
Affiliation(s)
- Ryan Sun
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, U.S.A
- Liang Zhu
  - Division of Clinical and Translational Sciences, Department of Internal Medicine, University of Texas Health Science Center at Houston, Houston, Texas 77030, U.S.A
- Yimei Li
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, U.S.A
- Yutaka Yasui
  - Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, U.S.A
- Leslie Robison
  - Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, U.S.A
7
Huang J, Zhao B, Weinstein SJ, Albanes D, Mondul AM. Metabolomic profile of prostate cancer-specific survival among 1812 Finnish men. BMC Med 2022; 20:362. [PMID: 36280842 PMCID: PMC9594924 DOI: 10.1186/s12916-022-02561-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Accepted: 09/09/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: Abnormal metabolism and perturbations in metabolic pathways play significant roles in the development and progression of prostate cancer; however, comprehensive metabolomic analyses of human data are lacking and are needed to elucidate the interrelationships. METHODS: We examined the serum metabolome in relation to prostate cancer survival in a cohort of 1812 cases in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study. Using an ultrahigh-performance LC-MS/MS platform, we identified 961 known metabolites in prospectively collected serum. Median survival time from diagnosis to prostate cancer-specific death (N=472) was 6.6 years (interquartile range=2.9-11.1 years). Cox proportional hazards regression models estimated hazard ratios and 95% confidence intervals of the associations between the serum metabolites (in quartiles) and prostate cancer death, adjusted for age at baseline and diagnosis, disease stage, and Gleason sum. To calculate risk scores, we first randomly divided the metabolomic data into a discovery set (70%) and a replication set (30%) used for validation. RESULTS: Overall, 49 metabolites were associated with prostate cancer survival after Bonferroni correction. Notably, higher levels of the phospholipid choline, amino acid glutamate, long-chain polyunsaturated fatty acid (n6) arachidonate (20:4n6), and glutamyl amino acids gamma-glutamylglutamate, gamma-glutamylglycine, and gamma-glutamylleucine were associated with increased risk of prostate cancer-specific mortality (fourth versus first quartile HRs=2.07-2.14; P-values <5.2×10⁻⁵). By contrast, the ascorbate/aldarate metabolite oxalate, xenobiotics S-carboxymethyl-L-cysteine, fibrinogen cleavage peptides ADpSGEGDFXAEGGGVR and fibrinopeptide B (1-12) were related to reduced disease-specific mortality (fourth versus first quartile HRs=0.82-0.84; P-value <5.2×10⁻⁵).
Further adjustment for years from blood collection to cancer diagnosis, body mass index, smoking intensity and duration, and serum total and high-density lipoprotein cholesterol did not alter the results. Participants with a higher metabolic score based on the discovery set had an elevated risk of prostate cancer-specific mortality in the replication set (fourth versus first quartile, HR=3.9, P-value for trend<0.0001). CONCLUSIONS: The metabolic traits identified in this study, including choline, glutamate, arachidonate, gamma-glutamyl amino acids, fibrinopeptides, and endocannabinoid and redox pathways and their composite risk score, corroborate our previous analysis of fatal prostate cancer and provide novel insights and potential leads regarding the molecular basis of prostate cancer progression and mortality.
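The P-value cutoff quoted in this abstract is consistent with a Bonferroni correction over the 961 metabolites tested. The short check below assumes a family-wise significance level of 0.05, which the abstract does not state explicitly.

```python
# Bonferroni threshold for the metabolite-wide analysis.
n_metabolites = 961  # known metabolites reported in the abstract
alpha = 0.05         # assumed family-wise level (not stated in the abstract)

bonferroni_threshold = alpha / n_metabolites
print(f"{bonferroni_threshold:.2e}")  # prints 5.20e-05
```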
Affiliation(s)
- Jiaqi Huang
  - National Clinical Research Center for Metabolic Diseases, Key Laboratory of Diabetes Immunology, Ministry of Education, and Department of Metabolism and Endocrinology, The Second Xiangya Hospital of Central South University, Changsha, 410011, Hunan, China
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Bin Zhao
  - National Clinical Research Center for Metabolic Diseases, Key Laboratory of Diabetes Immunology, Ministry of Education, and Department of Metabolism and Endocrinology, The Second Xiangya Hospital of Central South University, Changsha, 410011, Hunan, China
- Stephanie J Weinstein
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Demetrius Albanes
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Alison M Mondul
  - Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, 48109, USA
8
Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods that use variant annotations or external controls to improve statistical power, and particular challenges facing rare-variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare-variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.
Affiliation(s)
- Wenan Chen
  - Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- Brandon J. Coombes
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Nicholas B. Larson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- *Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson
9
Hu Y, Li Y, Satten GA, Hu YJ. Testing microbiome associations with survival times at both the community and individual taxon levels. PLoS Comput Biol 2022; 18:e1010509. [PMID: 36103548 PMCID: PMC9512219 DOI: 10.1371/journal.pcbi.1010509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 09/26/2022] [Accepted: 08/23/2022] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Finding microbiome associations with possibly censored survival times is an important problem, especially as specific taxa could serve as biomarkers for disease prognosis or as targets for therapeutic interventions. The two existing methods for survival outcomes, MiRKAT-S and OMiSA, are restricted to testing associations at the community level and do not provide results at the individual taxon level. An ad hoc approach testing each taxon with a survival outcome using the Cox proportional hazard model may not perform well in the microbiome setting with sparse count data and small sample sizes. METHODS We have previously developed the linear decomposition model (LDM) for testing continuous or discrete outcomes that unifies community-level and taxon-level tests into one framework. Here we extend the LDM to test survival outcomes. We propose to use the Martingale residuals or the deviance residuals obtained from the Cox model as continuous covariates in the LDM. We further construct tests that combine the results of analyzing each set of residuals separately. Finally, we extend PERMANOVA, the most commonly used distance-based method for testing community-level hypotheses, to handle survival outcomes in a similar manner. RESULTS Using simulated data, we showed that the LDM-based tests preserved the false discovery rate for testing individual taxa and had good sensitivity. The LDM-based community-level tests and PERMANOVA-based tests had comparable or better power than MiRKAT-S and OMiSA. An analysis of data on the association of the gut microbiome and the time to acute graft-versus-host disease revealed several dozen associated taxa that would not have been achievable by any community-level test, as well as improved community-level tests by the LDM and PERMANOVA over those obtained using MiRKAT-S and OMiSA. 
CONCLUSIONS Unlike existing methods, our new methods are capable of discovering individual taxa that are associated with survival times, which could be of important use in clinical settings.
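To make the residual-based idea in this abstract concrete, the sketch below computes martingale residuals for the simplest special case: a Cox model with no covariates, where the baseline cumulative hazard reduces to the Nelson-Aalen estimator. The function name is hypothetical, and the paper's models may include covariates, so this is an illustration under stated assumptions and not the authors' implementation.

```python
def null_martingale_residuals(times, events):
    """Martingale residuals (event indicator minus estimated cumulative
    hazard) for a covariate-free Cox model, using Nelson-Aalen."""
    # Hazard increment d_j / r_j at each distinct event time.
    event_times = sorted({t for t, d in zip(times, events) if d})
    increments = []
    for t in event_times:
        deaths = sum(1 for ti, di in zip(times, events) if di and ti == t)
        at_risk = sum(1 for ti in times if ti >= t)
        increments.append((t, deaths / at_risk))

    residuals = []
    for ti, di in zip(times, events):
        # Cumulative hazard accumulated up to this subject's follow-up time.
        cum_hazard = sum(inc for t, inc in increments if t <= ti)
        residuals.append(di - cum_hazard)
    return residuals


# Four subjects; the second one is censored at time 2.
residuals = null_martingale_residuals([1, 2, 3, 4], [1, 0, 1, 1])
```

The resulting values sum to zero and can then be treated as a continuous trait in a downstream association test, which is the role the abstract assigns to the Cox-model residuals.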
Affiliation(s)
- Yingtian Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
- Yunxiao Li
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
- Glen A. Satten
  - Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
10
Wang JH, Wang KH, Chen YH. Overlapping group screening for detection of gene-environment interactions with application to TCGA high-dimensional survival genomic data. BMC Bioinformatics 2022; 23:202. [PMID: 35637439 PMCID: PMC9150322 DOI: 10.1186/s12859-022-04750-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In biomedical and epidemiological research, gene-environment (G-E) interaction is of great significance to the etiology and progression of many complex diseases. For high-dimensional genetic data, two general classes of models, marginal and joint, have been proposed to identify important interaction factors. Most existing approaches for identifying G-E interactions are limited by their lack of robustness to outliers or contamination in the response and predictor data. In particular, right-censored survival outcomes make the associated feature screening even more challenging. In this article, we use the overlapping group screening (OGS) approach to select important G-E interactions related to clinical survival outcomes by incorporating gene pathway information under a joint modeling framework. RESULTS: Simulation studies under various scenarios are carried out to compare the performance of our proposed method with some commonly used methods. In the real data applications, we use our proposed method to identify G-E interactions related to the clinical survival outcomes of patients with head and neck squamous cell carcinoma and esophageal carcinoma in The Cancer Genome Atlas clinical survival genetic data, and further establish corresponding survival prediction models. Both simulation and real data studies show that our method performs well and outperforms existing methods in G-E interaction selection, effect estimation, and survival prediction accuracy. CONCLUSIONS: The OGS approach is useful for selecting important environmental factors, genes, and G-E interactions in the ultra-high dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than that of existing methods. The same idea can be applied to other outcome models, such as the proportional odds survival time model, the logistic regression model for binary outcomes, and the multinomial logistic regression model for multi-class outcomes.
Affiliation(s)
- Jie-Huei Wang
  - Department of Statistics, Feng Chia University, Seatwen, Taichung, 40724, Taiwan
- Kang-Hsin Wang
  - Department of Statistics, Feng Chia University, Seatwen, Taichung, 40724, Taiwan
- Yi-Hau Chen
  - Institute of Statistical Science, Academia Sinica, Nankang, Taipei, 11529, Taiwan
11
Kawaguchi ES, Li G, Lewinger JP, Gauderman WJ. Two-step hypothesis testing to detect gene-environment interactions in a genome-wide scan with a survival endpoint. Stat Med 2022; 41:1644-1657. [PMID: 35075649 PMCID: PMC9007892 DOI: 10.1002/sim.9319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/13/2021] [Revised: 11/10/2021] [Accepted: 12/26/2021] [Indexed: 01/13/2023]
Abstract
Defined by their genetic profile, individuals may exhibit differential clinical outcomes due to an environmental exposure. Identifying subgroups based on specific exposure-modifying genes can lead to targeted interventions and focused studies. Genome-wide interaction scans (GWIS) can be performed to identify such genes, but these scans typically suffer from low power due to the large multiple testing burden. We provide a novel framework for powerful two-step hypothesis tests for GWIS with a time-to-event endpoint under the Cox proportional hazards model. In the Cox regression setting, we develop an approach that prioritizes genes for Step-2 G × E testing based on a carefully constructed Step-1 screening procedure. Simulation results demonstrate this two-step approach can lead to substantially higher power for identifying gene-environment (G × E) interactions compared to the standard GWIS while preserving the family-wise error rate over a range of scenarios. In a taxane-anthracycline chemotherapy study for breast cancer patients, the two-step approach identifies several gene expression by treatment interactions that would not be detected using the standard GWIS.
Affiliation(s)
- Eric S Kawaguchi
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Gang Li
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, California, USA.,Department of Computational Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Juan Pablo Lewinger
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- W James Gauderman
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
|
12
|
Cheng S, Lyu J, Shi X, Wang K, Wang Z, Deng M, Sun B, Wang C. Rare variant association tests for ancestry-matched case-control data based on conditional logistic regression. Brief Bioinform 2022; 23:6502553. [PMID: 35021184 DOI: 10.1093/bib/bbab572] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2021] [Revised: 11/29/2021] [Accepted: 12/13/2021] [Indexed: 12/13/2022] Open
Abstract
With the increasing volume of human sequencing data available, analysis incorporating external controls becomes a popular and cost-effective approach to boost statistical power in disease association studies. To prevent spurious association due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio and might become anti-conservative when case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to control for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type 1 errors under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow for different case-control ratios across strata, a full-matching scheme can be employed to efficiently utilize all available cases and controls to accelerate the discovery of disease associated genes.
Affiliation(s)
- Shanshan Cheng
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Jingjing Lyu
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Xian Shi
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Kai Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Zengmiao Wang
- Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China
- Minghua Deng
- Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China.,LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, P. R. China.,Center for Statistical Sciences, Peking University, Beijing 100871, P. R. China
- Baoluo Sun
- Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore
- Chaolong Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China.,Department of Orthopedic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
|
13
|
Wu D, Li C, Lu Q. Multi-marker genetic association and interaction tests with interval-censored survival outcomes. Genet Epidemiol 2021; 45:860-873. [PMID: 34472134 PMCID: PMC8604754 DOI: 10.1002/gepi.22429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 07/13/2021] [Accepted: 08/12/2021] [Indexed: 11/06/2022]
Abstract
The development of set-based genetic-survival association tests has been focusing on right-censored survival outcomes. However, interval-censored failure time data arise widely from health science studies, especially those on the development of chronic diseases. In this paper, we proposed a suite of set-based genetic association and interaction tests for interval-censored survival outcomes under a unified weighted-V-statistic framework. Besides dealing with interval censoring, the new tests can account for genetic effect heterogeneity and accommodate left truncation of survival outcomes. Simulation studies showed that the new tests perform well in terms of size and power under various scenarios and that the new interaction test is more powerful than the standard likelihood ratio test for testing gene-gene/gene-environment interactions. The practical utility of the developed tests was illustrated by a genome-wide association study of age to early childhood caries.
Affiliation(s)
- Di Wu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
- Chenxi Li
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
- Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
|
14
|
Lakhal-Chaieb L, Simard J, Bull S. Sequence kernel association test for survival outcomes in the presence of a non-susceptible fraction. Biostatistics 2021; 21:518-530. [PMID: 30590388 DOI: 10.1093/biostatistics/kxy075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2017] [Revised: 10/23/2018] [Accepted: 10/25/2018] [Indexed: 11/13/2022] Open
Abstract
In this work, we propose a single nucleotide polymorphism set association test for survival phenotypes in the presence of a non-susceptible fraction. We consider a mixture model with a logistic regression for the susceptibility indicator and a proportional hazards regression to model survival in the susceptible group. We propose a joint test to assess the significance of the genetic variant in both logistic and survival regressions simultaneously. We adopt the spirit of SKAT and conduct a variance-component test treating the genetic effects of multiple variants as random. We derive score-type test statistics, and we investigate several approaches to compute their $p$-values. The finite-sample properties of the proposed tests are assessed and compared to existing approaches by simulations and their use is illustrated through an application to ovarian cancer data from the Consortium of Investigators of Modifiers of BRCA1 and BRCA2.
Affiliation(s)
- Lajmi Lakhal-Chaieb
- Département de mathématiques et de statistique, Université Laval, 1045 de la médecine, Québec G1V 0A6, Canada
- Jacques Simard
- Département de médecine moléculaire, Chaire de recherche du Canada en oncogénétique, Université Laval, Québec G1V 0A6, Canada
- Shelley Bull
- Dalla Lana School of Public Health, University of Toronto, 6th floor, Health Sciences Building, 155 College Street, Toronto, Ontario M5T3M7 Canada.,The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 60 Murray Street, Toronto, Ontario M5T 3L9 Canada
|
15
|
Zhang B, Chiu CY, Yuan F, Sang T, Cook RJ, Wilson AF, Bailey-Wilson JE, Chew EY, Xiong M, Fan R. Gene-based analysis of bi-variate survival traits via functional regressions with applications to eye diseases. Genet Epidemiol 2021; 45:455-470. [PMID: 33645812 DOI: 10.1002/gepi.22381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2020] [Revised: 01/15/2021] [Accepted: 02/08/2021] [Indexed: 11/12/2022]
Abstract
Genetic studies of two related survival outcomes of a pleiotropic gene are commonly encountered but statistical models to analyze them are rarely developed. To analyze sequencing data, we propose mixed effect Cox proportional hazard models by functional regressions to perform gene-based joint association analysis of two survival traits motivated by our ongoing real studies. These models extend fixed effect Cox models of univariate survival traits by incorporating variations and correlation of multivariate survival traits into the models. The associations between genetic variants and two survival traits are tested by likelihood ratio test statistics. Extensive simulation studies suggest that type I error rates are well controlled and power performances are stable. The proposed models are applied to analyze bivariate survival traits of left and right eyes in the age-related macular degeneration progression.
Affiliation(s)
- Bingsong Zhang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee, USA.,Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
- Fang Yuan
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, People's Republic of China
- Tian Sang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA.,School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, China
- Richard J Cook
- Department of Statistics and Actuarial Science, Waterloo, Ontario, Canada
- Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
- Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
- Emily Y Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, NIH, Bethesda, Maryland, USA
- Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, Texas, USA
- Ruzong Fan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA.,Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
|
16
|
Kachuri L, Helby J, Bojesen SE, Christiani DC, Su L, Wu X, Tardón A, Fernández-Tardón G, Field JK, Davies MP, Chen C, Goodman GE, Shepherd FA, Leighl NB, Tsao MS, Brhane Y, Brown MC, Boyd K, Shepshelovich D, Sun L, Amos CI, Liu G, Hung RJ. Investigation of Leukocyte Telomere Length and Genetic Variants in Chromosome 5p15.33 as Prognostic Markers in Lung Cancer. Cancer Epidemiol Biomarkers Prev 2020; 28:1228-1237. [PMID: 31263055 DOI: 10.1158/1055-9965.epi-18-1215] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2018] [Revised: 01/15/2019] [Accepted: 03/29/2019] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Lung cancer remains the leading cause of cancer mortality with relatively few prognostic biomarkers. We investigated associations with overall survival for telomere length (TL) and genetic variation in chromosome 5p15.33, an established telomere maintenance locus. METHODS Leukocyte TL was measured after diagnosis in 807 patients with non-small cell lung cancer (NSCLC) from the Princess Margaret Cancer Center in Toronto and assessed prospectively in 767 NSCLC cases from the Copenhagen City Heart Study and the Copenhagen General Population Study. Associations with all-cause mortality were tested for 723 variants in 5p15.33, genotyped in 4,672 NSCLC cases. RESULTS Short telomeres (≤10th percentile) were associated with poor prognosis for adenocarcinoma in both populations: TL measured 6 months after diagnosis [HR = 1.65; 95% confidence intervals (CI), 1.04-2.64] and for those diagnosed within 5 years after blood sampling (HR = 2.42; 95% CI, 1.37-4.28). Short TL was associated with mortality in never smokers with NSCLC (HR = 10.29; 95% CI, 1.86-56.86) and adenocarcinoma (HR = 11.31; 95% CI, 1.96-65.24). Analyses in 5p15.33 identified statistically significant prognostic associations for rs56266421-G in LPCAT1 (HR = 1.86; 95% CI, 1.38-2.52; P = 4.5 × 10⁻⁵) in stage I-IIIA NSCLC, and for the SLC6A3 gene with OS in females with NSCLC (P = 1.6 × 10⁻³). CONCLUSIONS Our findings support the potential clinical utility of TL, particularly for adenocarcinoma patients, while associations in chromosome 5p15.33 warrant further exploration. IMPACT This is the largest lung cancer study of leukocyte TL and OS, and the first to examine the impact of the timing of TL measurement. Our findings suggest that extremely short telomeres are indicative of poor prognosis in NSCLC.
Affiliation(s)
- Linda Kachuri
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute of Sinai Health System, Toronto, Ontario, Canada.,Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.,Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, California
- Jens Helby
- Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Copenhagen, Denmark
- Stig Egil Bojesen
- Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Copenhagen, Denmark.,Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- David C Christiani
- Departments of Epidemiology and Environmental Health, Harvard TH Chan School of Public Health, Boston, Massachusetts.,Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
- Li Su
- Departments of Epidemiology and Environmental Health, Harvard TH Chan School of Public Health, Boston, Massachusetts
- Xifeng Wu
- Department of Epidemiology, Division of Cancer Prevention and Population Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Adonina Tardón
- University of Oviedo and CIBERESP, Faculty of Medicine, Campus del Cristo, Oviedo, Spain
- John K Field
- Roy Castle Lung Cancer Research Programme, Institute of Translational Medicine, Department of Molecular & Clinical Cancer Medicine, University of Liverpool, Liverpool, United Kingdom
- Michael P Davies
- Roy Castle Lung Cancer Research Programme, Institute of Translational Medicine, Department of Molecular & Clinical Cancer Medicine, University of Liverpool, Liverpool, United Kingdom
- Chu Chen
- Fred Hutchinson Cancer Research Center, Seattle, Washington
- Gary E Goodman
- Fred Hutchinson Cancer Research Center, Seattle, Washington
- Frances A Shepherd
- Cancer Clinical Research Unit, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Natasha B Leighl
- Cancer Clinical Research Unit, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Ming S Tsao
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.,Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Yonathan Brhane
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute of Sinai Health System, Toronto, Ontario, Canada
- M Catherine Brown
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
- Kevin Boyd
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
- Daniel Shepshelovich
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada.,The Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Lei Sun
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.,Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Christopher I Amos
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, Texas
- Geoffrey Liu
- Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.,Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Rayjean J Hung
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute of Sinai Health System, Toronto, Ontario, Canada.,Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
|
17
|
Li C, Wu D, Lu Q. Set-based genetic association and interaction tests for survival outcomes based on weighted V statistics. Genet Epidemiol 2020; 45:46-63. [PMID: 32896012 DOI: 10.1002/gepi.22353] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2020] [Revised: 08/03/2020] [Accepted: 08/03/2020] [Indexed: 01/07/2023]
Abstract
With advancements in high-throughput technologies, studies have been conducted to investigate the role of massive genetic variants in human diseases. While set-based tests have been developed for binary and continuous disease outcomes, there are few computationally efficient set-based tests available for time-to-event outcomes. To facilitate the genetic association and interaction analyses of time-to-event outcomes, we develop a suite of multivariant tests based on weighted V statistics with or without considering potential genetic heterogeneity. In addition to the computation efficiency and nice asymptotic properties, all the new tests can deal with left truncation and competing risks in the survival data, and adjust for covariates. Simulation studies show that the new tests run faster, are more accurate in small samples, and account for confounding effect better than the existing multivariant survival tests. When the genetic effect is heterogeneous across individuals/subpopulations, the association test considering genetic heterogeneity is more powerful than the existing tests that do not account for genetic heterogeneity. Using the new methods, we perform a genome-wide association analysis of the genotype and age-to-Alzheimer's data from the Rush Memory and Aging Project and the Religious Orders Study. The analysis identifies two genes, APOE and APOC1, associated with age to Alzheimer's disease onset.
Affiliation(s)
- Chenxi Li
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
- Di Wu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
- Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
|
18
|
Bi W, Fritsche LG, Mukherjee B, Kim S, Lee S. A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank. Am J Hum Genet 2020; 107:222-233. [PMID: 32589924 DOI: 10.1016/j.ajhg.2020.06.003] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2020] [Accepted: 06/03/2020] [Indexed: 12/09/2022] Open
Abstract
With increasing biobanking efforts connecting electronic health records and national registries to germline genetics, the time-to-event data analysis has attracted increasing attention in the genetics studies of human diseases. In time-to-event data analysis, the Cox proportional hazards (PH) regression model is one of the most used approaches. However, existing methods and tools are not scalable when analyzing a large biobank with hundreds of thousands of samples and endpoints, and they are not accurate when testing low-frequency and rare variants. Here, we propose a scalable and accurate method, SPACox (a saddlepoint approximation implementation based on the Cox PH regression model), that is applicable for genome-wide scale time-to-event data analysis. SPACox requires fitting a Cox PH regression model only once across the genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76-252 times faster than other existing alternatives, such as gwasurvivr, 185-511 times faster than the standard Wald test, and more than 6,000 times faster than the Firth correction and can control type I error rates at the genome-wide significance level regardless of minor allele frequencies. Through the analysis of UK Biobank inpatient data of 282,871 white British European ancestry samples, we show that SPACox can efficiently analyze large sample sizes and accurately control type I error rates. We identified 611 loci associated with time-to-event phenotypes of 12 common diseases, of which 38 loci would be missed within a logistic regression framework with a binary phenotype defined as event occurrence status during the follow-up period.
|
19
|
Fan CC, Banks SJ, Thompson WK, Chen CH, McEvoy LK, Tan CH, Kukull W, Bennett DA, Farrer LA, Mayeux R, Schellenberg GD, Andreassen OA, Desikan R, Dale AM. Sex-dependent autosomal effects on clinical progression of Alzheimer's disease. Brain 2020; 143:2272-2280. [PMID: 32591829 PMCID: PMC7364740 DOI: 10.1093/brain/awaa164] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2019] [Revised: 02/27/2020] [Accepted: 03/31/2020] [Indexed: 11/15/2022] Open
Abstract
Sex differences in the manifestations of Alzheimer's disease are under intense investigation. Despite the emerging importance of polygenic predictions for Alzheimer's disease, sex-dependent polygenic effects have not been demonstrated. Here, using a sex crossover analysis, we show that sex-dependent autosomal genetic effects on Alzheimer's disease can be revealed by characterizing disease progress via the hazard function. We first performed sex-stratified genome-wide associations, and then applied derived sex-dependent weights to two independent cohorts. Relative to sex-mismatched scores, sex-matched polygenic hazard scores showed significantly stronger associations with age-at-disease-onset, clinical progression, amyloid deposition, neurofibrillary tangles, and composite neuropathological scores, independent of apolipoprotein E. Models without using hazard weights, i.e. polygenic risk scores, showed lower predictive power than polygenic hazard scores with no evidence for sex differences. Our results indicate that revealing sex-dependent genetic architecture requires the consideration of temporal processes of Alzheimer's disease. This has strong implications not only for the genetic underpinning of Alzheimer's disease but also for how we estimate sex-dependent polygenic effects for clinical use.
Affiliation(s)
- Chun Chieh Fan
- Center for Human Development, University of California, San Diego, USA
- Sarah J Banks
- Department of Neuroscience, University of California, San Diego, USA
- Wesley K Thompson
- Family Medicine and Public Health, University of California, San Diego, USA
- Chi-Hua Chen
- Department of Radiology, University of California, San Diego, USA
- Linda K McEvoy
- Family Medicine and Public Health, University of California, San Diego, USA
- Department of Radiology, University of California, San Diego, USA
- Chin Hong Tan
- Department of Psychology, Nanyang Technological University, Singapore
- Walter Kukull
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, USA
- David A Bennett
- Department of Neurological Science, Rush Medical College, Chicago, USA
- Richard Mayeux
- Department of Neurology and the Taub Institute at Columbia University, New York, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Ole A Andreassen
- Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Rahul Desikan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, USA
- Anders M Dale
- Department of Radiology, University of California, San Diego, USA
- Center for Multimodal Imaging and Genetics, University of California, San Diego, USA
|
20
|
Huang J, Weinstein SJ, Moore SC, Derkach A, Hua X, Mondul AM, Sampson JN, Albanes D. Pre-diagnostic Serum Metabolomic Profiling of Prostate Cancer Survival. J Gerontol A Biol Sci Med Sci 2020; 74:853-859. [PMID: 29878065 DOI: 10.1093/gerona/gly128] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2018] [Indexed: 12/13/2022] Open
Abstract
Impaired metabolism may play a role in the development and lethality of prostate cancer, yet a comprehensive analysis of the interrelationships appears lacking. We measured 625 metabolites using ultrahigh performance liquid chromatography/mass spectrometry (LC-MS) and gas chromatography/mass spectrometry (GC-MS) of prediagnostic serum from 197 prostate cancer cases in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study (ages at diagnosis, 55-86 years). Cox proportional hazards models estimated associations between circulating metabolites and prostate cancer mortality for 1 SD differences (log-metabolite scale), adjusted for age, year of diagnosis, and disease stage. Associations between metabolite chemical classes and survival were examined through pathway analysis, and Cox models assessed the relationship with a sterol/steroid metabolite principal component analysis factor score. Elevated serum N-oleoyl taurine was significantly associated with prostate cancer-specific mortality (hazard ratios [HR] = 1.72 per 1 SD, p < .00008, Bonferroni-corrected threshold = 0.05/625; HR = 3.6 for highest vs lowest tertile, p < .001). Pathway analyses revealed a statistically significant association between lipids and prostate cancer death (p < .006, Bonferroni-corrected threshold = 0.05/8), and sterol/steroid metabolites showed the strongest chemical sub-class association (p = .0014, Bonferroni-corrected threshold = 0.05/45). In the principal component analysis, a 1-SD increment in the sterol/steroid metabolite score increased the risk of prostate cancer death by 46%. Prediagnostic serum N-oleoyl taurine and sterol/steroid metabolites were associated with prostate cancer survival.
Affiliation(s)
- Jiaqi Huang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Stephanie J Weinstein
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Steven C Moore
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Andriy Derkach
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Alison M Mondul
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor
- Joshua N Sampson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
|
21
|
Wei Y, Liu Y, Sun T, Chen W, Ding Y. Gene-based association analysis for bivariate time-to-event data through functional regression with copula models. Biometrics 2019; 76:619-629. [PMID: 31625595 DOI: 10.1111/biom.13165] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Accepted: 10/08/2019] [Indexed: 11/28/2022]
Abstract
Several gene-based association tests for time-to-event traits have been proposed recently to detect whether a gene region (containing multiple variants), as a set, is associated with the survival outcome. However, for bivariate survival outcomes, to the best of our knowledge, there is no statistical method that can be directly applied for gene-based association analysis. Motivated by a genetic study to discover the gene regions associated with the progression of a bilateral eye disease, age-related macular degeneration (AMD), we implement a novel functional regression (FR) method under the copula framework. Specifically, the effects of variants within a gene region are modeled through a functional linear model, which then contributes to the marginal survival functions within the copula. Generalized score test statistics are derived to test for the association between bivariate survival traits and the genetic region. Extensive simulation studies are conducted to evaluate the type I error control and power performance of the proposed approach, with comparisons to several existing methods for a single survival trait, as well as the marginal Cox FR model using the robust sandwich estimator for bivariate survival traits. Finally, we apply our method to a large AMD study, the Age-related Eye Disease Study, to identify the gene regions that are associated with AMD progression.
Affiliation(s)
- Yue Wei
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Yi Liu
- Department of Biostatistics and Data Sciences, Boehringer Ingelheim, Ridgefield, Connecticut
- Tao Sun
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Wei Chen
- Department of Pediatrics, Children's Hospital of Pittsburgh, Pittsburgh, Pennsylvania
- Ying Ding
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
22
Chiu CY, Zhang B, Wang S, Shao J, Lakhal-Chaieb ML, Cook RJ, Wilson AF, Bailey-Wilson JE, Xiong M, Fan R. Gene-based association analysis of survival traits via functional regression-based mixed effect cox models for related samples. Genet Epidemiol 2019; 43:952-965. [PMID: 31502722 DOI: 10.1002/gepi.22254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/14/2019] [Revised: 06/26/2019] [Accepted: 07/16/2019] [Indexed: 01/09/2023]
Abstract
The importance of integrating survival analysis into genetics and genomics is widely recognized, but only a small number of statisticians have produced relevant work in this direction. For unrelated population data, functional regression (FR) models have been developed to test for association between a quantitative/dichotomous/survival trait and genetic variants in a gene region. In major gene association analysis, these models have higher power than sequence kernel association tests. In this paper, we extend this approach to analyze censored traits for family data or related samples using FR based mixed effect Cox models (FamCoxME). The FamCoxME models the effect of a major gene as a fixed mean via functional data analysis techniques, the local gene or polygene variations or both as random effects, and the correlation of pedigree members by kinship coefficients or genetic relationship matrix or both. The association between the censored trait and the major gene is tested by likelihood ratio tests (FamCoxME FR LRT). Simulation results indicate that the LRTs control the type I error rates accurately/conservatively and have good power levels when both local gene or polygene variations are modeled. The proposed methods were applied to analyze a breast cancer data set from the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). The FamCoxME provides a new tool for gene-based analysis of family-based studies or related samples.
Affiliation(s)
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee
- Bingsong Zhang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
- Shuqi Wang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
- Jingyi Shao
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
- Richard J Cook
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- Momiao Xiong
- Department of Biostatistics, Human Genetics Center, University of Texas-Houston, Houston, Texas
- Ruzong Fan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
23
Yan Q, Fang Z, Chen W. KMgene: a unified R package for gene-based association analysis for complex traits. Bioinformatics 2019; 34:2144-2146. [PMID: 29438558 PMCID: PMC6246171 DOI: 10.1093/bioinformatics/bty066] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/29/2017] [Accepted: 02/08/2018] [Indexed: 11/29/2022]
Abstract
Summary: In this report, we introduce an R package KMgene for performing gene-based association tests for familial, multivariate or longitudinal traits using kernel machine (KM) regression under a generalized linear mixed model framework. Extensive simulations were performed to evaluate the validity of the approaches implemented in KMgene. Availability and implementation: http://cran.r-project.org/web/packages/KMgene. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA
- Zhou Fang
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Wei Chen
- Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA; Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
24
Wang L, Lee S, Qiao D, Cho MH, Silverman EK, Lange C, Won S. metaFARVAT: An Efficient Tool for Meta-Analysis of Family-Based, Case-Control, and Population-Based Rare Variant Association Studies. Front Genet 2019; 10:572. [PMID: 31275357 PMCID: PMC6593391 DOI: 10.3389/fgene.2019.00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2018] [Accepted: 05/31/2019] [Indexed: 11/13/2022]
Abstract
Family-based designs have been shown to be powerful in detecting the significant rare variants associated with human diseases. However, very few significant results have been found owing to relatively small sample sizes and the fact that statistical analyses often suffer from high false-negative error rates. These limitations can be avoided by combining results from multiple studies via meta-analysis. However, statistical methods for meta-analysis with rare variants are limited for family-based samples. In this report, we propose a tool for the meta-analysis of family-based rare variant associations, metaFARVAT. metaFARVAT is based on a quasi-likelihood score for each variant. These scores are combined to generate burden test, variable-threshold test, sequence kernel association test (SKAT), and optimal SKAT statistics. The proposed method tests homogeneous and heterogeneous effects of variants among different studies and can be applied to both quantitative and dichotomous phenotypes. Simulation results demonstrated the robustness and efficiency of the proposed method in different scenarios. By applying metaFARVAT to data from a family-based study and a case-control study, we identified a few promising candidate genes, including DLEC1, which is associated with chronic obstructive pulmonary disease.
Affiliation(s)
- Longfei Wang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Sungyoung Lee
- Center for Precision Medicine, Seoul National University Hospital, Seoul, South Korea
- Dandi Qiao
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Christoph Lange
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Sungho Won
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea; Department of Public Health Sciences, Seoul National University, Seoul, South Korea; Institute of Health and Environment, Seoul National University, Seoul, South Korea
25
Qi W, Allen AS, Li YJ. Family-based association tests for rare variants with censored traits. PLoS One 2019; 14:e0210870. [PMID: 30682063 PMCID: PMC6347269 DOI: 10.1371/journal.pone.0210870] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/30/2018] [Accepted: 12/27/2018] [Indexed: 11/30/2022]
Abstract
We propose a set of family-based burden and kernel tests for censored traits (FamBAC and FamKAC). Here, censored traits refer to time-to-event outcomes, for instance, age-at-onset of a disease. To model censored traits in family-based designs, we used the frailty model, which incorporated not only fixed genetic effects of rare variants in a region of interest but also random polygenic effects shared within families. We first partitioned genotype scores of rare variants into orthogonal between- and within-family components, and then derived their corresponding efficient score statistics from the frailty model. Finally, FamBAC and FamKAC were constructed by aggregating the weighted efficient scores of the within-family components across rare variants and subjects. FamBAC collapsed rare variants within subject first to form a burden test that followed a chi-squared distribution, whereas FamKAC was a variance component test following a mixture of chi-squared distributions. For FamKAC, p-values can be computed by permutation tests or, for computational efficiency, by approximation methods. Through simulation studies, we showed that type I error was correctly controlled by FamBAC for various variant weighting schemes (0.0371 to 0.0527). However, FamKAC type I error rates based on approximation methods were deflated (max 0.0376) but improved by permutation tests. Our simulations also demonstrated that burden test FamBAC had higher power than kernel test FamKAC when a high proportion (e.g. ≥ 80%) of causal variants had effects in the same direction. In contrast, when the effects of causal variants on the censored trait were in mixed directions, FamKAC outperformed FamBAC and had comparable or higher power than an existing method, RVFam. Our proposed framework has the flexibility to accommodate general nuclear families, and can be used to analyze sequence data for censored traits such as age-at-onset of a complex disease of interest.
Affiliation(s)
- Wenjing Qi
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
- Duke Molecular Physiology Institute, Duke University, Durham, NC, United States of America
- Andrew S. Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
- Center for Statistical Genetics and Genomics, Duke University, Durham, NC, United States of America
- Yi-Ju Li
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
- Duke Molecular Physiology Institute, Duke University, Durham, NC, United States of America
26
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
Affiliation(s)
- Nicholas B Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Jun Chen
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
27
Wang JH, Chen YH. Overlapping group screening for detection of gene-gene interactions: application to gene expression profiles with survival trait. BMC Bioinformatics 2018; 19:335. [PMID: 30241463 PMCID: PMC6150983 DOI: 10.1186/s12859-018-2372-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/29/2018] [Accepted: 09/12/2018] [Indexed: 01/29/2023]
Abstract
Background: The development of a disease is a complex process that may result from joint effects of multiple genes. In this article, we propose the overlapping group screening (OGS) approach to determining active genes and gene-gene interactions incorporating prior pathway information. The OGS method is developed to overcome the challenges in genome-wide data analysis that the number of the genes and gene-gene interactions is far greater than the sample size, and the pathways generally overlap with one another. The OGS method is further proposed for patients’ survival prediction based on gene expression data. Results: Simulation studies demonstrate that the performance of the OGS approach in identifying the true main and interaction effects is good and the survival prediction accuracy of OGS with the Lasso penalty is better than the ordinary Lasso method. In real data analysis, we identify several significant genes and/or epistasis interactions that are associated with clinical survival outcomes of diffuse large B-cell lymphoma (DLBCL) and non-small-cell lung cancer (NSCLC) by utilizing prior pathway information from the KEGG pathway and the GO biological process databases, respectively. Conclusions: The OGS approach is useful for selecting important genes and epistasis interactions in the ultra-high dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than existing methods. The OGS approach is generally applicable to various types of outcome data (quantitative, qualitative, censored event time data) and regression models (e.g. linear, logistic, and Cox’s regression models).
Affiliation(s)
- Jie-Huei Wang
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan
- Yi-Hau Chen
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan
28
Chien LC, Chiu YF. General retrospective mega-analysis framework for rare variant association tests. Genet Epidemiol 2018; 42:621-635. [PMID: 30188589 DOI: 10.1002/gepi.22147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2017] [Revised: 06/05/2018] [Accepted: 06/05/2018] [Indexed: 11/09/2022]
Abstract
Here, we describe a retrospective mega-analysis framework for gene- or region-based multimarker rare variant association tests. Our proposed mega-analysis association tests allow investigators to combine longitudinal and cross-sectional family- and/or population-based studies. This framework can be applied to a continuous, categorical, or survival trait. In addition to autosomal variants, the tests can be applied to conduct mega-analyses on X-chromosome variants. Tests were built on study-specific region- or gene-level quasiscore statistics and, therefore, do not require estimates of effects of individual rare variants. We used the generalized estimating equation approach to account for complex multiple correlation structures between family members, repeated measurements, and genetic markers. While accounting for multilevel correlations and heterogeneity across studies, the test statistics were computationally efficient and feasible for large-scale sequencing studies. The retrospective aspect of association tests helps alleviate bias due to phenotype-related sampling and type I errors due to misspecification of phenotypic distribution. We evaluated our developed mega-analysis methods through comprehensive simulations with varying sample sizes, covariates, population stratification structures, and study designs across multiple studies. To illustrate application of the proposed framework, we conducted a mega-association analysis combining a longitudinal family study and a cross-sectional case-control study from Genetic Analysis Workshop 19.
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
- Yen-Feng Chiu
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, ROC
29
Lumley T, Brody J, Peloso G, Morrison A, Rice K. FastSKAT: Sequence kernel association tests for very large sets of markers. Genet Epidemiol 2018; 42:516-527. [PMID: 29932245 PMCID: PMC6129408 DOI: 10.1002/gepi.22136] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 01/25/2018] [Revised: 04/30/2018] [Accepted: 05/10/2018] [Indexed: 11/06/2022]
Abstract
The sequence kernel association test (SKAT) is widely used to test for associations between a phenotype and a set of genetic variants that are usually rare. Evaluating tail probabilities or quantiles of the null distribution for SKAT requires computing the eigenvalues of a matrix related to the genotype covariance between markers. Extracting the full set of eigenvalues of this matrix (an n × n matrix, for n subjects) has computational complexity proportional to n^3. As SKAT is often used when n > 10^4, this step becomes a major bottleneck in its use in practice. We therefore propose fastSKAT, a new computationally inexpensive but accurate approximation to the tail probabilities, in which the k largest eigenvalues of a weighted genotype covariance matrix or the largest singular values of a weighted genotype matrix are extracted, and a single term based on the Satterthwaite approximation is used for the remaining eigenvalues. While the method is not particularly sensitive to the choice of k, we also describe how to choose its value, and show how fastSKAT can automatically alert users to the rare cases where the choice may affect results. As well as providing faster implementation of SKAT, the new method also enables entirely new applications of SKAT that were not possible before; we give examples grouping variants by topologically associating domains, and comparing chromosome-wide association by class of histone marker.
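The single remainder term mentioned in this abstract can be illustrated with a short sketch. It is not the fastSKAT code: it assumes the null distribution is a weighted sum of independent chi-squared(1) variables with the eigenvalues as weights, and that the trace and the sum of squared eigenvalues of the full matrix are already available. The function name `satterthwaite_remainder` and the toy eigenvalues are hypothetical.

```python
import numpy as np

def satterthwaite_remainder(top_eigs, trace, trace_sq):
    """Moment-match the eigenvalues beyond the top k with one scaled chi-squared term.

    top_eigs : the k largest eigenvalues
    trace    : sum of all eigenvalues
    trace_sq : sum of all squared eigenvalues
    Returns (scale, df) such that scale * chi2(df) has the same mean and
    variance as the weighted sum over the remaining eigenvalues.
    """
    top = np.asarray(top_eigs, dtype=float)
    rest_sum = trace - top.sum()           # sum of the remaining eigenvalues
    rest_sq = trace_sq - (top ** 2).sum()  # sum of their squares
    if rest_sum <= 0 or rest_sq <= 0:
        raise ValueError("no positive remainder to approximate")
    scale = rest_sq / rest_sum
    df = rest_sum ** 2 / rest_sq
    return scale, df

# Toy spectrum (assumed values, for illustration only).
eigs = np.array([5.0, 3.0, 1.0, 0.5, 0.5, 0.25, 0.25])
k = 2
scale, df = satterthwaite_remainder(eigs[:k], eigs.sum(), (eigs ** 2).sum())
```

Under these assumptions the approximate null distribution would be the sum of the k exact terms plus `scale` times a chi-squared variable with `df` degrees of freedom; evaluating its tail probability needs a separate mixture-of-chi-squared routine, which is not shown here.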
Affiliation(s)
- Jennifer Brody
- Cardiovascular Health Research Unit, University of Washington
- Gina Peloso
- Department of Biostatistics, Boston University
- Kenneth Rice
- Department of Biostatistics, University of Washington
30
Huang J, Weinstein SJ, Moore SC, Derkach A, Hua X, Liao LM, Gu F, Mondul AM, Sampson JN, Albanes D. Serum Metabolomic Profiling of All-Cause Mortality: A Prospective Analysis in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study Cohort. Am J Epidemiol 2018; 187:1721-1732. [PMID: 29390044 DOI: 10.1093/aje/kwy017] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 09/21/2017] [Accepted: 01/23/2018] [Indexed: 12/12/2022]
Abstract
Tobacco use, hypertension, hyperglycemia, overweight, and inactivity are leading causes of overall and cardiovascular disease (CVD) mortality worldwide, yet the relevant metabolic alterations responsible are largely unknown. We conducted a serum metabolomic analysis of 620 men in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (1985-2013). During 28 years of follow-up, there were 435 deaths (197 CVD and 107 cancer). The analysis included 406 known metabolites measured with ultra-high-performance liquid chromatography/mass spectrometry-gas chromatography/mass spectrometry. We used Cox regression to estimate mortality hazard ratios for a 1-standard-deviation difference in metabolite signals. The strongest associations with overall mortality were N-acetylvaline (hazard ratio (HR) = 1.28; P < 4.1 × 10^-5, below Bonferroni statistical threshold) and dimethylglycine, 7-methylguanine, C-glycosyltryptophan, taurocholate, and N-acetyltryptophan (1.23 ≤ HR ≤ 1.32; 5 × 10^-5 ≤ P ≤ 1 × 10^-4). C-Glycosyltryptophan, 7-methylguanine, and 4-androsten-3β,17β-diol disulfate were statistically significantly associated with CVD mortality (1.49 ≤ HR ≤ 1.62, P < 4.1 × 10^-5). No metabolite was associated with cancer mortality, at a false discovery rate of <0.1. Individuals with a 1-standard-deviation higher metabolite risk score had increased all-cause and CVD mortality in the test set (HR = 1.4, P = 0.05; HR = 1.8, P = 0.003, respectively). The several serum metabolites and their composite risk score independently associated with all-cause and CVD mortality may provide potential leads regarding the molecular basis of mortality.
Affiliation(s)
- Jiaqi Huang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Stephanie J Weinstein
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Steven C Moore
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Andriy Derkach
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Linda M Liao
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Fangyi Gu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Department of Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, New York
- Alison M Mondul
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan
- Joshua N Sampson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
31
Koh H, Livanos AE, Blaser MJ, Li H. A highly adaptive microbiome-based association test for survival traits. BMC Genomics 2018; 19:210. [PMID: 29558893 PMCID: PMC5859547 DOI: 10.1186/s12864-018-4599-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 08/09/2017] [Accepted: 03/13/2018] [Indexed: 01/15/2023]
Abstract
BACKGROUND: There has been increasing interest in discovering microbial taxa that are associated with human health or disease, gathering momentum through the advances in next-generation sequencing technologies. Investigators have also increasingly employed prospective study designs to survey survival (i.e., time-to-event) outcomes, but current item-by-item statistical methods have limitations due to the unknown true association pattern. Here, we propose a new adaptive microbiome-based association test for survival outcomes, namely, optimal microbiome-based survival analysis (OMiSA). OMiSA approximates to the most powerful association test in two domains: 1) microbiome-based survival analysis using linear and non-linear bases of OTUs (MiSALN) which weighs rare, mid-abundant, and abundant OTUs, respectively, and 2) microbiome regression-based kernel association test for survival traits (MiRKAT-S) which incorporates different distance metrics (e.g., unique fraction (UniFrac) distance and Bray-Curtis dissimilarity), respectively. RESULTS: We illustrate that OMiSA powerfully discovers microbial taxa whether their underlying associated lineages are rare or abundant and phylogenetically related or not. OMiSA is a semi-parametric method based on a variance-component score test and a re-sampling method; hence, it is free from any distributional assumption on the effect of microbial composition and advantageous to robustly control type I error rates. Our extensive simulations demonstrate the highly robust performance of OMiSA. We also present the use of OMiSA with real data applications. CONCLUSIONS: OMiSA is attractive in practice as the true association pattern is unpredictable in advance and, for survival outcomes, no adaptive microbiome-based association test is currently available.
Affiliation(s)
- Hyunwook Koh
- Department of Population Health, New York University School of Medicine, 650 First Avenue, Room 547, New York, NY 10016 USA
- Alexandra E. Livanos
- Department of Medicine, Columbia University Medical Center, New York, NY 10032 USA
- Martin J. Blaser
- Departments of Medicine and Microbiology, New York University School of Medicine, New York, NY 10016 USA
- Medical Service, New York Harbor Department of Veterans Affairs Medical Center, New York, NY 10010 USA
- Huilin Li
- Department of Population Health, New York University School of Medicine, 650 First Avenue, Room 547, New York, NY 10016 USA
32
Abstract
While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
Affiliation(s)
- Karoline Kuchenbaecker
- Wellcome Trust Sanger Institute, Cambridge, UK; University College London, London, UK
- Emil Vincent Rosenbaum Appel
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Genetics, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
33
Plantinga A, Zhan X, Zhao N, Chen J, Jenq RR, Wu MC. MiRKAT-S: a community-level test of association between the microbiota and survival times. Microbiome 2017; 5:17. [PMID: 28179014 PMCID: PMC5299808 DOI: 10.1186/s40168-017-0239-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 09/06/2016] [Accepted: 01/31/2017] [Indexed: 05/17/2023]
Abstract
BACKGROUND: Community-level analysis of the human microbiota has culminated in the discovery of relationships between overall shifts in the microbiota and a wide range of diseases and conditions. However, existing work has primarily focused on analysis of relatively simple dichotomous or quantitative outcomes, for example, disease status or biomarker levels. Recently, there is also considerable interest in the relationship between the microbiota and censored survival outcomes, such as in clinical trials. How to conduct community-level analysis with censored survival outcomes is unclear, since standard dissimilarity-based tests cannot accommodate censored survival times and no alternative methods exist. METHODS: We develop a new approach, MiRKAT-S, for community-level analysis of microbiome data with censored survival times. MiRKAT-S uses ecologically informative distance metrics, such as the UniFrac distances, to generate matrices of pairwise distances between individuals' taxonomic profiles. The distance matrices are transformed into kernel (similarity) matrices, which are used to compare similarity in the microbiota to similarity in survival times between individuals. RESULTS: Simulation studies using synthetic microbial communities demonstrate correct control of type I error and adequate power. We also apply MiRKAT-S to examine the relationship between the gut microbiota and survival after allogeneic blood or bone marrow transplant. CONCLUSIONS: We present MiRKAT-S, a method that facilitates community-level analysis of the association between the microbiota and survival outcomes and therefore provides a new approach to analysis of microbiome data arising from clinical trials.
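The distance-to-kernel step mentioned in this abstract can be illustrated with a minimal sketch. It is not the MiRKAT-S implementation. It assumes Gower-style double centering, K = -1/2 J D^2 J with J = I - 11'/n, followed by setting negative eigenvalues to zero so that the result is positive semidefinite. The function name `distance_to_kernel` and the toy data are hypothetical; a real analysis would start from UniFrac or Bray-Curtis distances rather than the Euclidean distances used here.

```python
import numpy as np

def distance_to_kernel(D):
    """Turn a symmetric pairwise distance matrix into a PSD kernel (similarity) matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    K = (K + K.T) / 2                        # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)    # assumed correction: drop negative eigenvalues
    return (eigvecs * eigvals) @ eigvecs.T

# Toy data (assumed): 5 samples, 3 features, Euclidean distances.
rng = np.random.default_rng(0)
X = rng.random((5, 3))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
K = distance_to_kernel(D)
```

For Euclidean distances the result coincides with the centered Gram matrix of the samples, which gives a simple sanity check; for non-Euclidean ecological distances the eigenvalue truncation is what keeps the kernel valid.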
Collapse
Affiliation(s)
- Anna Plantinga
- Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington USA
| | - Xiang Zhan
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, Washington USA
| | - Ni Zhao
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe St, Baltimore, Maryland USA
| | - Jun Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, Minnesota USA
| | - Robert R. Jenq
- Departments of Genomic Medicine and Stem Cell Transplantation, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Unit 1954, Houston, TX USA
| | - Michael C. Wu
- Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington USA
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, Washington USA
|
34
|
Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models. Eur J Hum Genet 2016; 25:350-359. [PMID: 28000696 DOI: 10.1038/ejhg.2016.170] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/11/2016] [Revised: 07/26/2016] [Accepted: 09/27/2016] [Indexed: 11/09/2022] Open
Abstract
To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen for at least two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data.
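The three statistics named in the abstract are the classical multivariate test statistics, and a short Python sketch can show how they relate to one another. It assumes the eigenvalues of E^{-1}H are already available, where H and E denote the hypothesis and error sums-of-squares-and-cross-products matrices; the function name `manova_statistics` is hypothetical, and the approximate F transformations used in the paper are not reproduced here.

```python
import math


def manova_statistics(eigenvalues):
    """Return the three classical multivariate test statistics.

    `eigenvalues` are the eigenvalues of E^{-1} H, where H and E are the
    hypothesis and error sums-of-squares-and-cross-products matrices.
    """
    pillai = sum(lam / (1.0 + lam) for lam in eigenvalues)        # Pillai-Bartlett trace
    hotelling_lawley = sum(eigenvalues)                            # Hotelling-Lawley trace
    wilks = math.prod(1.0 / (1.0 + lam) for lam in eigenvalues)    # Wilks's Lambda
    return {"pillai": pillai, "hotelling_lawley": hotelling_lawley, "wilks": wilks}


# Illustrative eigenvalues only; they do not come from the paper.
stats = manova_statistics([1.0, 0.5])
```

Larger Pillai-Bartlett and Hotelling-Lawley traces indicate stronger evidence of association, whereas Wilks's Lambda moves in the opposite direction and approaches zero as the association strengthens.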
|
35
|
A comprehensive study of the genetic impact of rare variants in SORL1 in European early-onset Alzheimer's disease. Acta Neuropathol 2016; 132:213-224. [PMID: 27026413 PMCID: PMC4947104 DOI: 10.1007/s00401-016-1566-9] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 01/25/2016] [Revised: 03/15/2016] [Accepted: 03/16/2016] [Indexed: 12/14/2022]
Abstract
The sortilin-related receptor 1 (SORL1) gene has been associated with increased risk for Alzheimer’s disease (AD). Rare genetic variants in the SORL1 gene have also been implicated in autosomal dominant early-onset AD (EOAD). Here we report a large-scale investigation of the contribution of genetic variability in SORL1 to EOAD in a European EOAD cohort. We performed massive parallel amplicon-based re-sequencing of the full coding region of SORL1 in 1255 EOAD patients and 1938 age- and origin-matched control individuals in the context of the European Early-Onset Dementia (EOD) consortium, originating from Belgium, Spain, Portugal, Italy, Sweden, Germany, and Czech Republic. We identified six frameshift variants and two nonsense variants that were exclusively present in patients. These mutations are predicted to result in haploinsufficiency through nonsense-mediated mRNA decay, which could be confirmed experimentally for SORL1 p.Gly447Argfs*22 observed in a Belgian EOAD patient. We observed a 1.5-fold enrichment of rare non-synonymous variants in patients (carrier frequency 8.8%; SkatOMeta p value 0.0001). Of the 84 non-synonymous rare variants detected in the full patient/control cohort, 36 were only detected in patients. Our findings underscore a role of rare SORL1 variants in EOAD, but also show a non-negligible frequency of these variants in healthy individuals, necessitating pathogenicity assays. Premature stop codons due to frameshift and nonsense variants have so far exclusively been found in patients, and their predicted mode of action corresponds with evidence from in vitro functional studies of SORL1 in AD.
|
36
|
Fan R, Chiu CY, Jung J, Weeks DE, Wilson AF, Bailey-Wilson JE, Amos CI, Chen Z, Mills JL, Xiong M. A Comparison Study of Fixed and Mixed Effect Models for Gene Level Association Studies of Complex Traits. Genet Epidemiol 2016; 40:702-721. [PMID: 27374056 DOI: 10.1002/gepi.21984] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/01/2015] [Revised: 03/08/2016] [Accepted: 04/26/2016] [Indexed: 12/22/2022]
Abstract
In association studies of complex traits, fixed-effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance-component tests based on mixed models were developed for region-based genetic variant association tests. In the mixed models, the association is tested by a null hypothesis of zero variance via a sequence kernel association test (SKAT), its optimal unified test (SKAT-O), and a combined sum test of rare and common variant effect (SKAT-C). Although there are some comparison studies to evaluate the performance of mixed and fixed models, there is no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT-O, and SKAT-C, (ii) traditional fixed-effect additive models, and (iii) fixed-effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed-effect tests accurately control or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed-effect models performed better than or similar to the mixed models except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT-O/SKAT-C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes). 
In major gene association studies, we argue that the fixed-effect models perform better than or similarly to mixed models in most cases because some variants should have relatively large effects on the traits. In practice, it makes sense to perform analysis by both the fixed and mixed effect models and to make a comparison, and this can be readily done using our R codes and the SKAT packages.
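The variance-component idea behind SKAT can be made concrete with a minimal Python sketch of its score statistic, Q = sum_j w_j^2 (sum_i g_ij (y_i - mu_i))^2. The sketch assumes a continuous trait and an intercept-only null model, so the fitted mean is simply the sample mean; the function name `skat_q_statistic` and the toy data are hypothetical, scaling constants differ between implementations, and the p-value (a mixture of chi-square distributions under the null) is not computed here.

```python
def skat_q_statistic(y, genotypes, weights):
    """Unscaled SKAT-type score statistic with a weighted linear kernel.

    y         : list of trait values, one per subject
    genotypes : list of rows, genotypes[i][j] = allele count of variant j in subject i
    weights   : list of per-variant weights w_j
    Assumption: the null model contains only an intercept, so residuals are y - mean(y).
    """
    n = len(y)
    mean_y = sum(y) / n
    resid = [yi - mean_y for yi in y]
    q = 0.0
    for j, w in enumerate(weights):
        # Score for variant j: sum over subjects of genotype times null residual.
        score_j = sum(genotypes[i][j] * resid[i] for i in range(n))
        q += (w * score_j) ** 2
    return q


# Toy data: three subjects, two variants, equal weights.
toy_q = skat_q_statistic(
    y=[1.0, 2.0, 3.0],
    genotypes=[[0, 1], [1, 0], [2, 0]],
    weights=[1.0, 1.0],
)
```

Because each variant's score is squared before summing, variants with effects in opposite directions both add to Q instead of cancelling.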
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Chi-Yang Chiu
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Jeesun Jung
- Laboratory of Epidemiology and Biometry, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Daniel E Weeks
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, United States of America
| | - Zhen Chen
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - James L Mills
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, Texas, United States of America
|
37
|
Discovery of rare variants for complex phenotypes. Hum Genet 2016; 135:625-34. [PMID: 27221085 DOI: 10.1007/s00439-016-1679-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/02/2016] [Accepted: 04/28/2016] [Indexed: 12/27/2022]
Abstract
With the rise of sequencing technologies, it is now feasible to assess the role rare variants play in the genetic contribution to complex trait variation. While some of the earlier targeted sequencing studies successfully identified rare variants of large effect, unbiased gene discovery using exome sequencing has experienced limited success for complex traits. Nevertheless, rare variant association studies have demonstrated that rare variants do contribute to phenotypic variability, but sample sizes will likely have to be even larger than those of common variant association studies to be powered for the detection of genes and loci. Large-scale sequencing efforts of tens of thousands of individuals, such as the UK10K Project and aggregation efforts such as the Exome Aggregation Consortium, have made great strides in advancing our knowledge of the landscape of rare variation, but there remain many considerations when studying rare variation in the context of complex traits. We discuss these considerations in this review, presenting a broad range of topics at a high level as an introduction to rare variant analysis in complex traits including the issues of power, study design, sample ascertainment, de novo variation, and statistical testing approaches. Ultimately, as sequencing costs continue to decline, larger sequencing studies will yield clearer insights into the biological consequence of rare mutations and may reveal which genes play a role in the etiology of complex traits.
|
38
|
Abstract
Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.
|
39
|
Winham SJ, Pirie A, Chen YA, Larson MC, Fogarty ZC, Earp MA, Anton-Culver H, Bandera EV, Cramer D, Doherty JA, Goodman MT, Gronwald J, Karlan BY, Kjaer SK, Levine DA, Menon U, Ness RB, Pearce CL, Pejovic T, Rossing MA, Wentzensen N, Bean YT, Bisogna M, Brinton LA, Carney ME, Cunningham JM, Cybulski C, deFazio A, Dicks EM, Edwards RP, Gayther SA, Gentry-Maharaj A, Gore M, Iversen ES, Jensen A, Johnatty SE, Lester J, Lin HY, Lissowska J, Lubinski J, Menkiszak J, Modugno F, Moysich KB, Orlow I, Pike MC, Ramus SJ, Song H, Terry KL, Thompson PJ, Tyrer JP, van den Berg DJ, Vierkant RA, Vitonis AF, Walsh C, Wilkens LR, Wu AH, Yang H, Ziogas A, Berchuck A, Chenevix-Trench G, Schildkraut JM, Permuth-Wey J, Phelan CM, Pharoah PDP, Fridley BL, Sellers TA, Goode EL. Investigation of Exomic Variants Associated with Overall Survival in Ovarian Cancer. Cancer Epidemiol Biomarkers Prev 2016; 25:446-54. [PMID: 26747452 PMCID: PMC4779669 DOI: 10.1158/1055-9965.epi-15-0240] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/11/2015] [Accepted: 11/19/2015] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND While numerous susceptibility loci for epithelial ovarian cancer (EOC) have been identified, few associations have been reported with overall survival. In the absence of common prognostic genetic markers, we hypothesize that rare coding variants may be associated with overall EOC survival and assessed their contribution in two exome-based genotyping projects of the Ovarian Cancer Association Consortium (OCAC). METHODS The primary patient set (Set 1) included 14 independent EOC studies (4,293 patients) and 227,892 variants, and a secondary patient set (Set 2) included six additional EOC studies (1,744 patients) and 114,620 variants. Because power to detect rare variants individually is reduced, gene-level tests were conducted. Sets were analyzed separately at individual variants and by gene, and then combined with meta-analyses (73,203 variants and 13,163 genes overlapped). RESULTS No individual variant reached genome-wide statistical significance. A SNP previously implicated to be associated with EOC risk and, to a lesser extent, survival, rs8170, showed the strongest evidence of association with survival and similar effect size estimates across sets (Pmeta = 1.1E-6, HRSet1 = 1.17, HRSet2 = 1.14). Rare variants in ATG2B, an autophagy gene important for apoptosis, were significantly associated with survival after multiple testing correction (Pmeta = 1.1E-6; Pcorrected = 0.01). CONCLUSIONS Common variant rs8170 and rare variants in ATG2B may be associated with EOC overall survival, although further study is needed. IMPACT This study represents the first exome-wide association study of EOC survival to include rare variant analyses, and suggests that complementary single variant and gene-level analyses in large studies are needed to identify rare variants that warrant follow-up study. Cancer Epidemiol Biomarkers Prev; 25(3); 446-54. ©2016 AACR.
Affiliation(s)
- Stacey J Winham
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
| | - Ailith Pirie
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
| | - Yian Ann Chen
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Melissa C Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
| | - Zachary C Fogarty
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
| | - Madalene A Earp
- Department of Health Sciences Research, Division of Epidemiology, Mayo Clinic, Rochester, Minnesota
| | - Hoda Anton-Culver
- Department of Epidemiology, University of California Irvine, Irvine, California
| | - Elisa V Bandera
- Rutgers Cancer Institute of New Jersey and Robert Wood Johnson Medical School, New Brunswick, New Jersey
| | - Daniel Cramer
- Obstetrics and Gynecology Epidemiology Center, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
| | - Jennifer A Doherty
- Section of Biostatistics and Epidemiology, The Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire
| | - Marc T Goodman
- Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, California
| | - Jacek Gronwald
- Department of Genetics and Pathology, Pomeranian Medical University, Szczecin, Poland
| | - Beth Y Karlan
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, California
| | - Susanne K Kjaer
- Virus, Lifestyle, and Genes, Danish Cancer Society Research Center, Copenhagen, Denmark. Department of Gynecology, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
| | - Douglas A Levine
- Gynecology Service, Department of Surgery, Memorial Sloan-Kettering Cancer Center, New York, New York
| | - Usha Menon
- Gynaecological Cancer Research Centre, Department of Women's Cancer, Institute for Women's Health, University College London, London, United Kingdom
| | - Roberta B Ness
- The University of Texas School of Public Health, Houston, Texas
| | - Celeste L Pearce
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California
| | - Tanja Pejovic
- Department of Obstetrics and Gynecology, Oregon Health and Science University, Portland, Oregon. Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon
| | - Mary Anne Rossing
- Department of Epidemiology, University of Washington, Seattle, Washington. Program in Epidemiology, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington
| | | | - Yukie T Bean
- Department of Obstetrics and Gynecology, Oregon Health and Science University, Portland, Oregon. Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon
| | - Maria Bisogna
- Gynecology Service, Department of Surgery, Memorial Sloan-Kettering Cancer Center, New York, New York
| | - Louise A Brinton
- Division of Cancer Epidemiology and Genetics, NCI, Bethesda, Maryland
| | - Michael E Carney
- Department of Obstetrics and Gynecology, John A. Burns School of Medicine, University of Hawaii, Honolulu, Hawaii
| | - Julie M Cunningham
- Department of Laboratory Medicine and Pathology, Division of Experimental Pathology, Mayo Clinic, Rochester, Minnesota
| | - Cezary Cybulski
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical Academy, Szczecin, Poland
| | - Anna deFazio
- Department of Gynaecological Oncology, Westmead Hospital, Sydney, Australia. Center for Cancer Research, University of Sydney at Westmead Millennium Institute, Sydney, Australia
| | - Ed M Dicks
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, United Kingdom
| | - Robert P Edwards
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Division of Gynecologic Oncology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
| | - Simon A Gayther
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California
| | - Aleksandra Gentry-Maharaj
- Gynaecological Cancer Research Centre, Department of Women's Cancer, Institute for Women's Health, University College London, London, United Kingdom
| | - Martin Gore
- Gynecological Oncology Unit, The Royal Marsden Hospital, London, United Kingdom
| | - Edwin S Iversen
- Department of Statistical Science, Duke University, Durham, North Carolina
| | - Allan Jensen
- Virus, Lifestyle, and Genes, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Sharon E Johnatty
- Department of Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Jenny Lester
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, California
| | - Hui-Yi Lin
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Jolanta Lissowska
- Department of Cancer Epidemiology and Prevention, M. Sklodowska-Curie Memorial Cancer Center & Institute of Oncology, Warsaw, Poland
| | - Jan Lubinski
- Department of Genetics and Pathology, Pomeranian Medical University, Szczecin, Poland
| | - Janusz Menkiszak
- Department of Surgical Gynecology and Gynecological Oncology of Adults and Adolescents, Pomeranian Medical University, Szczecin, Poland
| | - Francesmary Modugno
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Division of Gynecologic Oncology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania. Department of Epidemiology, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania. Womens Cancer Research Program, Magee-Women's Research Institute and University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
| | - Kirsten B Moysich
- Department of Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, New York
| | - Irene Orlow
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York
| | - Malcolm C Pike
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California. Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York
| | - Susan J Ramus
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California
| | - Honglin Song
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, United Kingdom
| | - Kathryn L Terry
- Obstetrics and Gynecology Epidemiology Center, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
| | - Pamela J Thompson
- Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, California
| | - Jonathan P Tyrer
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, United Kingdom
| | - David J van den Berg
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California
| | - Robert A Vierkant
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
| | - Allison F Vitonis
- Obstetrics and Gynecology Epidemiology Center, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Christine Walsh
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, California
| | - Lynne R Wilkens
- Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
| | - Anna H Wu
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California
| | - Hannah Yang
- Division of Cancer Epidemiology and Genetics, NCI, Bethesda, Maryland
| | - Argyrios Ziogas
- Department of Epidemiology, Center for Cancer Genetics Research and Prevention, School of Medicine, University of California Irvine, Irvine, California
| | - Andrew Berchuck
- Duke Cancer Institute, Duke University Medical Center, Durham, North Carolina
| | | | | | - Jennifer Permuth-Wey
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Catherine M Phelan
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Paul D P Pharoah
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, United Kingdom
| | - Brooke L Fridley
- Kansas IDeA Network of Biomedical Research Excellence Bioinformatics Core, University of Kansas Cancer Center, Kansas City, Kansas
| | - Thomas A Sellers
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Ellen L Goode
- Department of Health Sciences Research, Division of Epidemiology, Mayo Clinic, Rochester, Minnesota.
|
40
|
Wu B, Pankow JS. Sequence Kernel Association Test of Multiple Continuous Phenotypes. Genet Epidemiol 2016; 40:91-100. [PMID: 26782911 PMCID: PMC4724299 DOI: 10.1002/gepi.21945] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 06/16/2015] [Revised: 10/28/2015] [Accepted: 11/01/2015] [Indexed: 01/12/2023]
Abstract
Genetic studies often collect multiple correlated traits, which could be analyzed jointly to increase power by aggregating multiple weak effects and provide additional insights into the etiology of complex human diseases. Existing methods for multiple trait association tests have primarily focused on common variants. There is a surprising dearth of published methods for testing the association of rare variants with multiple correlated traits. In this paper, we extend the commonly used sequence kernel association test (SKAT) for single-trait analysis to test for the joint association of rare variant sets with multiple traits. We investigate the performance of the proposed method through extensive simulation studies. We further illustrate its usefulness with application to the analysis of diabetes-related traits in the Atherosclerosis Risk in Communities (ARIC) Study. We identified an exome-wide significant rare variant set in the gene YAP1 worthy of further investigations.
Affiliation(s)
- Baolin Wu
- Division of Biostatistics, University of Minnesota
| | - James S. Pankow
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota
|
41
|
Zhang W, Li H, Li Z, Li Q. A two-phase procedure for non-normal quantitative trait genetic association study. BMC Bioinformatics 2016; 17:52. [PMID: 26821800 PMCID: PMC4730615 DOI: 10.1186/s12859-016-0888-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2015] [Accepted: 01/06/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The nonparametric trend test (NPT) is well suited for identifying the genetic variants associated with quantitative traits when the trait values do not satisfy the normal distribution assumption. If the genetic model, defined according to the mode of inheritance, is known, the NPT derived under the given genetic model is optimal. However, in practice, the genetic model is often unknown beforehand. The NPT derived from an incorrect model might result in loss of power. When the underlying genetic model is unknown, a robust test is preferred to maintain satisfactory power. RESULTS We propose a two-phase procedure to handle the uncertainty of the genetic model for non-normal quantitative trait genetic association study. First, a model selection procedure is employed to help choose the genetic model. Then the optimal test derived under the selected model is constructed to test for possible association. To control the type I error rate, we derive the joint distribution of the test statistics developed in the two phases and obtain the proper size. CONCLUSIONS Simulation results show that the proposed method is more robust than existing methods, and an application to the gene DNAH9 from Genetic Analysis Workshop 16, tested for association with anti-cyclic citrullinated peptide antibody, further demonstrates its performance.
Affiliation(s)
- Wei Zhang
- Key Laboratory of Systems Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China.
| | - Huiyun Li
- School of Management and Economics, Beijing Institute of Technology, Beijing, 100081, China.
| | - Zhaohai Li
- Department of Statistics, George Washington University, Washington, 20052, DC, USA.
| | - Qizhai Li
- Key Laboratory of Systems Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China.
|
42
|
Fan R, Wang Y, Yan Q, Ding Y, Weeks DE, Lu Z, Ren H, Cook RJ, Xiong M, Swaroop A, Chew EY, Chen W. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions. Genet Epidemiol 2016; 40:133-43. [PMID: 26782979 DOI: 10.1002/gepi.21947] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/13/2015] [Revised: 10/13/2015] [Accepted: 11/05/2015] [Indexed: 11/07/2022]
Abstract
Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than, or similar power to, Cox SKAT LRT except when 50%/50% of causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example.
Affiliation(s)
- Ruzong Fan
- Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Yifan Wang
- Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Ying Ding
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Daniel E Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Zhaohui Lu
- Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Haobo Ren
- Regeneron Pharmaceuticals, Inc, Basking Ridge, New Jersey, United States of America
| | - Richard J Cook
- Department of Statistics and Actuarial Science, Waterloo, ON, Canada
| | - Momiao Xiong
- Human Genetics Center, University of Texas, Houston, Texas, United States of America
| | - Anand Swaroop
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, NIH, Bethesda, Maryland, United States of America
| | - Emily Y Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, NIH, Bethesda, Maryland, United States of America
| | - Wei Chen
- Division of Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
|
43
|
Wu B, Guan W, Pankow JS. On Efficient and Accurate Calculation of Significance P-Values for Sequence Kernel Association Testing of Variant Set. Ann Hum Genet 2016; 80:123-35. [PMID: 26757198 DOI: 10.1111/ahg.12144] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 07/10/2015] [Accepted: 11/12/2015] [Indexed: 01/04/2023]
Abstract
The objective of this paper is to discuss and develop alternative computational methods to accurately and efficiently calculate significance P-values for the commonly used sequence kernel association test (SKAT) and the adaptive sum of SKAT and burden test (SKAT-O) for variant set association. We show that the existing software can lead to either conservative or inflated type I errors. We develop alternative and efficient computational algorithms that quickly compute the SKAT P-value and have well-controlled type I errors. In addition, we derive an alternative and simplified formula for calculating the significance P-value of SKAT-O, which sheds light on the development of efficient and accurate numerical algorithms. We implement the proposed methods in a publicly available R package that can be readily used or adapted to large-scale sequencing studies. Given that more and more large-scale exome and whole-genome sequencing or re-sequencing studies are being conducted, the proposed methods are of considerable practical importance. We conduct extensive numerical studies to investigate the performance of the proposed methods. We further illustrate their usefulness with an application to associations between rare exonic variants and fasting glucose levels in the Atherosclerosis Risk in Communities (ARIC) study.
Affiliation(s)
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Weihua Guan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - James S Pankow
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
|
44
|
Chen MH, Yang Q. RVFam: an R package for rare variant association analysis with family data. Bioinformatics 2015; 32:624-6. [PMID: 26508760 DOI: 10.1093/bioinformatics/btv609] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 03/11/2015] [Accepted: 10/16/2015] [Indexed: 11/12/2022] Open
Abstract
Family-based designs offer a unique advantage for identifying rare risk variants in genetic association studies. Existing tools for analyzing rare variants in families lack components to properly handle binary traits and survival traits. In this report, we introduce an R software package, RVFam (Rare Variant association analysis with Family data), designed to analyze continuous, binary and survival traits against rare and common sequencing variants in genome-wide association studies (GWAS) involving family data. Single and multiple variant association tests were implemented while accounting for arbitrary family structures. Extensive simulation studies were performed to evaluate all the approaches implemented in RVFam. Availability and implementation: http://cran.r-project.org/web/packages/RVFam/.
Affiliation(s)
- Ming-Huei Chen
- Department of Neurology, Boston University School of Medicine, Boston, MA 02118, USA, Framingham Heart Study, Population Sciences Branch, Division of Intramural Research, National Heart Lung and Blood Institute, National Institutes of Health, Framingham, MA 01702, USA
| | - Qiong Yang
- Department of Neurology, Boston University School of Medicine, Boston, MA 02118, USA, Framingham Heart Study, Population Sciences Branch, Division of Intramural Research, National Heart Lung and Blood Institute, National Institutes of Health, Framingham, MA 01702, USA
|
45
|
Lakhal-Chaieb L, Oualkacha K, Richards BJ, Greenwood CM. A rare variant association test in family-based designs and non-normal quantitative traits. Stat Med 2015; 35:905-21. [DOI: 10.1002/sim.6750] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/18/2014] [Revised: 09/04/2015] [Accepted: 09/05/2015] [Indexed: 12/13/2022]
Affiliation(s)
- Lajmi Lakhal-Chaieb
- Département de mathématiques et statistique; Université Laval; Québec G1V 0A6 Québec Canada
| | - Karim Oualkacha
- Département de mathématiques; Université de Québec À Montréal; Montreal Québec Canada
| | - Brent J. Richards
- Lady Davis Institute for Medical Research; Jewish General Hospital; Montreal Québec Canada
- Department of Epidemiology, Biostatistics and Occupational Health; McGill University; Montreal Québec Canada
- Department of Twin Research; King's College London; London U.K
| | - Celia M.T. Greenwood
- Lady Davis Institute for Medical Research; Jewish General Hospital; Montreal Québec Canada
- Department of Epidemiology, Biostatistics and Occupational Health; McGill University; Montreal Québec Canada
- Departments of Oncology and Human Genetics; McGill University; Montreal Québec Canada
|
46
|
Derkach A, Lawless JF, Sun L. Score tests for association under response-dependent sampling designs for expensive covariates. Biometrika 2015. [DOI: 10.1093/biomet/asv038] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
|
47
|
Wu B, Pankow JS, Guan W. Sequence Kernel Association Analysis of Rare Variant Set Based on the Marginal Regression Model for Binary Traits. Genet Epidemiol 2015; 39:399-405. [PMID: 26282996 PMCID: PMC4544778 DOI: 10.1002/gepi.21913] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 10/09/2014] [Revised: 06/11/2015] [Accepted: 06/15/2015] [Indexed: 01/28/2023]
Abstract
Recent sequencing efforts have focused on exploring the influence of rare variants on complex diseases. Gene-level tests that aggregate information across rare variants within a gene have become attractive for enriching the rare variant association signal. Among them, the sequence kernel association test (SKAT) has proved to be a very powerful method for jointly testing multiple rare variants within a gene. In this article, we explore an alternative SKAT. We propose to use the univariate likelihood ratio statistics from the marginal model for individual variants as input into the kernel association test. We show how to compute its significance P-value efficiently based on the asymptotic chi-square mixture distribution. We demonstrate through extensive numerical studies that the proposed method has competitive performance. Its usefulness is further illustrated with an application to associations between rare exonic variants and type 2 diabetes (T2D) in the Atherosclerosis Risk in Communities (ARIC) study. We identified an exome-wide significant rare variant set in the gene ZZZ3 worthy of further investigation.
Affiliation(s)
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota
| | - James S. Pankow
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota
| | - Weihua Guan
- Division of Biostatistics, School of Public Health, University of Minnesota
|
48
|
Leclerc M, Simard J, Lakhal-Chaieb L. SNP Set Association Testing for Survival Outcomes in the Presence of Intrafamilial Correlation. Genet Epidemiol 2015; 39:406-14. [PMID: 26282997 DOI: 10.1002/gepi.21914] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/06/2015] [Revised: 06/04/2015] [Accepted: 06/17/2015] [Indexed: 11/06/2022]
Abstract
In this work, we propose a single nucleotide polymorphism (SNP) set association test for censored phenotypes in the presence of a family-based design. The proposed test is valid for both common and rare variants. A proportional hazards Cox model is specified for the marginal distribution of the trait and the familial dependence is modeled via a Gaussian copula. Censored values are treated as partially missing data and a multiple imputation procedure is proposed in order to compute the test statistics. The P-value is then deduced analytically. The finite-sample empirical properties of the proposed method are evaluated and compared to existing competitors by simulations and its use is illustrated using a breast cancer data set from the Consortium of Investigators of Modifiers of BRCA1 and BRCA2.
Affiliation(s)
- Martin Leclerc
- Département de mathématiques et de statistique, Université Laval, Québec, Canada
| | - Jacques Simard
- Department of Molecular Medicine, Canada Research Chair in Oncogenetics, Laval University & Genomics Centre, CHU de Québec Research Centre, Québec, Canada
| | - Lajmi Lakhal-Chaieb
- Département de mathématiques et de statistique, Université Laval, Québec, Canada
|
49
|
Meta-analysis for Discovering Rare-Variant Associations: Statistical Methods and Software Programs. Am J Hum Genet 2015; 97:35-53. [PMID: 26094574 DOI: 10.1016/j.ajhg.2015.05.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 02/05/2015] [Accepted: 05/01/2015] [Indexed: 01/01/2023] Open
Abstract
There is heightened interest in using next-generation sequencing technologies to identify rare variants that influence complex human diseases and traits. Meta-analysis is essential to this endeavor because large sample sizes are required for detecting associations with rare variants. In this article, we provide a comprehensive overview of statistical methods for meta-analysis of sequencing studies for discovering rare-variant associations. Specifically, we discuss the calculation of relevant summary statistics from participating studies, the construction of gene-level association tests, the choice of transformation for quantitative traits, the use of fixed-effects versus random-effects models, and the removal of shadow association signals through conditional analysis. We also show that meta-analysis based on properly calculated summary statistics is as powerful as joint analysis of individual-participant data. In addition, we demonstrate the performance of different meta-analysis methods by using both simulated and empirical data. We then compare four major software packages for meta-analysis of rare-variant associations (MASS, RAREMETAL, MetaSKAT, and seqMeta) in terms of the underlying statistical methodology, analysis pipeline, and software interface. Finally, we present PreMeta, a software interface that integrates the four meta-analysis packages and allows a consortium to combine otherwise incompatible summary statistics.
|
50
|
Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models. Genetics 2015; 200:1089-104. [PMID: 26058849 DOI: 10.1534/genetics.115.178343] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 01/21/2015] [Accepted: 06/05/2015] [Indexed: 11/18/2022] Open
Abstract
Meta-analysis of genetic data must account for differences among studies, including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., they may be heterogeneous. Thus, meta-analysis that combines data from multiple studies is difficult, and novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT, and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.
|