26
|
Fuady AM, Lent S, Sarnowski C, Tintle NL. Application of novel and existing methods to identify genes with evidence of epigenetic association: results from GAW20. BMC Genet 2018; 19:72. [PMID: 30255777 PMCID: PMC6157126 DOI: 10.1186/s12863-018-0647-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The rise in popularity and accessibility of DNA methylation data to evaluate epigenetic associations with disease has led to numerous methodological questions. As part of GAW20, our working group of 8 research groups focused on gene searching methods. RESULTS Although the methods were varied, we identified 3 main themes within our group. First, many groups tackled the question of how best to use pedigree information in downstream analyses, finding that (a) the use of kinship matrices is common practice, (b) ascertainment corrections may be necessary, and (c) pedigree information may be useful for identifying parent-of-origin effects. Second, many groups also considered multimarker versus single-marker tests. Multimarker tests had modestly improved power versus single-marker methods on simulated data, and on real data identified additional associations that were not identified with single-marker methods, including identification of a gene with a strong biological interpretation. Finally, some of the groups explored methods to combine single-nucleotide polymorphism (SNP) and DNA methylation into a single association analysis. CONCLUSIONS A causal inference method showed promise at discovering new mechanisms of SNP activity; gene-based methods of summarizing SNP and DNA methylation data also showed promise. Even though numerous questions still remain in the analysis of DNA methylation data, our discussions at GAW20 suggest some emerging best practices.
|
27
|
Tintle NL, Fardo DW, de Andrade M, Aslibekyan S, Bailey JN, Bermejo JL, Cantor RM, Ghosh S, Melton P, Wang X, MacCluer JW, Almasy L. GAW20: methods and strategies for the new frontiers of epigenetics and pharmacogenomics. BMC Proc 2018; 12:26. [PMID: 30263042 PMCID: PMC6156831 DOI: 10.1186/s12919-018-0113-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open
Abstract
GAW20 provided a platform for developing and evaluating statistical methods to analyze human lipid-related phenotypes, DNA methylation, and single-nucleotide markers in a study involving a pharmaceutical intervention. In this article, we present an overview of the data sets and the contributions analyzing these data. The data, donated by the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) investigators, came from 188 families (N = 1105) and included genome-wide DNA methylation data before and after a 3-week treatment with fenofibrate, single-nucleotide polymorphisms, metabolic syndrome components before and after treatment, and a variety of covariates. The contributions from individual research groups were extensively discussed before, during, and after the Workshop in groups based on discussion themes, before being submitted for publication.
|
28
|
Vander Woude J, Huisman J, Vander Berg L, Veenstra J, Bos A, Kalsbeek A, Koster K, Ryder N, Tintle NL. Evaluating the performance of gene-based tests of genetic association when testing for association between methylation and change in triglyceride levels at GAW20. BMC Proc 2018; 12:50. [PMID: 30275896 PMCID: PMC6157195 DOI: 10.1186/s12919-018-0124-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Although methylation data continues to rise in popularity, much is still unknown about how to best analyze methylation data in genome-wide analysis contexts. Given continuing interest in gene-based tests for next-generation sequencing data, we evaluated the performance of novel gene-based test statistics on simulated data from GAW20. Our analysis suggests that most of the gene-based tests are detecting real signals and maintaining the Type I error rate. The minimum p value and threshold-based tests performed well compared to single-marker tests in many cases, especially when the number of variants was relatively large with few true causal variants in the set.
|
29
|
Abstract
Histopathology remains an important source of descriptive biological data in biomedical research. Recent petitions for enhanced reproducibility in scientific studies have elevated the role of tissue scoring (semiquantitative and quantitative) in research studies. Effective tissue scoring requires appropriate statistical analysis to help validate the group comparisons and give the pathologist confidence in interpreting the data. Each statistical test is typically founded on underlying assumptions regarding the data. If the underlying assumptions of a statistical test do not match the data, then these tests can lead to increased risk of erroneous interpretations of the data. The choice of appropriate statistical test is influenced by the study's experimental design and resultant data (eg, paired vs unpaired, normality, number of groups, etc). Here, we identify 3 common pitfalls in the analysis of tissue scores: shopping for significance, overuse of paired t-tests, and misguided analysis of multiple groups. Finally, we encourage pathologists to use the full breadth of resources available to them, such as using statistical software, reading key publications about statistical approaches, and identifying a statistician to serve as a collaborator on the multidisciplinary research team. These collective resources can be helpful in choosing the appropriate statistical test for tissue-scoring data to provide the most valid interpretation for the pathologist.
|
30
|
Ryder N, Dorn KM, Huitsing M, Adams M, Ploegstra J, DeHaan L, Larson S, Tintle NL. Transcriptome assembly and annotation of johnsongrass (Sorghum halepense) rhizomes identify candidate rhizome-specific genes. Plant Direct 2018; 2:e00065. [PMID: 31245728 PMCID: PMC6508516 DOI: 10.1002/pld3.65] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/10/2018] [Revised: 05/21/2018] [Accepted: 05/23/2018] [Indexed: 05/25/2023]
Abstract
Rhizomes facilitate the wintering and vegetative propagation of many perennial grasses. Sorghum halepense (johnsongrass) is an aggressive perennial grass that relies on a robust rhizome system to persist through winters and reproduce asexually from its rootstock nodes. This study aimed to sequence and assemble expressed transcripts within the johnsongrass rhizome. A de novo transcriptome assembly was generated from a single johnsongrass rhizome meristem tissue sample. A total of 141,176 probable protein-coding sequences from the assembly were identified and assigned gene ontology terms using Blast2GO. Estimated expression analysis and BLAST results were used to reduce the assembly to 64,447 high-confidence sequences. The johnsongrass assembly was compared to Sorghum bicolor, a related nonrhizomatous species, along with an assembly of similar rhizome tissue from the perennial grain crop Thinopyrum intermedium. The presence/absence analysis yielded a set of 98 expressed johnsongrass contigs that are likely associated with rhizome development.
|
31
|
Harris WS, Tintle NL, Etherton MR, Vasan RS. Erythrocyte long-chain omega-3 fatty acid levels are inversely associated with mortality and with incident cardiovascular disease: The Framingham Heart Study. J Clin Lipidol 2018; 12:718-727.e6. [PMID: 29559306 PMCID: PMC6034629 DOI: 10.1016/j.jacl.2018.02.010] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 11/13/2017] [Revised: 02/14/2018] [Accepted: 02/19/2018] [Indexed: 11/05/2022]
Abstract
BACKGROUND The extent to which omega-3 fatty acid status is related to risk for death from any cause and for incident cardiovascular disease (CVD) remains controversial. OBJECTIVE To examine these associations in the Framingham Heart Study. DESIGN Prospective and observational. SETTING Framingham Heart Study Offspring cohort. MEASUREMENTS The exposure marker was red blood cell levels of eicosapentaenoic and docosahexaenoic acids (the Omega-3 Index) measured at baseline. Outcomes included mortality (total, CVD, cancer, and other) and total CVD events in participants free of CVD at baseline. Follow-up was for a median of 7.3 years. Cox proportional hazards models were adjusted for 18 variables (demographic, clinical status, therapeutic, and CVD risk factors). RESULTS Among the 2500 participants (mean age 66 years, 54% women), there were 350 deaths (58 from CVD, 146 from cancer, 128 from other known causes, and 18 from unknown causes). There were 245 CVD events. In multivariable-adjusted analyses, a higher Omega-3 Index was associated with significantly lower risks (P-values for trends across quintiles) for total mortality (P = .02), for non-CVD and non-cancer mortality (P = .009), and for total CVD events (P = .008). Those in the highest (>6.8%) compared to those in the lowest Omega-3 Index quintiles (<4.2%) had a 34% lower risk for death from any cause and 39% lower risk for incident CVD. These associations were generally stronger for docosahexaenoic acid than for eicosapentaenoic acid. When total cholesterol was compared with the Omega-3 Index in the same models, the latter was significantly related with these outcomes, but the former was not. LIMITATIONS Relatively short follow-up time and one-time exposure assessment. CONCLUSIONS A higher Omega-3 Index was associated with reduced risk of both CVD and all-cause mortality.
|
32
|
Kalsbeek A, Veenstra J, Westra J, Disselkoen C, Koch K, McKenzie KA, O’Bott J, Vander Woude J, Fischer K, Shearer GC, Harris WS, Tintle NL. A genome-wide association study of red-blood cell fatty acids and ratios incorporating dietary covariates: Framingham Heart Study Offspring Cohort. PLoS One 2018; 13:e0194882. [PMID: 29652918 PMCID: PMC5898718 DOI: 10.1371/journal.pone.0194882] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/26/2017] [Accepted: 03/12/2018] [Indexed: 02/07/2023] Open
Abstract
Recent analyses have suggested a strong heritable component to circulating fatty acid (FA) levels; however, only a limited number of genes have been identified which associate with FA levels. In order to expand upon a previous genome wide association study done on participants in the Framingham Heart Study Offspring Cohort and FA levels, we used data from 2,400 of these individuals for whom red blood cell FA profiles, dietary information and genotypes are available, and then conducted a genome-wide evaluation of potential genetic variants associated with 22 FAs and 15 FA ratios, after adjusting for relevant dietary covariates. Our analysis found nine previously identified loci associated with FA levels (FADS, ELOVL2, PCOLCE2, LPCAT3, AGPAT4, NTAN1/PDXDC1, PKD2L1, HBS1L/MYB and RAB3GAP1/MCM6), while identifying four novel loci. The latter include an association between variants in CALN1 (Chromosome 7) and eicosapentaenoic acid (EPA), DHRS4L2 (Chromosome 14) and a FA ratio measuring delta-9-desaturase activity, as well as two loci associated with less well understood proteins. Thus, the inclusion of dietary covariates had a modest impact, helping to uncover four additional loci. While genome-wide association studies continue to uncover additional genes associated with circulating FA levels, much of the heritable risk is yet to be explained, suggesting the potential role of rare genetic variation, epistasis and gene-environment interactions on FA levels as well. Further studies are needed to continue to understand the complex genetic picture of FA metabolism and synthesis.
|
33
|
Bolt MA, Helming LM, Tintle NL. The Associations between Self-Reported Exposure to the Chernobyl Nuclear Disaster Zone and Mental Health Disorders in Ukraine. Front Psychiatry 2018; 9:32. [PMID: 29497388 PMCID: PMC5818457 DOI: 10.3389/fpsyt.2018.00032] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 08/29/2017] [Accepted: 01/26/2018] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND In 1986, Reactor 4 of the Chernobyl nuclear power plant near Pripyat, Ukraine exploded, releasing highly radioactive materials into the surrounding environment. Although the physical effects of the disaster have been well documented, a limited amount of research has been conducted on the association of the disaster with long-term, clinically diagnosable mental health disorders. According to the diathesis-stress model, the stress of potential and unknown exposure to radioactive materials and the ensuing changes to one's life or environment due to the disaster might lead those with previous vulnerabilities to fall into a poor state of mental health. Previous studies of this disaster have found elevated symptoms of stress, substance abuse, anxiety, and depression in exposed populations, though often at a subclinical level. MATERIALS AND METHODS With data from The World Mental Health Composite International Diagnostic Interview, a large cross-sectional mental health survey conducted in Ukraine by the World Health Organization, the mental health of Ukrainians was modeled with multivariable logistic regression techniques to determine if any long-term mental health disorders were associated with reporting having lived in the zone affected by the Chernobyl nuclear disaster. Common classes of psychiatric disorders were examined as well as self-report ratings of physical and mental health. RESULTS Reporting that one lived in the Chernobyl-affected disaster zone was associated with a higher rate of alcohol disorders among men and higher rates of intermittent explosive disorders among women in a prevalence model. Subjects who lived in the disaster zone also had lower ratings of personal physical and mental health when compared to controls. DISCUSSION Stress resulting from disaster exposure, whether or not such exposure actually occurred or was merely feared, and from ensuing changes in life circumstances is associated with increased rates of mental health disorders. Professionals assisting populations that are coping with the consequences of disaster should be aware of possible increases in psychiatric disorders as well as poorer perceptions regarding personal physical and mental health.
|
34
|
Harris WS, Del Gobbo L, Tintle NL. The Omega-3 Index and relative risk for coronary heart disease mortality: Estimation from 10 cohort studies. Atherosclerosis 2017; 262:51-54. [DOI: 10.1016/j.atherosclerosis.2017.05.007] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 03/04/2017] [Revised: 04/23/2017] [Accepted: 05/05/2017] [Indexed: 12/15/2022]
|
35
|
Faria JP, Davis JJ, Edirisinghe JN, Taylor RC, Weisenhorn P, Olson RD, Stevens RL, Rocha M, Rocha I, Best AA, DeJongh M, Tintle NL, Parrello B, Overbeek R, Henry CS. Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation. Front Microbiol 2016; 7:1819. [PMID: 27933038 PMCID: PMC5121216 DOI: 10.3389/fmicb.2016.01819] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/01/2015] [Accepted: 10/28/2016] [Indexed: 01/13/2023] Open
Abstract
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and found that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, which represent increasing degrees of phylogenetic distance from E. coli. Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain.
|
36
|
Dunn SL, Dunn LM, Buursma MP, Clark JA, Vander Berg L, DeVon HA, Tintle NL. Home- and Hospital-Based Cardiac Rehabilitation Exercise: The Important Role of Physician Recommendation. West J Nurs Res 2016; 39:214-233. [PMID: 27590042 DOI: 10.1177/0193945916668326] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/21/2023]
Abstract
Exercise reduces morbidity and mortality for patients with heart disease. Despite clear guidelines and known benefits, most cardiac patients do not meet current exercise recommendations. Physician endorsement positively affects patient participation in hospital-based Phase II cardiac rehabilitation programs, yet the importance of physician recommendation for home-based cardiac rehabilitation exercise is unknown. A prospective observational design was used to examine predictors of both home-based and Phase II rehabilitation exercise in a sample of 251 patients with coronary heart disease. Regression analyses were done to examine demographic and clinical characteristics, physical functioning, and patient's report of physician recommendation for exercise. Patients with a strong physician referral, who were married and older, were more likely to participate in Phase II exercise. Increased strength of physician recommendation was the unique predictor of home-based exercise. Further research is needed to examine how health professionals can motivate cardiac patients to exercise in home and outpatient settings.
|
37
|
Powers S, DeJongh M, Best AA, Tintle NL. Cautions about the reliability of pairwise gene correlations based on expression data. Front Microbiol 2015; 6:650. [PMID: 26167162 PMCID: PMC4481165 DOI: 10.3389/fmicb.2015.00650] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/30/2015] [Accepted: 06/15/2015] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Rapid growth in the availability of genome-wide transcript abundance levels through gene expression microarrays and RNAseq promises to provide deep biological insights into the complex, genome-wide transcriptional behavior of single-celled organisms. However, this promise has not yet been fully realized. RESULTS We find that computation of pairwise gene associations (correlation; mutual information) across a set of 2782 total genome-wide expression samples from six diverse bacteria produces unexpectedly large variation in estimates of pairwise gene association, regardless of the metric used, the organism under study, or the number and source of the samples. We trace the cause to sampling bias. In particular, in repositories of expression data (e.g., Gene Expression Omnibus, GEO), many individual genes show small differences in absolute gene expression levels across the set of samples. We demonstrate that these small differences are due mainly to "noise" instead of "signal" attributable to environmental or genetic perturbations. We show that downstream analysis using gene expression levels of genes with small differences yields biased estimates of pairwise association. CONCLUSIONS We propose flagging genes with small differences in absolute, RMA-normalized expression levels (e.g., standard deviation less than 0.5) as potentially yielding biased pairwise association metrics. This strategy has the potential to substantially improve the confidence in genome-wide conclusions about transcriptional behavior in bacterial organisms. Further work is needed to refine strategies to identify genes with small differences in expression levels prior to computing gene-gene association metrics.
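The flagging rule proposed in the conclusions can be illustrated with a minimal sketch. It assumes expression values are already RMA-normalized and uses the sample standard deviation; the abstract does not say which estimator the authors used, and the function name, gene names, and toy values are hypothetical.

```python
from statistics import stdev

def flag_low_variance_genes(expression, threshold=0.5):
    """Return the sorted IDs of genes whose standard deviation across
    samples falls below `threshold` (0.5 is the example cutoff quoted
    in the abstract). Using the sample SD is an assumption."""
    return sorted(
        gene
        for gene, values in expression.items()
        if len(values) >= 2 and stdev(values) < threshold
    )

# Toy, made-up expression levels: one list of samples per gene.
expression = {
    "geneA": [7.1, 7.2, 7.0, 7.3],    # nearly flat across samples
    "geneB": [5.0, 8.5, 6.2, 9.9],    # varies widely
    "geneC": [10.0, 10.1, 9.9, 10.0], # nearly flat across samples
}

flagged = flag_low_variance_genes(expression)
print(flagged)
```

Genes returned by such a filter would be candidates for exclusion, or at least for cautious interpretation, before computing pairwise correlations.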
|
38
|
Tintle NL, Pottala JV, Lacey S, Ramachandran V, Westra J, Rogers A, Clark J, Olthoff B, Larson M, Harris W, Shearer GC. A genome-wide association study of saturated, mono- and polyunsaturated red blood cell fatty acids in the Framingham Heart Offspring Study. Prostaglandins Leukot Essent Fatty Acids 2015; 94:65-72. [PMID: 25500335 PMCID: PMC4339483 DOI: 10.1016/j.plefa.2014.11.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 07/12/2014] [Revised: 11/14/2014] [Accepted: 11/17/2014] [Indexed: 01/06/2023]
Abstract
Most genome-wide association studies have explored relationships between genetic variants and plasma phospholipid fatty acid proportions, but few have examined apparent genetic influences on the membrane fatty acid profile of red blood cells (RBC). Using RBC fatty acid data from the Framingham Offspring Study, we analyzed over 2.5 million single nucleotide polymorphisms (SNPs) for association with 14 RBC fatty acids, identifying 191 different SNPs associated with at least 1 fatty acid. Significant associations (p < 1 × 10⁻⁸) were located within five distinct 1-Mb regions. Of particular interest were novel associations between (1) arachidonic acid and PCOLCE2 (regulates apoA-I maturation and modulates apoA-I levels), and (2) oleic and linoleic acid and LPCAT3 (mediates the transfer of fatty acids between glycerolipids). We also replicated previously identified strong associations between SNPs in the FADS (chromosome 11) and ELOVL (chromosome 6) regions. Multiple SNPs explained 8-14% of the variation in 3 high abundance (>11%) fatty acids, but only 1-3% in 4 low abundance (<3%) fatty acids, with the notable exception of dihomo-gamma-linolenic acid with 53% of variance explained by SNPs. Further studies are needed to determine the extent to which variations in these genes influence tissue fatty acid content and pathways modulated by fatty acids.
|
39
|
Blue EM, Sun L, Tintle NL, Wijsman EM. Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond. Genet Epidemiol 2014; 38 Suppl 1:S21-8. [PMID: 25112184 DOI: 10.1002/gepi.21821] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
Abstract
When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.
|
40
|
Rogers A, Beck A, Tintle NL. Evaluating the concordance between sequencing, imputation and microarray genotype calls in the GAW18 data. BMC Proc 2014; 8:S22. [PMID: 25519374 PMCID: PMC4143748 DOI: 10.1186/1753-6561-8-s1-s22] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
Genotype errors are well known to increase type I errors and/or decrease power in related tests of genotype-phenotype association, depending on whether the genotype error mechanism is associated with the phenotype. These relationships hold for both single and multimarker tests of genotype-phenotype association. To assess the potential for genotype errors in Genetic Analysis Workshop 18 (GAW18) data, where no gold standard genotype calls are available, we explored concordance rates between sequencing, imputation, and microarray genotype calls. Our analysis shows that missing data rates for sequenced individuals are high and that there is a modest amount of called genotype discordance between the 2 platforms, with discordance most common for lower minor allele frequency (MAF) single-nucleotide polymorphisms (SNPs). Some evidence for discordance rates that were different between phenotypes was observed, and we identified a number of cases where different technologies identified different bases at the variant site. Type I errors and power loss are possible as a result of missing genotypes and errors in called genotypes in downstream analysis of GAW18 data.
|
41
|
Hainline A, Alvarez C, Luedtke A, Greco B, Beck A, Tintle NL. Evaluation of the power and type I error of recently proposed family-based tests of association for rare variants. BMC Proc 2014; 8:S36. [PMID: 25519321 PMCID: PMC4143711 DOI: 10.1186/1753-6561-8-s1-s36] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/03/2022] Open
Abstract
Until very recently, few methods existed to analyze rare-variant association with binary phenotypes in complex pedigrees. We consider a set of recently proposed methods applied to the simulated and real hypertension phenotype as part of the Genetic Analysis Workshop 18. Minimal power of the methods is observed for genes containing variants with weak effects on the phenotype. Application of the methods to the real hypertension phenotype yielded no genes meeting a strict Bonferroni cutoff of significance. Some prior literature connects 3 of the 5 most associated genes (p < 1 × 10⁻⁴) to hypertension or related phenotypes. Further methodological development is needed to extend these methods to handle covariates, and to explore more powerful test alternatives.
|
42
|
Bickeböller H, Bailey JN, Beyene J, Cantor RM, Cordell HJ, Culverhouse RC, Engelman CD, Fardo DW, Ghosh S, König IR, Lorenzo Bermejo J, Melton PE, Santorico SA, Satten GA, Sun L, Tintle NL, Ziegler A, MacCluer JW, Almasy L. Genetic Analysis Workshop 18: Methods and strategies for analyzing human sequence and phenotype data in members of extended pedigrees. BMC Proc 2014; 8:S1. [PMID: 25519310 PMCID: PMC4143625 DOI: 10.1186/1753-6561-8-s1-s1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022] Open
Abstract
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
|
43
|
Greco B, Luedtke A, Hainline A, Alvarez C, Beck A, Tintle NL. Application of family-based tests of association for rare variants to pathways. BMC Proc 2014; 8:S105. [PMID: 25519359 PMCID: PMC4143675 DOI: 10.1186/1753-6561-8-s1-s105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022] Open
Abstract
Pathway analysis approaches for sequence data typically either operate in a single stage (all variants within all genes in the pathway are combined into a single, very large set of variants that can then be analyzed using standard "gene-based" test statistics) or in 2 stages (gene-based p values are computed for all genes in the pathway, and then the gene-based p values are combined into a single pathway p value). To date, little consideration has been given to the performance of gene-based tests (typically designed for a smaller number of single-nucleotide variants [SNVs]) when the number of SNVs in the gene or in the pathway is very large and the genotypes come from sequence data organized in large pedigrees. We consider recently proposed gene-based tests for rare variants from complex pedigrees that test for association between a large set of SNVs and a qualitative phenotype of interest (1-stage analyses) as well as 2-stage approaches. We find that many of these methods show inflated type I errors when the number of SNVs in the gene or the pathway is large (>200 SNVs) and when using standard approaches to estimate the genotype covariance matrix. Alternative methods are needed when testing very large sets of SNVs in 1-stage approaches.
|
44
|
Dunn SL, Olamijulo GB, Fuglseth HL, Holden TP, Swieringa LL, Sit MJ, Rieth NP, Tintle NL. The State–Trait Hopelessness Scale. West J Nurs Res 2013; 36:552-70. [DOI: 10.1177/0193945913507634] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 11/15/2022]
Abstract
Hopelessness is predictive in the development of coronary heart disease (CHD) and can persist in patients after a CHD event, adversely affecting recovery. Hopelessness may represent a temporary response (state) or a chronic outlook (trait). Common hopelessness measures fail to differentiate state from trait hopelessness, a potentially important differentiation for treatment. The State–Trait Hopelessness Scale (STHS) was developed and pilot tested with two groups of college students (n = 39 and 190) and patients with CHD (n = 44). The instrument was then used with 520 patients, confirming reliability (Cronbach’s α) for the State (.88) and Trait (.91) subscales and concurrent and predictive validity. Separate exploratory factor analyses showed two factors (hopelessness present or hopelessness absent) for the State and Trait subscales, accounting for 58.9% and 57.3% of variance, respectively. These findings support future use of the tool in clinical settings and in intervention studies focused on hopelessness.
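The reliability figures quoted above are Cronbach's α values. As a minimal sketch of how that coefficient is conventionally computed, assuming a complete matrix of item scores (one row per respondent, one column per item), the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total score) can be written as follows. The function name `cronbach_alpha` and the toy data in the usage note are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows (hypothetical helper).

    scores: list of equal-length lists, one row per respondent,
            one column per scale item. Assumes no missing values.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("all rows must have the same number of items")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)
```

For example, `cronbach_alpha([[1, 1], [2, 2], [3, 3]])` gives 1.0, because the two illustrative items agree perfectly across respondents.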
|
45
|
Petersen A, Alvarez C, DeClaire S, Tintle NL. Assessing methods for assigning SNPs to genes in gene-based tests of association using common variants. PLoS One 2013; 8:e62161. [PMID: 23741293 PMCID: PMC3669368 DOI: 10.1371/journal.pone.0062161] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 12/24/2012] [Accepted: 03/18/2013] [Indexed: 11/18/2022] Open
Abstract
Gene-based tests of association are frequently applied to common SNPs (MAF>5%) as an alternative to single-marker tests. In this analysis we conduct a variety of simulation studies applied to five popular gene-based tests investigating general trends related to their performance in realistic situations. In particular, we focus on the impact of non-causal SNPs and a variety of LD structures on the behavior of these tests. Ultimately, we find that non-causal SNPs can significantly impact the power of all gene-based tests. On average, we find that the "noise" from 6-12 non-causal SNPs will cancel out the "signal" of one causal SNP across five popular gene-based tests. Furthermore, we find complex and differing behavior of the methods in the presence of LD within and between non-causal and causal SNPs. Ultimately, better approaches for a priori prioritization of potentially causal SNPs (e.g., predicting functionality of non-synonymous SNPs), application of these methods to sequenced or fully imputed datasets, and limited use of window-based methods for assigning inter-genic SNPs to genes will improve power. However, significant power loss from non-causal SNPs may remain unless alternative statistical approaches robust to the inclusion of non-causal SNPs are developed.
|
46
|
Liu K, Fast S, Zawistowski M, Tintle NL. A geometric framework for evaluating rare variant tests of association. Genet Epidemiol 2013; 37:345-57. [PMID: 23526307 DOI: 10.1002/gepi.21722] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 09/24/2012] [Revised: 02/12/2013] [Accepted: 02/13/2013] [Indexed: 11/08/2022]
Abstract
The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.
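To make the geometric picture concrete, the sketch below treats per-site rare allele counts in cases and controls as two vectors and computes the quantities the abstract refers to: the length of each vector and the angle between them. It illustrates only the geometry and is not the paper's test statistics. The function name `lengths_and_angle` and the counts in the usage note are hypothetical.

```python
import math


def lengths_and_angle(case_counts, control_counts):
    """Return (case length, control length, angle in radians).

    Each argument is a list of rare allele counts, one entry per variant
    site, with sites in the same order in both lists. The angle is None
    when either vector has zero length, since it is undefined there.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("vectors must cover the same variant sites")

    case_len = math.sqrt(sum(c * c for c in case_counts))
    control_len = math.sqrt(sum(c * c for c in control_counts))
    if case_len == 0 or control_len == 0:
        return case_len, control_len, None

    dot = sum(a * b for a, b in zip(case_counts, control_counts))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (case_len * control_len)))
    return case_len, control_len, math.acos(cosine)
```

With the illustrative counts `[3, 4, 0]` for cases and `[0, 0, 5]` for controls, both vectors have length 5 but are orthogonal. In this picture, a test that compares only lengths would see no difference between the groups, while a test that also uses the angle would.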
|
47
|
Tintle NL, Sitarik A, Boerema B, Young K, Best AA, Dejongh M. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data. BMC Bioinformatics 2012; 13:193. [PMID: 22873695 PMCID: PMC3462729 DOI: 10.1186/1471-2105-13-193] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/23/2011] [Accepted: 07/19/2012] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. RESULTS We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. CONCLUSIONS Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.
|
48
|
Luedtke A, Powers S, Petersen A, Sitarik A, Bekmetjev A, Tintle NL. Evaluating methods for the analysis of rare variants in sequence data. BMC Proc 2011; 5 Suppl 9:S119. [PMID: 22373354 PMCID: PMC3287843 DOI: 10.1186/1753-6561-5-s9-s119] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 12/02/2022] Open
Abstract
A number of rare variant statistical methods have been proposed for analysis of the impending wave of next-generation sequencing data. To date, there are few direct comparisons of these methods on real sequence data. Furthermore, there is a strong need for practical advice on the proper analytic strategies for rare variant analysis. We compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data as part of Genetic Analysis Workshop 17. Overall, we find that all analyzed methods have serious practical limitations on identifying causal genes. Specifically, no method has more than a 5% true discovery rate (percentage of truly causal genes among all those identified as significantly associated with the phenotype). Further exploration shows that all methods suffer from inflated false-positive error rates (chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. Furthermore, observed true-positive rates (chance that a truly causal gene will be identified as significantly associated with the phenotype) for each of the four methods were very low (<19%). The combination of larger than anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
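The three rates defined in the abstract can be computed directly from sets of gene identifiers. The following is a minimal sketch under the abstract's definitions. The function name `discovery_rates` and the gene labels in the usage note are hypothetical and do not reproduce the workshop data.

```python
def discovery_rates(identified, causal, all_genes):
    """Compute true discovery, true-positive, and false-positive rates.

    identified: genes flagged as significantly associated
    causal:     genes that are truly causal (known in simulated data)
    all_genes:  every gene that was tested
    A rate is None when its denominator is zero.
    """
    identified, causal, all_genes = set(identified), set(causal), set(all_genes)
    noncausal = all_genes - causal

    true_pos = len(identified & causal)      # causal genes that were flagged
    false_pos = len(identified & noncausal)  # noncausal genes that were flagged

    def ratio(num, den):
        return num / den if den else None

    return {
        # Share of flagged genes that are truly causal.
        "true_discovery_rate": ratio(true_pos, len(identified)),
        # Chance that a causal gene is flagged.
        "true_positive_rate": ratio(true_pos, len(causal)),
        # Chance that a noncausal gene is flagged.
        "false_positive_rate": ratio(false_pos, len(noncausal)),
    }
```

As an illustration, with 100 tested genes of which 1 is causal, flagging 10 genes including the causal one gives a true-positive rate of 1.0 but a true discovery rate of only 0.1. This shows how a small causal fraction can depress the true discovery rate even when sensitivity is high.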
|
49
|
Petersen A, Sitarik A, Luedtke A, Powers S, Bekmetjev A, Tintle NL. Evaluating methods for combining rare variant data in pathway-based tests of genetic association. BMC Proc 2011; 5 Suppl 9:S48. [PMID: 22373429 PMCID: PMC3287885 DOI: 10.1186/1753-6561-5-s9-s48] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022] Open
Abstract
Analyzing sets of genes in genome-wide association studies is a relatively new approach that aims to capitalize on biological knowledge about the interactions of genes in biological pathways. This approach, called pathway analysis or gene set analysis, has not yet been applied to the analysis of rare variants. Applying pathway analysis to rare variants offers two competing approaches. In the first approach rare variant statistics are used to generate p-values for each gene (e.g., combined multivariate collapsing [CMC] or weighted-sum [WS]) and the gene-level p-values are combined using standard pathway analysis methods (e.g., gene set enrichment analysis or Fisher’s combined probability method). In the second approach, rare variant methods (e.g., CMC and WS) are applied directly to sets of single-nucleotide polymorphisms (SNPs) representing all SNPs within genes in a pathway. In this paper we use simulated phenotype and real next-generation sequencing data from Genetic Analysis Workshop 17 to analyze sets of rare variants using these two competing approaches. The initial results suggest substantial differences in the methods, with Fisher’s combined probability method and the direct application of the WS method yielding the best power. Evidence suggests that the WS method works well in most situations, although Fisher’s method was more likely to be optimal when the number of causal SNPs in the set was low but the risk of the causal SNPs was high.
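The first approach combines gene-level p-values into one pathway-level p-value. Below is a minimal sketch of Fisher's combined probability method, assuming the gene-level p-values are independent. That assumption may not hold for genes in linkage disequilibrium. The statistic is −2 Σ ln pᵢ, which follows a chi-square distribution with 2k degrees of freedom under the null for k genes. The function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the upper-tail probability of the chi-square distribution
    with 2k degrees of freedom evaluated at X = -2 * sum(ln p_i).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2

    # For even degrees of freedom 2k the chi-square survival function has
    # a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

For example, two illustrative gene-level p-values of 0.05 each combine to a pathway p-value of about 0.0175, and a single p-value is returned unchanged.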
|
50
|
Tintle NL, Borchers B, Brown M, Bekmetjev A. Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16. BMC Proc 2009; 3 Suppl 7:S96. [PMID: 20018093 PMCID: PMC2796000 DOI: 10.1186/1753-6561-3-s7-s96] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022] Open
Abstract
Recently, gene set analysis (GSA) has been extended from use on gene expression data to use on single-nucleotide polymorphism (SNP) data in genome-wide association studies. When GSA has been demonstrated on SNP data, two popular statistics from gene expression data analysis (gene set enrichment analysis [GSEA] and Fisher's exact test [FET]) have been used. However, GSEA and FET have shown a lack of power and robustness in the analysis of gene expression data. The purpose of this work is to investigate whether the same issues are also true for the analysis of SNP data. Ultimately, we conclude that GSEA and FET are not optimal for the analysis of SNP data when compared with the SUMSTAT method. In analysis of real SNP data from the Framingham Heart Study, we find that SUMSTAT finds many more gene sets to be significant when compared with other methods. In an analysis of simulated data, SUMSTAT demonstrates high power and better control of the type I error rate. GSA is a promising approach to the analysis of SNP data in GWAS and use of the SUMSTAT statistic instead of GSEA or FET may increase power and robustness.
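For readers unfamiliar with how Fisher's exact test is used in gene set analysis, the sketch below computes a one-sided enrichment p-value: the hypergeometric probability of seeing at least the observed number of significant genes in a gene set. This is a generic illustration of FET under the assumption that each gene has already been classified as significant or not. It is not the exact pipeline used in the paper, and the function name `fet_enrichment_p` is hypothetical.

```python
from math import comb


def fet_enrichment_p(n_genes, n_significant, set_size, set_significant):
    """One-sided Fisher's exact test p-value for gene set enrichment.

    n_genes:         total number of genes tested
    n_significant:   genes called significant overall
    set_size:        genes in the gene set
    set_significant: significant genes inside the gene set
    Returns P(overlap >= set_significant) under the hypergeometric null.
    """
    if not (0 <= n_significant <= n_genes and 0 <= set_size <= n_genes):
        raise ValueError("counts must lie between 0 and n_genes")
    max_overlap = min(n_significant, set_size)
    if not 0 <= set_significant <= max_overlap:
        raise ValueError("set_significant is outside the possible range")

    # Count the ways to draw the gene set with x significant genes,
    # summed over every x at least as large as the observed overlap.
    favourable = sum(
        comb(n_significant, x) * comb(n_genes - n_significant, set_size - x)
        for x in range(set_significant, max_overlap + 1)
    )
    return favourable / comb(n_genes, set_size)
```

With 10 genes, 5 of them significant, and a 2-gene set in which both genes are significant, the p-value is 10/45, or about 0.222.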
|