1
Vieland VJ, Seok SC. The PPLD has advantages over conventional regression methods in application to moderately sized genome-wide association studies. PLoS One 2021; 16:e0257164. [PMID: 34550985 PMCID: PMC8457474 DOI: 10.1371/journal.pone.0257164] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/04/2021] [Accepted: 08/24/2021] [Indexed: 11/18/2022] Open
Abstract
In earlier work, we have developed and evaluated an alternative approach to the analysis of GWAS data, based on a statistic called the PPLD. More recently, motivated by a GWAS for genetic modifiers of the X-linked Mendelian disorder Duchenne Muscular Dystrophy (DMD), we adapted the PPLD for application to time-to-event (TE) phenotypes. Because DMD itself is relatively rare, this is a setting in which the very large sample sizes generally assembled for GWAS are simply not attainable. For this reason, statistical methods specially adapted for use in small data sets are required. Here we explore the behavior of the TE-PPLD via simulations, comparing the TE-PPLD with Cox Proportional Hazards analysis in the context of small to moderate sample sizes. Our results will help to inform our approach to the DMD study going forward, and they illustrate several respects in which the TE-PPLD, and by extension the original PPLD, offer advantages over regression-based approaches to GWAS in this context.
Affiliation(s)
- Veronica J. Vieland
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States of America
- Department of Pediatrics, The Ohio State University, Columbus, OH, United States of America
- Department of Statistics, The Ohio State University, Columbus, OH, United States of America
- Sang-Cheol Seok
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States of America
2
Vieland VJ, Seok SC, Stewart WCL. A new linear regression-like residual for survival analysis, with application to genome wide association studies of time-to-event data. PLoS One 2020; 15:e0232300. [PMID: 32365095 PMCID: PMC7197860 DOI: 10.1371/journal.pone.0232300] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/14/2019] [Accepted: 04/11/2020] [Indexed: 01/08/2023] Open
Abstract
In linear regression, a residual measures how far a subject's observation is from expectation; in survival analysis, a subject's Martingale or deviance residual is sometimes interpreted similarly. Here we consider ways in which a linear regression-like interpretation is not appropriate for Martingale and deviance residuals, and we develop a novel time-to-event residual which does have a linear regression-like interpretation. We illustrate the utility of this new residual via simulation of a time-to-event genome-wide association study, motivated by a real study seeking genetic modifiers of Duchenne Muscular Dystrophy. By virtue of its linear regression-like characteristics, our new residual may prove useful in other contexts as well.
Affiliation(s)
- Veronica J. Vieland
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States of America
- Department of Pediatrics, The Ohio State University, Columbus, OH, United States of America
- Department of Statistics, The Ohio State University, Columbus, OH, United States of America
- Sang-Cheol Seok
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States of America
- William C. L. Stewart
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States of America
- Department of Pediatrics, The Ohio State University, Columbus, OH, United States of America
- Department of Statistics, The Ohio State University, Columbus, OH, United States of America
3
Bartlett CW, Hou L, Flax JF, Hare A, Cheong SY, Fermano Z, Zimmerman-Bier B, Cartwright C, Azaro MA, Buyske S, Brzustowicz LM. A genome scan for loci shared by autism spectrum disorder and language impairment. Am J Psychiatry 2014; 171:72-81. [PMID: 24170272 PMCID: PMC4431698 DOI: 10.1176/appi.ajp.2013.12081103] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]
Abstract
OBJECTIVE The authors conducted a genetic linkage study of families that have both autism spectrum disorder (ASD) and language-impaired probands to find common communication impairment loci. The hypothesis was that these families have a high genetic loading for impairments in language ability, thus influencing the language and communication deficits of the family members with ASD. Comprehensive behavioral phenotyping of the families also enabled linkage analysis of quantitative measures, including normal, subclinical, and disordered variation in all family members for the three general autism symptom domains: social, communication, and compulsive behaviors. METHOD The primary linkage analysis coded persons with either ASD or specific language impairment as "affected." The secondary linkage analysis consisted of quantitative metrics of autism-associated behaviors capturing normal to clinically severe variation, measured in all family members. RESULTS Linkage to language phenotypes was established at two novel chromosomal loci, 15q23-26 and 16p12. The secondary analysis of normal and disordered quantitative variation in social and compulsive behaviors established linkage to two loci for social behaviors (at 14q and 15q) and one locus for repetitive behaviors (at 13q). CONCLUSION These data indicate shared etiology of ASD and specific language impairment at two novel loci. Additionally, nonlanguage phenotypes based on social aloofness and rigid personality traits showed compelling evidence for linkage in this study group. Further genetic mapping is warranted at these loci.
Affiliation(s)
- Christopher W. Bartlett
- The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital and Department of Pediatrics, The Ohio State University, Columbus, OH
- Liping Hou
- The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital and Department of Pediatrics, The Ohio State University, Columbus, OH
- Judy F. Flax
- Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Abby Hare
- Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Soo Yeon Cheong
- The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital and Department of Pediatrics, The Ohio State University, Columbus, OH
- Zena Fermano
- Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Barbie Zimmerman-Bier
- Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Department of Pediatrics, Saint Peter's University Hospital, New Brunswick, NJ
- Charles Cartwright
- Department of Psychiatry, University of Medicine and Dentistry of New Jersey – New Jersey Medical School, Newark, NJ
- Marco A. Azaro
- Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Steven Buyske
- Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Department of Statistics and Biostatistics, Rutgers University, Piscataway, NJ
- Linda M. Brzustowicz
- Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
4
Londono D, Buyske S, Finch SJ, Sharma S, Wise CA, Gordon D. TDT-HET: a new transmission disequilibrium test that incorporates locus heterogeneity into the analysis of family-based association data. BMC Bioinformatics 2012; 13:13. [PMID: 22264315 PMCID: PMC3292499 DOI: 10.1186/1471-2105-13-13] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 06/17/2011] [Accepted: 01/20/2012] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Locus heterogeneity is one of the most documented phenomena in genetics. To date, relatively little work has been done on the development of methods to address locus heterogeneity in genetic association analysis. Motivated by Zhou and Pan's work, we present a mixture model of linked and unlinked trios and develop a statistical method to estimate the probability that a heterozygous parent transmits the disease allele at a di-allelic locus, and the probability that any trio is in the linked group. The purpose here is the development of a test that extends the classic transmission disequilibrium test (TDT) to one that accounts for locus heterogeneity. RESULTS Our simulations suggest that, for a sufficiently large sample size (1000 trios), our method has good power to detect association even when the proportion of unlinked trios is high (75%). While the median difference (TDT-HET empirical power - TDT empirical power) is approximately 0 for all modes of inheritance (MOI), there are parameter settings for which the power difference can be substantial. Our multi-locus simulations suggest that our method has good power to detect association as long as the markers are reasonably well-correlated and the genotype relative risks are larger. Results of both single-locus and multi-locus simulations suggest our method maintains the correct type I error rate. Finally, the TDT-HET statistic shows highly significant p-values for most of the idiopathic scoliosis candidate loci, and for some loci, the estimated proportion of unlinked trios approaches or exceeds 50%, suggesting the presence of locus heterogeneity. CONCLUSIONS We have developed an extension of the TDT statistic (TDT-HET) that allows for locus heterogeneity among coded trios. Benefits of our method include estimates of parameters in the presence of heterogeneity and reasonable power even when the proportion of linked trios is small. Also, we have extended multi-locus methods to TDT-HET and have demonstrated that the empirical power to detect linkage may be high. Last, given that we obtain PPBs, we conjecture that the TDT-HET may be a useful method for correctly identifying linked trios. We anticipate that researchers will find this property increasingly useful as they apply next-generation sequencing data in family-based studies.
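For orientation, the classic TDT that this abstract takes as its starting point can be written as a McNemar-type statistic on transmissions from heterozygous parents. The following is a minimal Python sketch of that classic test only; it does not implement the TDT-HET mixture model, and the function name and the counts `b` (transmissions of the allele of interest) and `c` (non-transmissions) are hypothetical names chosen for illustration.

```python
import math
from typing import Tuple


def classic_tdt(b: int, c: int) -> Tuple[float, float]:
    """Classic TDT on counts from heterozygous parents (illustrative only).

    b: times the allele of interest was transmitted (hypothetical input)
    c: times it was not transmitted (hypothetical input)
    Returns (chi-square statistic with 1 df, asymptotic p-value).
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one transmission")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    print(classic_tdt(60, 40))
```

Under the null hypothesis the two counts are expected to be equal, so the statistic grows with the imbalance between them. The abstract's point is that this single-proportion view can lose power when only a fraction of trios are linked.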
Affiliation(s)
- Douglas Londono
- Department of Genetics and Human Genetics Institute, Rutgers, The State University of New Jersey, 145 Bevier Road, Piscataway, NJ, 08854 USA
- Steven Buyske
- Department of Genetics and Human Genetics Institute, Rutgers, The State University of New Jersey, 145 Bevier Road, Piscataway, NJ, 08854 USA
- Department of Statistics & Biostatistics, Hill Center, Rutgers, The State University of New Jersey, 110 Frelinghuysen Road Piscataway, NJ 08854-8019 USA
- Stephen J Finch
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, 11794-3600 USA
- Swarkar Sharma
- Texas Scottish Rite Hospital for Children, 2222 Welborn Street, Dallas, TX 75219 USA
- Carol A Wise
- Texas Scottish Rite Hospital for Children, 2222 Welborn Street, Dallas, TX 75219 USA
- Department of Orthopedic Surgery and McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390 USA
- Derek Gordon
- Department of Genetics and Human Genetics Institute, Rutgers, The State University of New Jersey, 145 Bevier Road, Piscataway, NJ, 08854 USA
5
Vieland VJ, Huang Y, Seok SC, Burian J, Catalyurek U, O'Connell J, Segre A, Valentine-Cooper W. KELVIN: a software package for rigorous measurement of statistical evidence in human genetics. Hum Hered 2011; 72:276-88. [PMID: 22189470 PMCID: PMC3267994 DOI: 10.1159/000330634] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/26/2022] Open
Abstract
This paper describes the software package KELVIN, which supports the PPL (posterior probability of linkage) framework for the measurement of statistical evidence in human (or more generally, diploid) genetic studies. In terms of scope, KELVIN supports two-point (trait-marker or marker-marker) and multipoint linkage analysis, based on either sex-averaged or sex-specific genetic maps, with an option to allow for imprinting; trait-marker linkage disequilibrium (LD), or association analysis, in case-control data, trio data, and/or multiplex family data, with options for joint linkage and trait-marker LD or conditional LD given linkage; dichotomous trait, quantitative trait and quantitative trait threshold models; and certain types of gene-gene interactions and covariate effects. Features and data (pedigree) structures can be freely mixed and matched within analyses. The statistical framework is specifically tailored to accumulate evidence in a mathematically rigorous way across multiple data sets or data subsets while allowing for multiple sources of heterogeneity, and KELVIN itself utilizes sophisticated software engineering to provide a powerful and robust platform for studying the genetics of complex disorders.
Affiliation(s)
- Veronica J Vieland
- Battelle Center for Mathematical Medicine, Research Institute at Nationwide Children's Hospital, Ohio State University, 700 Children’s Drive, Columbus, OH 43205, USA.
6
Abstract
In this paper, we extend the PPL framework to the analysis of case-control (CC) data and introduce three new linkage disequilibrium (LD) statistics. These statistics measure the evidence for or against LD, rather than testing the null hypothesis of no LD, and they therefore avoid the need for multiple testing corrections. They are suitable not only for CC designs but also can be used in application to family data, ranging from trios to complex pedigrees, all under the same statistical framework, allowing for the seamless analysis of disparate data structures. They also provide other core advantages of the PPL framework, including the use of sequential updating to accumulate LD evidence across potentially heterogeneous sets or subsets of data; parameterization in terms of a very general trait likelihood, which simultaneously considers dominant, recessive, and additive models; and a straightforward mechanism for modeling two-locus epistasis. Finally, by implementing the new statistics within the PPL framework, we have a ready mechanism for incorporating linkage information, obtained from distinct data, into LD analyses in the form of a prior distribution. Here we examine the performance of the proposed LD statistics using simulated data, as well as assessing the effects of key modeling violations on this performance.
Affiliation(s)
- Yungui Huang
- The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA.
7
Novel method for combined linkage and genome-wide association analysis finds evidence of distinct genetic architecture for two subtypes of autism. J Neurodev Disord 2011; 3:113-23. [PMID: 21484201 PMCID: PMC3105232 DOI: 10.1007/s11689-011-9072-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 10/20/2010] [Accepted: 01/04/2011] [Indexed: 11/26/2022] Open
Abstract
The Autism Genome Project has assembled two large datasets originally designed for linkage analysis and genome-wide association analysis, respectively: 1,069 multiplex families genotyped on the Affymetrix 10 K platform, and 1,129 autism trios genotyped on the Illumina 1 M platform. We set out to exploit this unique pair of resources by analyzing the combined data with a novel statistical method, based on the PPL statistical framework, simultaneously searching for linkage and association to loci involved in autism spectrum disorders (ASD). Our analysis also allowed for potential differences in genetic architecture for ASD in the presence or absence of lower IQ, an important clinical indicator of ASD subtypes. We found strong evidence of multiple linked loci; however, association evidence implicating specific genes was low even under the linkage peaks. Distinct loci were found in the lower IQ families, and these families showed stronger and more numerous linkage peaks, while the normal IQ group yielded the strongest association evidence. It appears that presence/absence of lower IQ (LIQ) demarcates more genetically homogeneous subgroups of ASD patients, with not just different sets of loci acting in the two groups, but possibly distinct genetic architecture between them, such that the LIQ group involves more major gene effects (amenable to linkage mapping), while the normal IQ group potentially involves more common alleles with lower penetrances. The possibility of distinct genetic architecture across subtypes of ASD has implications for further research and perhaps for research approaches to other complex disorders as well.
8
Simmons TR, Flax JF, Azaro MA, Hayter JE, Justice LM, Petrill SA, Bassett AS, Tallal P, Brzustowicz LM, Bartlett CW. Increasing genotype-phenotype model determinism: application to bivariate reading/language traits and epistatic interactions in language-impaired families. Hum Hered 2010; 70:232-44. [PMID: 20948219 PMCID: PMC3085518 DOI: 10.1159/000320367] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/14/2010] [Accepted: 08/13/2010] [Indexed: 11/19/2022] Open
Abstract
While advances in network and pathway analysis have flourished in the era of genome-wide association analysis, understanding the genetic mechanism of individual loci on phenotypes is still readily accomplished using genetic modeling approaches. Here, we demonstrate two novel genotype-phenotype models implemented in a flexible genetic modeling platform. The examples come from analysis of families with specific language impairment (SLI), a failure to develop normal language without explanatory factors such as low IQ or inadequate environment. In previous genome-wide studies, we observed strong evidence for linkage to 13q21 with a reading phenotype in language-impaired families. First, we elucidate the genetic architecture of reading impairment and quantitative language variation in our samples using a bivariate analysis of reading impairment in affected individuals jointly with language quantitative phenotypes in unaffected individuals. This analysis largely recapitulates the baseline analysis using the categorical trait data (posterior probability of linkage (PPL) = 80%), indicating that our reading impairment phenotype captured poor readers who also have low language ability. Second, we performed epistasis analysis using a functional coding variant in the brain-derived neurotrophic factor (BDNF) gene previously associated with reduced performance on working memory tasks. Modeling epistasis doubled the evidence on 13q21 and raised the PPL to 99.9%, indicating that BDNF and 13q21 susceptibility alleles are jointly part of the genetic architecture of SLI. These analyses provide possible mechanistic insights for further cognitive neuroscience studies based on the models developed herein.
Affiliation(s)
- Tabatha R Simmons
- Battelle Center for Mathematical Medicine, Research Institute at Nationwide Children's Hospital and Department of Pediatrics, Ohio State University, Columbus, OH 43205, USA
9
Combined linkage and linkage disequilibrium analysis of a motor speech phenotype within families ascertained for autism risk loci. J Neurodev Disord 2010; 2:210-223. [PMID: 21125004 PMCID: PMC2974936 DOI: 10.1007/s11689-010-9063-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/19/2010] [Accepted: 09/10/2010] [Indexed: 01/08/2023] Open
Abstract
Using behavioral and genetic information from the Autism Genetics Resource Exchange (AGRE) data set we developed phenotypes and investigated linkage and association for individuals with and without Autism Spectrum Disorders (ASD) who exhibit expressive language behaviors consistent with a motor speech disorder. Speech and language variables from Autism Diagnostic Interview-Revised (ADI-R) were used to develop a motor speech phenotype associated with non-verbal or unintelligible verbal behaviors (NVMSD:ALL) and a related phenotype restricted to individuals without significant comprehension difficulties (NVMSD:C). Using Affymetrix 5.0 data, the PPL framework was employed to assess the strength of evidence for or against trait-marker linkage and linkage disequilibrium (LD) across the genome. Ingenuity Pathway Analysis (IPA) was then utilized to identify potential genes for further investigation. We identified several linkage peaks based on two related language-speech phenotypes consistent with a potential motor speech disorder: chromosomes 1q24.2, 3q25.31, 4q22.3, 5p12, 5q33.1, 17p12, 17q11.2, and 17q22 for NVMSD:ALL and 4p15.2 and 21q22.2 for NVMSD:C. While no compelling evidence of association was obtained under those peaks, we identified several potential genes of interest using IPA. CONCLUSION: Several linkage peaks were identified based on two motor speech phenotypes. In the absence of evidence of association under these peaks, we suggest genes for further investigation based on their biological functions. Given that autism spectrum disorders are complex with a wide range of behaviors and a large number of underlying genes, these speech phenotypes may belong to a group of several that should be considered when developing narrow, well-defined, phenotypes in the attempt to reduce genetic heterogeneity. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11689-010-9063-2) contains supplementary material, which is available to authorized users.
10
A pure likelihood approach to the analysis of genetic association data: an alternative to Bayesian and frequentist analysis. Eur J Hum Genet 2010; 18:933-41. [PMID: 20424645 DOI: 10.1038/ejhg.2010.47] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 02/06/2023] Open
Abstract
Investigators performing genetic association studies grapple with how to measure strength of association evidence, choose sample size, and adjust for multiple testing. We apply the evidential paradigm (EP) to genetic association studies, highlighting its strengths. The EP uses likelihood ratios (LRs), as opposed to P-values or Bayes' factors, to measure strength of association evidence. We derive EP methodology to estimate sample size, adjust for multiple testing, and provide informative graphics for drawing inferences, as illustrated with a Rolandic Epilepsy (RE) fine-mapping study. We focus on controlling the probability of observing weak evidence for or against association (W) rather than type I errors (M). For example, for LR ≥ 32 representing strong evidence, at one locus with n=200 cases, n=200 controls, W=0.134, whereas M=0.005. For n=300 cases and controls, W=0.039 and M=0.004. These calculations are based on detecting an OR=1.5. Despite the common misconception, one is not tied to this planning value for analysis; rather one calculates the likelihood at all possible values to assess evidence for association. We provide methodology to adjust for multiple tests across m loci, which adjusts M and W for m. We do so for (a) single-stage designs, (b) two-stage designs, and (c) simultaneously controlling family-wise error rate (FWER) and W. Method (c) chooses larger sample sizes than (a) or (b), whereas (b) has smaller bounds on the FWER than (a). The EP, using our innovative graphical display, identifies important SNPs in elongator protein complex 4 (ELP4) associated with RE that may not have been identified using standard approaches.
11
Poduri A, Wang Y, Gordon D, Barral-Rodriguez S, Barker-Cummings C, Ulgen A, Chitsazzadeh V, Hill RS, Risch N, Hauser WA, Pedley TA, Walsh CA, Ottman R. Novel susceptibility locus at chromosome 6q16.3-22.31 in a family with GEFS+. Neurology 2009; 73:1264-72. [PMID: 19841378 DOI: 10.1212/wnl.0b013e3181bd10d3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Genetic epilepsy with febrile seizures plus (GEFS+) is a familial epilepsy syndrome with extremely variable expressivity. Mutations in 5 genes that raise susceptibility to GEFS+ have been discovered, but they account for only a small proportion of families. METHODS We identified a 4-generation family containing 15 affected individuals with a range of phenotypes in the GEFS+ spectrum, including febrile seizures, febrile seizures plus, epilepsy, and severe epilepsy with developmental delay. We performed a genome-wide linkage analysis using microsatellite markers and then saturated the potential linkage region identified by this screen with more markers. We evaluated the evidence for linkage using both model-based and model-free (posterior probability of linkage [PPL]) analyses. We sequenced 16 candidate genes and screened for copy number abnormalities in the minimal genetic region. RESULTS All 15 affected subjects and 1 obligate carrier shared a haplotype of markers at chromosome 6q16.3-22.31, an 18.1-megabase region flanked by markers D6S962 and D6S287. The maximum multipoint lod score in this region was 4.68. PPL analysis indicated an 89% probability of linkage. Sequencing of 16 candidate genes did not reveal a causative mutation. No deletions or duplications were identified. CONCLUSIONS We report a novel susceptibility locus for genetic epilepsy with febrile seizures plus at 6q16.3-22.31, in which there are no known genes associated with ion channels or neurotransmitter receptors. The identification of the responsible gene in this region is likely to lead to the discovery of novel mechanisms of febrile seizures and epilepsy.
Affiliation(s)
- A Poduri
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Children's Hospital Boston, MA, USA
12
Seok SC, Evans M, Vieland VJ. Fast and accurate calculation of a computationally intensive statistic for mapping disease genes. J Comput Biol 2009; 16:659-76. [PMID: 19432537 DOI: 10.1089/cmb.2008.0175] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 01/16/2023] Open
Abstract
Many statistical methods in biology utilize numerical integration in order to deal with moderately high-dimensional parameter spaces without closed form integrals. One such method is the PPL, a class of models for mapping and modeling genes for complex human disorders. While the most common approach to numerical integration in statistics is MCMC, this is not a good option for the PPL for a variety of reasons, leading us to develop an alternative integration method for this application. We utilize an established sub-region adaptive integration method, but adapt it to specific features of our application. These include division of the multi-dimensional integrals into three separate layers, implementing internal constraints on the parameter space, and calibrating the approximation to ensure adequate precision of results for our application. The proposed approach is compared with an empirically driven fixed grid scheme as well as other numerical integration methods. The new method is shown to require far fewer function evaluations compared to the alternatives while matching or exceeding the best of them in terms of accuracy. The savings in evaluations is sufficiently large that previously intractable problems are now feasible in real time.
Affiliation(s)
- Sang-Cheol Seok
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA.
13
Wratten NS, Memoli H, Huang Y, Dulencin AM, Matteson PG, Cornacchia MA, Azaro MA, Messenger J, Hayter JE, Bassett AS, Buyske S, Millonig JH, Vieland VJ, Brzustowicz LM. Identification of a schizophrenia-associated functional noncoding variant in NOS1AP. Am J Psychiatry 2009; 166:434-41. [PMID: 19255043 PMCID: PMC3295829 DOI: 10.1176/appi.ajp.2008.08081266] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Indexed: 11/30/2022]
Abstract
OBJECTIVE The authors previously demonstrated significant association between markers within NOS1AP and schizophrenia in a set of Canadian families of European descent, as well as significantly increased expression in schizophrenia of NOS1AP in unrelated postmortem samples from the dorsolateral prefrontal cortex. In this study the authors sought to apply novel statistical methods and conduct additional biological experiments to isolate at least one risk allele within NOS1AP. METHOD Using the posterior probability of linkage disequilibrium (PPLD) to measure the probability that a single nucleotide polymorphism (SNP) is in linkage disequilibrium with schizophrenia, the authors evaluated 60 SNPs from NOS1AP in 24 Canadian families demonstrating linkage and association to this region. SNPs exhibiting strong evidence of linkage disequilibrium were tested for regulatory function by luciferase reporter assay. Two human neural cell lines (SK-N-MC and PFSK-1) were transfected with a vector containing each allelic variant of the SNP, the NOS1AP promoter, and a luciferase gene. Alleles altering expression were further assessed for binding of nuclear proteins by electrophoretic mobility shift assay. RESULTS Three SNPs produced PPLDs >40%. One of them, rs12742393, demonstrated significant allelic expression differences in both cell lines tested. The allelic variation at this SNP altered the affinity of nuclear protein binding to this region of DNA. CONCLUSIONS The A allele of rs12742393 appears to be a risk allele associated with schizophrenia that acts by enhancing transcription factor binding and increasing gene expression.
Affiliation(s)
- Naomi S Wratten
- Rutgers University Department of Genetics, 145 Bevier Road, Piscataway, NJ 08854, USA
|
14
|
Govil M, Vieland VJ. Practical considerations for dividing data into subsets prior to PPL analysis. Hum Hered 2008; 66:223-37. [PMID: 18612207 DOI: 10.1159/000143405] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/24/2007] [Accepted: 11/16/2007] [Indexed: 11/19/2022]
Abstract
OBJECTIVE: The PPL, a class of statistics for complex trait genetic mapping in humans, utilizes Bayesian sequential updating to accumulate evidence for or against linkage across potentially heterogeneous data (sub)sets. Here, we systematically explore the relative efficacy of alternative subsetting approaches for purposes of PPL calculation.
METHODS: We simulated genotypes for three pedigree sets (sib pairs; 2-3 generations; ≥4 generations) based on families from an ongoing study. For each pedigree set, 100 replicates were generated under different levels of heterogeneity (1000 under 'no linkage'). Within each replicate, updating was performed across subsets defined randomly (RAND2, RAND4), by true (TRUE) linkage status, with a realistic (REAL) classification, by individual pedigree (PED), or without any subsetting (NONE).
RESULTS: Under 'linkage', REAL yields larger PPLs compared to NONE, RAND2, RAND4, or PED. Under 'no linkage', RAND2, RAND4 and PED yield PPLs close to NONE.
CONCLUSIONS: We have examined the impact of different subsetting strategies on the sampling behavior of the PPL. Our results underscore the utility of finding variables that can help delineate more homogeneous data subsets and demonstrate that, once such variables are found, sequential updating can be highly beneficial in the presence of appreciable heterogeneity at a linked locus, without inflation at an unlinked locus.
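The sequential updating mentioned in this abstract can be illustrated with a minimal sketch. It assumes, as a simplification, that each data subset contributes a single evidence ratio (likelihood under linkage divided by likelihood under no linkage) and that the posterior from one subset serves as the prior for the next. The function names, the 2% starting prior, and the ratio values are hypothetical and are not taken from the paper; the actual PPL calculation is more involved than this.

```python
def update_posterior(prior, evidence_ratio):
    """One Bayes step: combine a prior probability with one subset's evidence ratio."""
    weighted = prior * evidence_ratio
    return weighted / (weighted + (1.0 - prior))


def sequential_posterior(prior, evidence_ratios):
    """Carry the posterior forward as the prior for each successive subset."""
    posterior = prior
    for ratio in evidence_ratios:
        posterior = update_posterior(posterior, ratio)
    return posterior


# Hypothetical evidence ratios for three subsets: ratios above 1 favor linkage,
# ratios below 1 count as evidence against it.
subset_ratios = [4.0, 0.5, 10.0]
assumed_prior = 0.02  # assumed starting probability of linkage

posterior = sequential_posterior(assumed_prior, subset_ratios)
print(round(posterior, 4))
```

In this simplified form the result does not depend on the order in which subsets are processed, because the update is equivalent to multiplying the prior odds by every ratio in turn.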
Affiliation(s)
- M Govil
- Department of Oral Biology and Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
|
15
|
Abstract
When two genes interact to cause a clinically important phenotype, it would seem reasonable to expect that we could leverage genotypic information at one of the loci in order to improve our ability to detect the other. We were therefore interested in extending the posterior probability of linkage (PPL), a class of linkage statistics we have been developing over the past decade, in order to explicitly allow for gene × gene interaction. In this report we utilize a new implementation of the PPL incorporating liability classes (LCs), which provide a direct parameterization of gene × gene interaction by allowing the penetrances at the locus being evaluated to depend upon measured genotypes at a known locus. With knowledge of the generating model for the simulated rheumatoid arthritis (RA) data, we selected two loci for examination: Locus A, which in interaction with the HLA-DR antigen locus affects risk of the dichotomous RA phenotype; and Locus E, which in interaction with DR affects quantitative levels of the anti-CCP phenotype. The data comprised nuclear families of two parents and an affected sib pair (ASP). Our results confirm theoretical work suggesting that gene × gene interactions cannot be leveraged to improve linkage detection for dichotomous traits based on affecteds-only data structures. However, incorporation of DR-based LCs did lead to appreciably higher quantitative trait PPLs. This suggests that gene × gene interactions could be effectively used in quantitative trait analyses even when families have been ascertained as ASPs for a related dichotomous trait.
|
16
|
Albers CA, Kappen HJ. Modeling linkage disequilibrium in exact linkage computations: a comparison of first-order Markov approaches and the clustered-markers approach. BMC Proc 2007; 1 Suppl 1:S159. [PMID: 18466504 PMCID: PMC2367570 DOI: 10.1186/1753-6561-1-s1-s159] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/04/2022]
Abstract
Recent studies have shown that linkage disequilibrium (LD) between single-nucleotide polymorphism (SNP) markers is widespread. Assuming linkage equilibrium has been shown to cause false positives in linkage studies where parental genotypes are not available. Therefore, linkage analysis methods that can deal with LD are required to accurately analyze SNP marker data sets. We compared three approaches to deal with LD between markers: 1) The clustered-markers approach implemented in the computer program MERLIN; 2) The standard hidden Markov model (HMM) multipoint model augmented with a first-order Markov model for the allele frequencies of the founders, in which we considered both a Bayesian and a maximum-likelihood implementation of this approach; 3) The 'independent' SNPs approach, i.e., removing SNPs from the data set until the remaining SNPs have low levels of LD. We evaluated these approaches on the Illumina 6K SNP data set of affected sib-pairs of Problem 2. We found that the first-order Markov model was able to account for most of the strong LD in this data set. The difference between the Bayesian and maximum-likelihood implementation was small. An advantage of the first-order Markov model is that it does not require the user to specify parameters.
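The 'independent' SNPs approach named in this abstract (approach 3) can be sketched as a greedy pruning pass. This is an illustration only, not the authors' procedure: it assumes genotypes coded as allele counts (0, 1, 2), uses the squared correlation between genotype vectors as a rough stand-in for pairwise LD, and applies a hypothetical threshold `max_r2`. All names and data below are made up for the example.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return (sxy * sxy) / (sxx * syy)


def prune_snps(genotypes, max_r2=0.2):
    """Keep a SNP only if its r^2 with every SNP already kept is at most max_r2."""
    kept = []
    for name, calls in genotypes.items():
        if all(r_squared(calls, genotypes[k]) <= max_r2 for k in kept):
            kept.append(name)
    return kept


# Hypothetical genotype calls for six individuals at three SNPs.
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1, so in complete LD with it
    "snp3": [1, 2, 0, 1, 0, 2],  # uncorrelated with snp1 in this toy data
}

kept = prune_snps(genotypes)
print(kept)
```

A greedy pass like this depends on the order in which SNPs are visited, and unlike the first-order Markov model described in the abstract it requires the user to choose a threshold.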
Affiliation(s)
- Cornelis A Albers
- Department of Biophysics, Radboud University, 126 Geert Grooteplein 21, Nijmegen, Gelderland 6525EZ The Netherlands.
|
17
|
Bartlett CW, Vieland VJ. Accumulating quantitative trait linkage evidence across multiple datasets using the posterior probability of linkage. Genet Epidemiol 2007; 31:91-102. [PMID: 17123305 DOI: 10.1002/gepi.20193] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/07/2022]
Abstract
Genome scans for complex disorders are frequently inconclusive, prompting researchers to increase sample size in an effort to obtain stronger evidence. However, increasing sample size in the presence of locus heterogeneity may actually, on average, decrease the linkage signal at a true susceptibility gene. The posterior probability of linkage, or PPL, was specifically designed to address this issue in the context of categorical trait analysis, by appropriately accumulating evidence either for or against linkage as new data are added. We now formulate a quantitative trait (QT) analog, the QT-PPL, which directly measures the evidence that a QT is linked to a genetic marker or location. The new QT-PPL is based on a classical single-locus QT likelihood with the trait parameters (allele frequency, genotypic means and variances) integrated out. We show using simulations that the QT-PPL is robust to two key modeling violations (multiple trait loci and non-normality in the form of excess kurtosis), as well as being inherently ascertainment corrected, and illustrate the advantages of the QT-PPL for accumulating linkage evidence across multiple sets of data compared to other QT linkage methods.
Affiliation(s)
- Christopher W Bartlett
- Center for Statistical Genetics Research, College of Public Health and Roy J and Lucille A Carver College of Medicine, University of Iowa, Iowa City, IA, USA.
|
18
|
Vieland VJ. Thermometers: something for statistical geneticists to think about. Hum Hered 2006; 61:144-56. [PMID: 16770079 DOI: 10.1159/000093775] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 02/27/2006] [Accepted: 04/12/2006] [Indexed: 11/19/2022]
Abstract
In human genetics, we measure the strength of statistical evidence using a variety of maximized likelihood ratios, LODs, and empirical p values. I argue here that these statistics have highly undesirable properties as evidence measures when applied to complex disorders. Among other deficiencies, I show that when following up on an interesting finding, they will tend to erroneously indicate diminished evidence as more data are considered (e.g., the LOD will tend to go down at a linked locus as the sample size increases). This violates a fundamental assumption underlying standard linkage and association designs in which we first scan the genome for our best signals, and then follow up at those genomic positions with additional data. I argue here for a coherent theoretical approach to formalizing statistical evidence measures, and derive a set of minimal requirements that any evidence measure should meet, drawing heavily on an analogy with the thermometer. I speculate that measures of evidence that come closer to meeting these requirements will do a better job of finding and characterizing genes, and I propose an alternative evidence metric as a step in this direction.
Affiliation(s)
- Veronica J Vieland
- College of Public Health and Carver College of Medicine, 2190 Westlawn Building, University of Iowa, Iowa City, IA 52242, USA.
|
19
|
Logue MW, Vieland VJ. The incorporation of prior genomic information does not necessarily improve the performance of Bayesian linkage methods: an example involving sex-specific recombination and the two-point PPL. Hum Hered 2006; 60:196-205. [PMID: 16397399 DOI: 10.1159/000090543] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 06/27/2005] [Accepted: 10/12/2005] [Indexed: 11/19/2022]
Abstract
OBJECTIVE: We continue statistical development of the posterior probability of linkage (PPL). We present a two-point PPL allowing for unequal male and female recombination fractions, θ_M and θ_F, and consider alternative priors on θ_M, θ_F.
METHODS: We compare the sex-averaged PPL (PPLSA), assuming θ_M = θ_F, to the sex-specific PPL (PPLSS) in (θ_M, θ_F), in a series of simulations; we also compute the PPLSS using alternative priors on (θ_M, θ_F).
RESULTS: The PPLSS based on a prior that ignores prior genomic information on sex-specific recombination rates performs essentially identically to the PPLSA, even in the presence of large θ_M, θ_F differences. Moreover, adaptively skewing the prior, to incorporate (correct) genomic information on θ_M, θ_F differences, actually worsens performance of the PPLSS. We demonstrate that this has little to do with the PPLSS per se, but is rather due to extremely high levels of variability in the location of the maximum likelihood estimates of (θ_M, θ_F) in realistic data sets.
CONCLUSIONS: Incorporating (correct) prior genomic information is not always helpful. We recommend that the PPLSA be used as the standard form of the PPL regardless of the sex-specific recombination rates in the region of the marker in question.
Affiliation(s)
- Mark W Logue
- Program in Public Health Genetics, College of Public Health, University of Iowa, Iowa City, Iowa 52242, USA.
|
20
|
Abstract
The past 25 years has seen an explosion in the number of genetic markers that can be measured on DNA samples at an ever decreasing cost. Although basic statistical methods for analysing such data gathered on samples of either independent individuals or family members, one or two markers at a time, were already well developed before this explosion occurred, there has been a corresponding burst in activity to develop multiple marker models to find disease-causing gene variants, capitalizing on the data that have become available, to increase the power of such methods. This has required the concomitant development of faster algorithms to speed up the computation of various likelihoods. For linkage analysis, to obtain the approximate locations for genes of interest, Mendelian segregation models have been extended to be more realistic and statistical models that do not assume specific modes of inheritance have been extended to allow for the analysis of larger pedigree structures. For association analysis, to obtain more precise locations for genes of interest, the recent completion of the first stage of the HapMap project has spurred the development, still underway, of novel experimental designs and analytical methods to combat the curse of dimensionality and the resulting multiple testing problem. Perhaps the greatest current challenge concerns how best to gather and synthesize the many lines of evidence possible in order to discover the genetic determinants underlying complex diseases.
Affiliation(s)
- Robert C Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.
|