1
Huang Z, Kelleher J, Chan YB, Balding DJ. Estimating evolutionary and demographic parameters via ARG-derived IBD. bioRxiv 2024:2024.03.07.583855. [PMID: 38559261 PMCID: PMC10979897 DOI: 10.1101/2024.03.07.583855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Inference of demographic and evolutionary parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that poorly-inferred short IBD segments can improve estimation precision. We show estimation precision similar to a previously-published estimator despite a 4 000-fold reduction in data used for inference. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Affiliation(s)
- Zhendong Huang
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- Jerome Kelleher
  - Oxford Big Data Institute, University of Oxford, United Kingdom
- Yao-ban Chan
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- David J. Balding
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
2
Grinton BE, Robertson E, Fearnley LG, Scheffer IE, Marson AG, O'Brien TJ, Pickrell WO, Rees MI, Sisodiya SM, Balding DJ, Bennett MF, Bahlo M, Berkovic SF, Oliver KL. A founder event causing a dominant childhood epilepsy survives 800 years through weak selective pressure. Am J Hum Genet 2022; 109:2080-2087. [PMID: 36288729 PMCID: PMC9674963 DOI: 10.1016/j.ajhg.2022.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 10/03/2022] [Indexed: 01/26/2023]
Abstract
Genetic epilepsy with febrile seizures plus (GEFS+) is an autosomal dominant familial epilepsy syndrome characterized by distinctive phenotypic heterogeneity within families. The SCN1B c.363C>G (p.Cys121Trp) variant has been identified in independent, multi-generational families with GEFS+. Although the variant is present in population databases (at very low frequency), there is strong clinical, genetic, and functional evidence to support pathogenicity. Recurrent variants may be due to a founder event in which the variant has been inherited from a common ancestor. Here, we report evidence of a single founder event giving rise to the SCN1B c.363C>G variant in 14 independent families with epilepsy. A common haplotype was observed in all families, and the age of the most recent common ancestor was estimated to be approximately 800 years ago. Analysis of UK Biobank whole-exome-sequencing data identified 74 individuals with the same variant. All individuals carried haplotypes matching the epilepsy-affected families, suggesting all instances of the variant derive from a single mutational event. This unusual finding of a variant causing an autosomal dominant, early-onset disease in an outbred population that has persisted over many generations can be attributed to the relatively mild phenotype in most carriers and incomplete penetrance. Founder events are well established in autosomal recessive and late-onset disorders but are rarely observed in early-onset, autosomal dominant diseases. These findings suggest variants present in the population at low frequencies should be considered potentially pathogenic in mild phenotypes with incomplete penetrance and may be more important contributors to the genetic landscape than previously thought.
Affiliation(s)
- Bronwyn E Grinton
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, Heidelberg, VIC 3084, Australia
- Erandee Robertson
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC 3010, Australia
- Liam G Fearnley
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC 3010, Australia
- Ingrid E Scheffer
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, Heidelberg, VIC 3084, Australia; Department of Paediatrics, The University of Melbourne, Royal Children's Hospital, Parkville, VIC 3052, Australia; Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC 3052, Australia; Florey Institute of Neuroscience and Mental Health, Heidelberg, VIC 3084, Australia
- Anthony G Marson
  - Department of Molecular and Clinical Pharmacology, University of Liverpool, Liverpool L69 3BX, UK
- Terence J O'Brien
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3004, Australia; Department of Neurology, The Royal Melbourne Hospital, Parkville, VIC 3052, Australia; Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia; Department of Medicine, The University of Melbourne, Parkville, VIC 3010, Australia
- W Owen Pickrell
  - Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK; Department of Neurology, Morriston Hospital, Swansea Bay University Health Board, Swansea SA2 8PP, UK
- Mark I Rees
  - Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK; Faculty of Medicine & Health, University of Sydney, Camperdown, NSW 2006, Australia
- Sanjay M Sisodiya
  - Chalfont Centre for Epilepsy, Chalfont St Peter, Buckinghamshire HP11 2FZ, UK; Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- David J Balding
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Parkville, VIC 3010, Australia
- Mark F Bennett
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, Heidelberg, VIC 3084, Australia; Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC 3010, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC 3010, Australia
- Samuel F Berkovic
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, Heidelberg, VIC 3084, Australia
- Karen L Oliver
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, Heidelberg, VIC 3084, Australia; Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC 3010, Australia
3
Caliebe A, Tekola‐Ayele F, Darst BF, Wang X, Song YE, Gui J, Sebro RA, Balding DJ, Saad M, Dubé M. Including diverse and admixed populations in genetic epidemiology research. Genet Epidemiol 2022; 46:347-371. [PMID: 35842778 PMCID: PMC9452464 DOI: 10.1002/gepi.22492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 05/31/2022] [Accepted: 06/06/2022] [Indexed: 11/25/2022]
Abstract
The inclusion of ancestrally diverse participants in genetic studies can lead to new discoveries and is important to ensure equitable health care benefit from research advances. Here, members of the Ethical, Legal, Social, Implications (ELSI) committee of the International Genetic Epidemiology Society (IGES) offer perspectives on methods and analysis tools for the conduct of inclusive genetic epidemiology research, with a focus on admixed and ancestrally diverse populations in support of reproducible research practices. We emphasize the importance of distinguishing socially defined population categorizations from genetic ancestry in the design, analysis, reporting, and interpretation of genetic epidemiology research findings. Finally, we discuss the current state of genomic resources used in genetic association studies, functional interpretation, and clinical and public health translation of genomic findings with respect to diverse populations.
Affiliation(s)
- Amke Caliebe
  - Institute of Medical Informatics and Statistics, Kiel University and University Hospital Schleswig‐Holstein, Kiel, Germany
- Fasil Tekola‐Ayele
  - Epidemiology Branch, Division of Population Health Research, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Burcu F. Darst
  - Center for Genetic Epidemiology, University of Southern California, Los Angeles, California, USA
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, Denton, Texas, USA
- Yeunjoo E. Song
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Jiang Gui
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, One Medical Center Dr., Lebanon, New Hampshire, USA
- David J. Balding
  - Melbourne Integrative Genomics, Schools of BioSciences and of Mathematics & Statistics, University of Melbourne, Melbourne, Australia
- Mohamad Saad
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  - Neuroscience Research Center, Faculty of Medical Sciences, Lebanese University, Beirut, Lebanon
- Marie‐Pierre Dubé
  - Department of Medicine, and Social and Preventive Medicine, Université de Montréal, Montréal, Québec, Canada
  - Beaulieu‐Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
4
Speed D, Kaphle A, Balding DJ. SNP-based heritability and selection analyses: Improved models and new results. Bioessays 2022; 44:e2100170. [PMID: 35279859 DOI: 10.1002/bies.202100170] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/07/2021] [Revised: 03/02/2022] [Accepted: 03/03/2022] [Indexed: 01/15/2023]
Abstract
Complex-trait genetics has advanced dramatically through methods to estimate the heritability tagged by SNPs, both genome-wide and in genomic regions of interest such as those defined by functional annotations. The models underlying many of these analyses are inadequate, and consequently many SNP-heritability results published to date are inaccurate. Here, we review the modelling issues, both for analyses based on individual genotype data and association test statistics, highlighting the role of a low-dimensional model for the heritability of each SNP. We use state-of-art models to present updated results about how heritability is distributed with respect to functional annotations in the human genome, and how it varies with allele frequency, which can reflect purifying selection. Our results give finer detail to the picture that has emerged in recent years of complex trait heritability widely dispersed across the genome. Confounding due to population structure remains a problem that summary statistic analyses cannot reliably overcome. Also see the video abstract here: https://youtu.be/WC2u03V65MQ.
Affiliation(s)
- Doug Speed
  - Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
  - Aarhus Institute of Advanced Studies, Aarhus University, Aarhus, Denmark
  - UCL Genetics Institute, University College London, London, UK
- Anubhav Kaphle
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Victoria, Australia
- David J Balding
  - UCL Genetics Institute, University College London, London, UK
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Victoria, Australia
5
Andersen MM, Balding DJ. Assessing the Forensic Value of DNA Evidence from Y Chromosomes and Mitogenomes. Genes (Basel) 2021; 12:genes12081209. [PMID: 34440383 PMCID: PMC8391915 DOI: 10.3390/genes12081209] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/07/2021] [Revised: 07/30/2021] [Accepted: 08/02/2021] [Indexed: 11/17/2022]
Abstract
Y chromosome and mitochondrial DNA profiles have been used as evidence in courts for decades, yet the problem of evaluating the weight of evidence has not been adequately resolved. Both are lineage markers (inherited from just one parent), which presents different interpretation challenges compared with standard autosomal DNA profiles (inherited from both parents). We review approaches to the evaluation of lineage marker profiles for forensic identification, focussing on the key roles of profile mutation rate and relatedness (extending beyond known relatives). Higher mutation rates imply fewer individuals matching the profile of an alleged contributor, but they will be more closely related. This makes it challenging to evaluate the possibility that one of these matching individuals could be the true source, because relatives may be plausible alternative contributors, and may not be well mixed in the population. These issues reduce the usefulness of profile databases drawn from a broad population: larger populations can have a lower profile relative frequency because of lower relatedness with the alleged contributor. Many evaluation methods do not adequately take account of distant relatedness, but its effects have become more pronounced with the latest generation of high-mutation-rate Y profiles.
Affiliation(s)
- Mikkel M. Andersen
  - Department of Mathematical Sciences, Aalborg University, 9220 Aalborg, Denmark
  - Section of Forensic Genetics, Department of Forensic Medicine, University of Copenhagen, 1165 Copenhagen, Denmark
- David J. Balding
  - Melbourne Integrative Genomics, University of Melbourne, Melbourne 3010, Australia
  - Genetics Institute, University College London, London WC1E 6BT, UK
6
Paril JF, Balding DJ, Fournier-Level A. Optimizing sampling design and sequencing strategy for the genomic analysis of quantitative traits in natural populations. Mol Ecol Resour 2021; 22:137-152. [PMID: 34192415 DOI: 10.1111/1755-0998.13458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2020] [Revised: 05/02/2021] [Accepted: 06/25/2021] [Indexed: 11/27/2022]
Abstract
Mapping the genes underlying ecologically relevant traits in natural populations is fundamental to develop a molecular understanding of species adaptation. Current sequencing technologies enable the characterization of a species' genetic diversity across the landscape or even over its whole range. The relevant capture of the genetic diversity across the landscape is critical for a successful genetic mapping of traits and there are no clear guidelines on how to achieve an optimal sampling and which sequencing strategy to implement. Here we determine, through simulation, the sampling scheme that maximizes the power to map the genetic basis of a complex trait in an outbreeding species across an idealized landscape and draw genomic predictions for the trait, comparing individual and pool sequencing strategies. Our results show that quantitative trait locus detection power and prediction accuracy are higher when more populations over the landscape are sampled and this is more cost-effectively done with pool sequencing than with individual sequencing. Additionally, we recommend sampling populations from areas of high genetic diversity. As progress in sequencing enables the integration of trait-based functional ecology into landscape genomics studies, these findings will guide study designs allowing direct measures of genetic effects in natural populations across the environment.
Affiliation(s)
- Jefferson F Paril
  - School of Biosciences, The University of Melbourne, Parkville, Victoria, Australia
- David J Balding
  - School of Biosciences, The University of Melbourne, Parkville, Victoria, Australia
  - Melbourne Integrative Genomics, The University of Melbourne, Parkville, Victoria, Australia
  - School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia
- Alexandre Fournier-Level
  - School of Biosciences, The University of Melbourne, Parkville, Victoria, Australia
  - Melbourne Integrative Genomics, The University of Melbourne, Parkville, Victoria, Australia
7
Affiliation(s)
- David J. Balding
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Parkville, Victoria, Australia
- Gregory S. Barsh
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Gregory P. Copenhaver
  - Department of Biology and the Integrative Program for Biological and Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Chengqi Yi
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
8
Holmes JB, Speed D, Balding DJ. Summary statistic analyses can mistake confounding bias for heritability. Genet Epidemiol 2019; 43:930-940. [PMID: 31541496 DOI: 10.1002/gepi.22259] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/04/2019] [Revised: 07/28/2019] [Accepted: 08/09/2019] [Indexed: 11/11/2022]
Abstract
Linkage disequilibrium SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability, and genetic correlation using only genome-wide association study (GWAS) test statistics. SumHer is a newly introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately account for confounding bias, even when the assumed heritability model is correct. Consequently, these methods may estimate heritability poorly if there was an inadequate adjustment for confounding in the original GWAS analysis. We also show that the choice of a summary statistic for use in LDSC or SumHer can have a large impact on resulting inferences. Further, covariate adjustments in the original GWAS can alter the target of heritability estimation, which can be problematic for test statistics from a meta-analysis of GWAS with different covariate adjustments.
Affiliation(s)
- John B Holmes
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Doug Speed
  - Aarhus Institute of Advanced Studies (AIAS), Aarhus University, Aarhus, Denmark
  - UCL Genetics Institute, University College London, London, UK
- David J Balding
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
  - UCL Genetics Institute, University College London, London, UK
9
Abstract
We present SumHer, software for estimating confounding bias, SNP heritability, enrichments of heritability and genetic correlations using summary statistics from genome-wide association studies. The key difference between SumHer and the existing software LD Score Regression (LDSC) is that SumHer allows the user to specify the heritability model. We apply SumHer to results from 24 large-scale association studies (average sample size 121,000) using our recommended heritability model. We show that these studies tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci was under-reported by about a quarter. We also estimate enrichments for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further six categories with above threefold enrichment. By contrast, our analysis using SumHer finds that none of the categories have enrichment above twofold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
Affiliation(s)
- Doug Speed
  - Aarhus Institute of Advanced Studies, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  - UCL Genetics Institute, University College London, London, UK
- David J Balding
  - UCL Genetics Institute, University College London, London, UK
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Melbourne, Victoria, Australia
10
Abstract
Mitochondrial DNA (mtDNA) is useful to assist with identification of the source of a biological sample, or to confirm matrilineal relatedness. Although the autosomal genome is much larger, mtDNA has an advantage for forensic applications of multiple copy number per cell, allowing better recovery of sequence information from degraded samples. In addition, biological samples such as fingernails, old bones, teeth and hair have mtDNA but little or no autosomal DNA. The relatively low mutation rate of the mitochondrial genome (mitogenome) means that there can be large sets of matrilineal-related individuals sharing a common mitogenome. Here we present the mitolina simulation software that we use to describe the distribution of the number of mitogenomes in a population that match a given mitogenome, and investigate its dependence on population size and growth rate, and on a database count of the mitogenome. Further, we report on the distribution of the number of meioses separating pairs of individuals with matching mitogenome. Our results have important implications for assessing the weight of mtDNA profile evidence in forensic science, but mtDNA analysis has many non-human applications, for example in tracking the source of ivory. Our methods and software can also be used for simulations to help validate models of population history in human or non-human populations.
Affiliation(s)
- Mikkel M. Andersen
  - Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
- David J. Balding
  - Melbourne Integrative Genomics, University of Melbourne, Victoria, Australia
  - Genetics Institute, University College London, London, UK
11
Hessab T, Aranha RS, Moura-Neto RS, Balding DJ, Schrago CG. Evaluating DNA evidence in a genetically complex population. Forensic Sci Int Genet 2018; 36:141-147. [PMID: 29990826 DOI: 10.1016/j.fsigen.2018.06.019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/22/2018] [Revised: 06/27/2018] [Accepted: 06/28/2018] [Indexed: 11/28/2022]
Abstract
In forensic genetics, the likelihood ratio (LR), measuring the value of DNA profile evidence, is computed from a database of allele frequencies. Here, we address the choice of database and adjustments for population structure and sample size in the context of Brazil. The Brazilian population underwent a complex process of colonization, migration and mating, which created an admixed genetic composition that makes it difficult to obtain an appropriate database for a given case. National databases are now available, as well as databases for many Brazilian states. However, those databases are not statistically random samples, and state boundaries may not accurately reflect the sub-structuring of genetic diversity. We compared the LR calculated using the relevant state-specific database with the statistics calculated when a national database and when international databases were used. We evaluated two methods of adjustment for population structure, due to Wright [13] and Balding and Nichols [14]. We also considered two adjustments for database sample size: the Balding size bias correction [15] and a minimum allele frequency [16]. Our results show that the use of a national database with the Balding and Nichols adjustment and θ = 0.002 generated lower LR values than did the state-specific database in more than 50% of the profiles simulated using the state-based allele frequencies, while θ = 0.01 produced lower LRs for more than 90% of the profiles. We conclude that the utilization of a national database for Brazilian cases can be justified in association with the appropriate adjustment for population structure.
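As a rough illustration of the θ adjustment named in this abstract, the sketch below uses the commonly quoted single-locus form of the Balding and Nichols match probability and takes the LR for a simple single-source match as the reciprocal of the product of per-locus match probabilities. The function names and the allele frequencies are hypothetical, loci are assumed independent, and this is not claimed to be the computation performed in the paper.

```python
def match_probability(p_a, p_b=None, theta=0.0):
    """Single-locus match probability with a theta (population structure) adjustment.

    p_a, p_b: database allele frequencies; pass p_b=None for a homozygote.
    theta=0 recovers the Hardy-Weinberg values p_a**2 and 2*p_a*p_b.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:  # homozygous genotype A/A
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # heterozygous genotype A/B
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def likelihood_ratio(profile, theta=0.0):
    """LR for a simple single-source match, assuming independent loci.

    profile: list of (p_a, p_b) pairs, with p_b=None at homozygous loci.
    """
    lr = 1.0
    for p_a, p_b in profile:
        lr /= match_probability(p_a, p_b, theta)
    return lr


# Hypothetical three-locus profile (illustrative frequencies, not real data).
profile = [(0.10, 0.20), (0.05, None), (0.15, 0.30)]
for theta in (0.0, 0.002, 0.01):
    print(f"theta={theta}: LR={likelihood_ratio(profile, theta):.1f}")
```

For rare alleles such as those in this toy profile, a larger θ raises each match probability and so lowers the LR, which is the direction of the effect the abstract reports for θ = 0.002 versus θ = 0.01.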
Affiliation(s)
- T Hessab
  - Departamento de Genética, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil; Instituto de Pesquisa e Perícias em Genética Forense, DGPTC/PCERJ, Rio de Janeiro, RJ, Brazil
- R S Aranha
  - Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, RJ, Brazil
- R S Moura-Neto
  - Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- D J Balding
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, The University of Melbourne, Melbourne, Australia
- C G Schrago
  - Departamento de Genética, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
12
Abstract
The introduction of forensic autosomal DNA profiles was controversial, but the problems were successfully addressed, and DNA profiling has gone on to revolutionise forensic science. Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA, and interest centres on the identity of the male source(s) of the DNA. The problem of evaluating evidential weight is even more challenging for Y profiles than for autosomal profiles. Numerous approaches have been proposed, but they fail to deal adequately with the fact that men with matching Y-profiles are related in extended patrilineal clans, many of which may not be represented in available databases. The higher mutation rates of modern profiling kits have led to increased discriminatory power but they have also exacerbated the problem of fairly conveying evidential value. Because the relevant population is difficult to define, yet the number of matching relatives is fixed as population size varies, it is typically infeasible to derive population-based match probabilities relevant to a specific crime. We propose a conceptually simple solution, based on a simulation model and software to approximate the distribution of the number of males with a matching Y profile. We show that this distribution is robust to different values for the variance in reproductive success and the population growth rate. We also use importance sampling reweighting to derive the distribution of the number of matching males conditional on a database frequency, finding that this conditioning typically has only a modest impact. We illustrate the use of our approach to quantify the value of Y profile evidence for a court in a way that is both scientifically valid and easily comprehensible by a judge or juror.
Affiliation(s)
- Mikkel M. Andersen
  - Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
  - Section of Forensic Genetics, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark
- David J. Balding
  - Centre for Systems Genomics, Royal Parade, University of Melbourne, Melbourne, Australia
  - Genetics Institute, University College London, London, United Kingdom
13
Traynelis J, Silk M, Wang Q, Berkovic SF, Liu L, Ascher DB, Balding DJ, Petrovski S. Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation. Genome Res 2017; 27:1715-1729. [PMID: 28864458 PMCID: PMC5630035 DOI: 10.1101/gr.226589.117] [Citation(s) in RCA: 113] [Impact Index Per Article: 16.1] [Received: 06/23/2017] [Accepted: 08/08/2017] [Indexed: 12/26/2022]
Abstract
Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10⁻¹⁶). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.
Affiliation(s)
- Joshua Traynelis
- Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria 3010, Australia
- Michael Silk
- Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria 3010, Australia
- Samuel F Berkovic
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, Victoria 3084, Australia
- Liping Liu
- Department of Mathematics, North Carolina A&T State University, Greensboro, North Carolina 27411, USA
- David B Ascher
- Department of Biochemistry and Molecular Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- David J Balding
- Centre for Systems Genomics, School of BioSciences and School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia
- Slavé Petrovski
- Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria 3010, Australia
|
14
|
Speed D, Cai N, Johnson MR, Nejentsev S, Balding DJ. Reevaluation of SNP heritability in complex human traits. Nat Genet 2017; 49:986-992. [PMID: 28530675 PMCID: PMC5493198 DOI: 10.1038/ng.3865] [Citation(s) in RCA: 244] [Impact Index Per Article: 34.9] [Received: 09/01/2016] [Accepted: 04/18/2017] [Indexed: 12/15/2022]
Abstract
SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but the assumptions in current use have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency, linkage disequilibrium and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (standard deviation 3) higher than those obtained from the widely-used software GCTA, and 25% (standard deviation 2) higher than those from the recently-proposed extension GCTA-LDMS. Previously, DNaseI hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model their estimated contribution is only 24%.
Affiliation(s)
- Doug Speed
- UCL Genetics Institute, University College London, London, UK
- Na Cai
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- David J Balding
- UCL Genetics Institute, University College London, London, UK; Centre for Systems Genomics, School of BioSciences, and School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia
|
15
|
Fournier-Level A, Robin C, Balding DJ. GWAlpha: genome-wide estimation of additive effects (alpha) based on trait quantile distribution from pool-sequencing experiments. Bioinformatics 2017; 33:1246-1247. [PMID: 28003266 DOI: 10.1093/bioinformatics/btw805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/05/2016] [Accepted: 12/15/2016] [Indexed: 11/13/2022]
Abstract
Motivation: Sequencing pools of individuals (Pool-Seq) is a cost-effective way to gain insight into the genetics of complex traits, but as yet no parametric method has been developed to both test for genetic effects and estimate their magnitude. Here, we propose GWAlpha, a flexible method to obtain parametric estimates of genetic effects genome-wide from Pool-Seq experiments. Results: We show that GWAlpha powerfully replicates the results of Genome-Wide Association Studies (GWAS) from model organisms. We perform simulation studies that illustrate the effect on power of sample size and number of pools and test the method on different experimental data. Availability and Implementation: GWAlpha is implemented in Python, designed to run on Linux operating systems and tested on Mac OS. It is freely available at https://github.com/aflevel/GWAlpha. Contact: afournier@unimelb.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles Robin
- School of BioSciences and Centre for Systems Genomics
- David J Balding
- School of BioSciences and Centre for Systems Genomics; School of Mathematics and Statistics, The University of Melbourne, Parkville 3010, Australia
|
16
|
Ryan K, Williams DG, Balding DJ. Erratum to ‘Encoding of low-quality DNA profiles as genotype probability matrices for improved profile comparisons, relatedness evaluation and database searches’. Forensic Science International: Genetics (2016) 227–239. Forensic Sci Int Genet 2017; 27:189-190. [DOI: 10.1016/j.fsigen.2016.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
17
|
Morrison GS, Kaye DH, Balding DJ, Taylor D, Dawid P, Aitken CG, Gittelson S, Zadora G, Robertson B, Willis S, Pope S, Neil M, Martire KA, Hepler A, Gill RD, Jamieson A, de Zoete J, Ostrum RB, Caliebe A. A comment on the PCAST report: Skip the “match”/“non-match” stage. Forensic Sci Int 2017; 272:e7-e9. [DOI: 10.1016/j.forsciint.2016.10.018] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/05/2016] [Accepted: 10/18/2016] [Indexed: 10/20/2022]
|
18
|
Ryan K, Williams DG, Balding DJ. Encoding of low-quality DNA profiles as genotype probability matrices for improved profile comparisons, relatedness evaluation and database searches. Forensic Sci Int Genet 2016; 25:227-239. [DOI: 10.1016/j.fsigen.2016.09.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/17/2016] [Revised: 07/31/2016] [Accepted: 09/02/2016] [Indexed: 10/21/2022]
|
19
|
Radian S, Diekmann Y, Gabrovska P, Holland B, Bradley L, Wallace H, Stals K, Bussell AM, McGurren K, Cuesta M, Ryan AW, Herincs M, Hernández-Ramírez LC, Holland A, Samuels J, Aflorei ED, Barry S, Dénes J, Pernicova I, Stiles CE, Trivellin G, McCloskey R, Ajzensztejn M, Abid N, Akker SA, Mercado M, Cohen M, Thakker RV, Baldeweg S, Barkan A, Musat M, Levy M, Orme SM, Unterländer M, Burger J, Kumar AV, Ellard S, McPartlin J, McManus R, Linden GJ, Atkinson B, Balding DJ, Agha A, Thompson CJ, Hunter SJ, Thomas MG, Morrison PJ, Korbonits M. Increased Population Risk of AIP-Related Acromegaly and Gigantism in Ireland. Hum Mutat 2016; 38:78-85. [PMID: 27650164 PMCID: PMC5215436 DOI: 10.1002/humu.23121] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 05/24/2016] [Accepted: 09/13/2016] [Indexed: 01/06/2023]
Abstract
The aryl hydrocarbon receptor interacting protein (AIP) founder mutation R304* (or p.R304*; NM_003977.3:c.910C>T, p.Arg304Ter) identified in Northern Ireland (NI) predisposes to acromegaly/gigantism; its population health impact remains unexplored. We measured R304* carrier frequency in 936 Mid Ulster, 1,000 Greater Belfast (both in NI) and 2,094 Republic of Ireland (ROI) volunteers and in 116 NI or ROI acromegaly/gigantism patients. Carrier frequencies were 0.0064 in Mid Ulster (95%CI = 0.0027–0.013; P = 0.0005 vs. ROI), 0.001 in Greater Belfast (0.00011–0.0047) and zero in ROI (0–0.0014). R304* prevalence was elevated in acromegaly/gigantism patients in NI (11/87, 12.6%, P < 0.05), but not in ROI (2/29, 6.8%) versus non‐Irish patients (0–2.41%). Haploblock conservation supported a common ancestor for all the 18 identified Irish pedigrees (81 carriers, 30 affected). Time to most recent common ancestor (tMRCA) was 2550 (1,275–5,000) years. tMRCA‐based simulations predicted 432 (90–5,175) current carriers, including 86 affected (18–1,035) for 20% penetrance. In conclusion, R304* is frequent in Mid Ulster, resulting in numerous acromegaly/gigantism cases. tMRCA is consistent with historical/folklore accounts of Irish giants. Forward simulations predict many undetected carriers; geographically targeted population screening improves asymptomatic carrier identification, complementing clinical testing of patients/relatives. We generated disease awareness locally, necessary for early diagnosis and improved outcomes of AIP‐related disease.
Affiliation(s)
- Serban Radian
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK; Department of Endocrinology, Carol Davila University of Medicine and Pharmacy, C.I. Parhon National Institute of Endocrinology, Bucharest, Romania
- Yoan Diekmann
- Research Department of Genetics, Evolution and Environment, University College London, London, UK
- Plamena Gabrovska
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Brendan Holland
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Lisa Bradley
- Department of Medical Genetics, Belfast HSC Trust, Belfast, UK
- Helen Wallace
- Regional Centre for Endocrinology and Diabetes, Royal Victoria Hospital, Belfast, UK
- Karen Stals
- Department of Molecular Genetics, Royal Devon and Exeter NHS Foundation Trust/ Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Anna-Marie Bussell
- Department of Molecular Genetics, Royal Devon and Exeter NHS Foundation Trust/ Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Karen McGurren
- Department of Endocrinology and Diabetes, Beaumont Hospital/RCSI Medical School, Dublin, Ireland
- Martin Cuesta
- Department of Endocrinology and Diabetes, Beaumont Hospital/RCSI Medical School, Dublin, Ireland
- Anthony W Ryan
- Department of Clinical Medicine and Institute of Molecular Medicine, Trinity College Dublin, Trinity Centre for Health Sciences, St James's Hospital, Dublin, Ireland
- Maria Herincs
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Laura C Hernández-Ramírez
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Aidan Holland
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Jade Samuels
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Elena Daniela Aflorei
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Sayka Barry
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Judit Dénes
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Ida Pernicova
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Craig E Stiles
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Giampaolo Trivellin
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Ronan McCloskey
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Noina Abid
- Royal Belfast Hospital for Sick Children, Belfast, UK
- Scott A Akker
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Moises Mercado
- Endocrinology Service/Experimental Endocrinology Unit, Hospital de Especialidades, Centro Medico Nacional Siglo XXI, IMSS, Mexico City, Mexico
- Mark Cohen
- Department of Endocrinology and Diabetes, Barnet General Hospital, London, UK
- Rajesh V Thakker
- Academic Endocrine Unit, OCDEM, University of Oxford, Oxford, UK
- Stephanie Baldeweg
- Department of Endocrinology and Diabetes, University College London Hospitals, London, UK
- Ariel Barkan
- Department of Neurosurgery, University of Michigan, Ann Arbor, Michigan, USA
- Madalina Musat
- Department of Endocrinology, Carol Davila University of Medicine and Pharmacy, C.I. Parhon National Institute of Endocrinology, Bucharest, Romania
- Miles Levy
- Department of Endocrinology, University Hospitals of Leicester, Leicester, UK
- Stephen M Orme
- Department of Endocrinology, St James's University Hospital, Leeds, UK
- Joachim Burger
- Institute of Anthropology, Johannes Gutenberg University, Mainz, Germany
- Ajith V Kumar
- North East Thames Regional Genetics Service, Great Ormond Street Hospital, London, UK
- Sian Ellard
- Department of Molecular Genetics, Royal Devon and Exeter NHS Foundation Trust/ Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Joseph McPartlin
- Trinity Biobank, Institute of Molecular Medicine, Trinity College Dublin, Trinity Centre for Health Sciences, St James's Hospital, Dublin, Ireland
- Ross McManus
- Department of Clinical Medicine and Institute of Molecular Medicine, Trinity College Dublin, Trinity Centre for Health Sciences, St James's Hospital, Dublin, Ireland
- Gerard J Linden
- Centre for Public Health, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Brew Atkinson
- Regional Centre for Endocrinology and Diabetes, Royal Victoria Hospital, Belfast, UK
- David J Balding
- Research Department of Genetics, Evolution and Environment, University College London, London, UK; School of Biosciences, University of Melbourne, Parkville, Victoria, Australia; Schools of Mathematics and Statistics, University of Melbourne, Parkville, Victoria, Australia
- Amar Agha
- Department of Endocrinology and Diabetes, Beaumont Hospital/RCSI Medical School, Dublin, Ireland
- Chris J Thompson
- Department of Endocrinology and Diabetes, Beaumont Hospital/RCSI Medical School, Dublin, Ireland
- Steven J Hunter
- Regional Centre for Endocrinology and Diabetes, Royal Victoria Hospital, Belfast, UK
- Mark G Thomas
- Research Department of Genetics, Evolution and Environment, University College London, London, UK
- Patrick J Morrison
- Department of Medical Genetics, Belfast HSC Trust, Belfast, UK; Centre for Cancer Research and Cell Biology, Queens University Belfast, Belfast, UK
- Márta Korbonits
- Centre of Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
|
20
|
Abstract
In recent years statistical models for the analysis of complex (low-template and/or mixed) DNA profiles have moved from using only presence/absence information about allelic peaks in an electropherogram, to quantitative use of peak heights. This is challenging because peak heights are very variable and affected by a number of factors. We present a new peak-height model with important novel features, including over- and double-stutter, and a new approach to dropin. Our model is incorporated in open-source
|
21
|
Abstract
We estimate the population genetics parameter FST (also referred to as the fixation index) from short tandem repeat (STR) allele frequencies, comparing many worldwide human subpopulations at approximately the national level with continental-scale populations. FST is commonly used to measure population differentiation, and is important in forensic DNA analysis to account for remote shared ancestry between a suspect and an alternative source of the DNA. We estimate FST comparing subpopulations with a hypothetical ancestral population, which is the approach most widely used in population genetics, and also compare a subpopulation with a sampled reference population, which is more appropriate for forensic applications. Both estimation methods are likelihood-based, in which FST is related to the variance of the multinomial-Dirichlet distribution for allele counts. Overall, we find low FST values, with posterior 97.5 percentiles […] when comparing a subpopulation with the most appropriate population, and even for inter-population comparisons we find […]. These are much smaller than single nucleotide polymorphism-based inter-continental FST estimates, and are also about half the magnitude of STR-based estimates from population genetics surveys that focus on distinct ethnic groups rather than a general population. Our findings support the use of FST up to 3% in forensic calculations, which corresponds to some current practice.
|
22
|
Jeffares DC, Rallis C, Rieux A, Speed D, Převorovský M, Mourier T, Marsellach FX, Iqbal Z, Lau W, Cheng TM, Pracana R, Mülleder M, Lawson JL, Chessel A, Bala S, Hellenthal G, O’Fallon B, Keane T, Simpson JT, Bischof L, Tomiczek B, Bitton DA, Sideri T, Codlin S, Hellberg JE, van Trigt L, Jeffery L, Li JJ, Atkinson S, Thodberg M, Febrer M, McLay K, Drou N, Brown W, Hayles J, Carazo Salas RE, Ralser M, Maniatis N, Balding DJ, Balloux F, Durbin R, Bähler J. The genomic and phenotypic diversity of Schizosaccharomyces pombe. Nat Genet 2015; 47:235-41. [PMID: 25665008 PMCID: PMC4645456 DOI: 10.1038/ng.3215] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Received: 07/08/2014] [Accepted: 01/14/2015] [Indexed: 12/14/2022]
Abstract
Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the usefulness of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, finding moderate genetic diversity (π = 3 × 10⁻³ substitutions/site) and weak global population structure. We estimate that dispersal of S. pombe began during human antiquity (∼340 BCE), and ancestors of these strains reached the Americas at ∼1623 CE. We quantified 74 traits, finding substantial heritable phenotypic diversity. We conducted 223 genome-wide association studies, with 89 traits showing at least one association. The most significant variant for each trait explained 22% of the phenotypic variance on average, with indels having larger effects than SNPs. This analysis represents a rich resource to examine genotype-phenotype relationships in a tractable model.
Affiliation(s)
- Daniel C. Jeffares
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Charalampos Rallis
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Adrien Rieux
- Department of Genetics, Evolution & Environment, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
- Doug Speed
- Department of Genetics, Evolution & Environment, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
- Martin Převorovský
- Department of Cell Biology, Charles University in Prague, Prague, Czech Republic
- Tobias Mourier
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Zamin Iqbal
- Wellcome Trust Centre for Human Genetics, Oxford, UK
- Winston Lau
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Tammy M.K. Cheng
- Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Rodrigo Pracana
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Michael Mülleder
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Jonathan L.D. Lawson
- Department of Genetics, University of Cambridge, Cambridge, UK
- The Gurdon Institute, University of Cambridge, Cambridge, UK
- Anatole Chessel
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Sendu Bala
- Wellcome Trust Sanger Institute, Cambridge, UK
- Garrett Hellenthal
- Department of Genetics, Evolution & Environment, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
- Leanne Bischof
- CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- Bartlomiej Tomiczek
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Danny A. Bitton
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Theodora Sideri
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Sandra Codlin
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Laurent van Trigt
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Linda Jeffery
- Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Juan-Juan Li
- Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Sophie Atkinson
- Department of Genetics, Evolution & Environment, University College London, London, UK
- Malte Thodberg
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Melanie Febrer
- CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- Kirsten McLay
- CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- Nizar Drou
- CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- William Brown
- Centre for Genetics and Genomics, The University of Nottingham, Nottingham, UK
- Jacqueline Hayles
- Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Rafael E. Carazo Salas
- Department of Genetics, University of Cambridge, Cambridge, UK
- The Gurdon Institute, University of Cambridge, Cambridge, UK
- Markus Ralser
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
- Division of Physiology and Metabolism, MRC National Institute for Medical Research, London, UK
- Nikolas Maniatis
- Department of Genetics, Evolution & Environment, University College London, London, UK
- David J. Balding
- Department of Genetics, Evolution & Environment, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
- Francois Balloux
- Department of Genetics, Evolution & Environment, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
- Jürg Bähler
- Department of Genetics, Evolution & Environment, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
|
23
|
Ruklisa D, Ware JS, Walsh R, Balding DJ, Cook SA. Bayesian models for syndrome- and gene-specific probabilities of novel variant pathogenicity. Genome Med 2015; 7:5. [PMID: 25649125 PMCID: PMC4308924 DOI: 10.1186/s13073-014-0120-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 05/21/2014] [Accepted: 12/05/2014] [Indexed: 12/04/2022]
Abstract
Background: With the advent of affordable and comprehensive sequencing technologies, access to molecular genetics for clinical diagnostics and research applications is increasing. However, variant interpretation remains challenging, and tools that close the gap between data generation and data interpretation are urgently required. Here we present a transferable approach to help address the limitations in variant annotation. Methods: We develop a network of Bayesian logistic regression models that integrate multiple lines of evidence to evaluate the probability that a rare variant is the cause of an individual’s disease. We present models for genes causing inherited cardiac conditions, though the framework is transferable to other genes and syndromes. Results: Our models report a probability of pathogenicity, rather than a categorisation into pathogenic or benign, which captures the inherent uncertainty of the prediction. We find that gene- and syndrome-specific models outperform genome-wide approaches, and that the integration of multiple lines of evidence performs better than individual predictors. The models are adaptable to incorporate new lines of evidence, and results can be combined with familial segregation data in a transparent and quantitative manner to further enhance predictions. Though the probability scale is continuous, and innately interpretable, performance summaries based on thresholds are useful for comparisons. Using a threshold probability of pathogenicity of 0.9, we obtain a positive predictive value of 0.999 and sensitivity of 0.76 for the classification of variants known to cause long QT syndrome over the three most important genes, which represents sufficient accuracy to inform clinical decision-making. A web tool APPRAISE [http://www.cardiodb.org/APPRAISE] provides access to these models and predictions.
Conclusions: Our Bayesian framework provides a transparent, flexible and robust framework for the analysis and interpretation of rare genetic variants. Models tailored to specific genes outperform genome-wide approaches, and can be sufficiently accurate to inform clinical decision-making. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-014-0120-4) contains supplementary material, which is available to authorized users.
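The threshold-based summaries quoted in this abstract (positive predictive value and sensitivity at a probability cut-off of 0.9) can be illustrated with a minimal Python sketch. The function name `ppv_and_sensitivity` and the toy probabilities and labels are hypothetical and chosen only for illustration; they are not code or data from the study.

```python
from typing import Sequence, Tuple


def ppv_and_sensitivity(
    probs: Sequence[float], labels: Sequence[bool], threshold: float = 0.9
) -> Tuple[float, float]:
    """Return (PPV, sensitivity) when variants with prob >= threshold are called pathogenic."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # True positives: called pathogenic and truly pathogenic.
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y)
    # False positives: called pathogenic but actually benign.
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and not y)
    # False negatives: truly pathogenic but falling below the threshold.
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical predicted probabilities and true pathogenicity labels.
probs = [0.99, 0.95, 0.92, 0.60, 0.30, 0.05]
labels = [True, True, False, True, False, False]
ppv, sensitivity = ppv_and_sensitivity(probs, labels)
print(f"PPV={ppv:.2f}, sensitivity={sensitivity:.2f}")
```

Raising the threshold generally trades sensitivity for positive predictive value, which is why a single cut-off is only a summary of the underlying continuous probabilities.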
Affiliation(s)
- James S Ware
- NIHR Biomedical Research Unit in Cardiovascular Disease at Royal Brompton and Harefield NHS Foundation Trust and Imperial College, London, UK; National Heart and Lung Institute, Imperial College, London, UK
- Roddy Walsh
- NIHR Biomedical Research Unit in Cardiovascular Disease at Royal Brompton and Harefield NHS Foundation Trust and Imperial College, London, UK
- David J Balding
- UCL Genetics Institute, London, UK; Current address: Department of Genetics and Department of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Stuart A Cook
- NIHR Biomedical Research Unit in Cardiovascular Disease at Royal Brompton and Harefield NHS Foundation Trust and Imperial College, London, UK; National Heart and Lung Institute, Imperial College, London, UK; National Heart Centre, Singapore, Singapore; Duke-National University, Singapore, Singapore
|
24
|
Abstract
When evaluating the weight of evidence (WoE) for an individual to be a contributor to a DNA sample, an allele frequency database is required. The allele frequencies are needed to inform about genotype probabilities for unknown contributors of DNA to the sample. Typically databases are available from several populations, and a common practice is to evaluate the WoE using each available database for each unknown contributor. Often the most conservative WoE (most favourable to the defence) is the one reported to the court. However the number of human populations that could be considered is essentially unlimited and the number of contributors to a sample can be large, making it impractical to perform every possible WoE calculation, particularly for complex crime scene profiles. We propose instead the use of only the database that best matches the ancestry of the queried contributor, together with a substantial FST adjustment. To investigate the degree of conservativeness of this approach, we performed extensive simulations of one- and two-contributor crime scene profiles, in the latter case with, and without, the profile of the second contributor available for the analysis. The genotypes were simulated using five population databases, which were also available for the analysis, and evaluations of WoE using our heuristic rule were compared with several alternative calculations using different databases. Using FST = 0.03, we found that our heuristic gave WoE more favourable to the defence than alternative calculations in well over 99% of the comparisons we considered; on average the difference in WoE was just under 0.2 bans (orders of magnitude) per locus. The degree of conservativeness of the heuristic rule can be adjusted through the FST value. We propose the use of this heuristic for DNA profile WoE calculations, due to its ease of implementation, and efficient use of the evidence while allowing a flexible degree of conservativeness. 
A heuristic rule of assuming the database of Q for all unprofiled individuals in a CSP is proposed. We simulate a total of 105 000 one- and two-contributor CSPs with no dropin or dropout. The heuristic rule is conservative compared to an alternative for the majority of simulated CSPs. We suggest that the use of this heuristic will allow for evaluation of complex cases with many possible databases.
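Since the abstract measures weight of evidence in bans (orders of magnitude), i.e. the base-10 logarithm of a likelihood ratio, per-locus contributions add across loci when loci are treated as independent. The sketch below assumes that independence and uses hypothetical per-locus likelihood ratios and a hypothetical helper `woe_bans`; it shows how an average per-locus WoE difference between two database choices could be computed, and is not the authors' implementation.

```python
import math
from typing import Sequence


def woe_bans(likelihood_ratios: Sequence[float]) -> float:
    """Total weight of evidence in bans: the sum of log10 per-locus likelihood ratios."""
    return sum(math.log10(lr) for lr in likelihood_ratios)


# Hypothetical per-locus likelihood ratios for one profile under two database choices.
lr_heuristic = [10.0, 100.0, 10.0]    # e.g. database matched to the queried contributor
lr_alternative = [20.0, 100.0, 20.0]  # e.g. some other database

woe_h = woe_bans(lr_heuristic)
woe_a = woe_bans(lr_alternative)

# A positive difference means the heuristic gives the lower (more conservative) WoE.
mean_diff_per_locus = (woe_a - woe_h) / len(lr_heuristic)
print(f"heuristic: {woe_h:.2f} bans, alternative: {woe_a:.2f} bans")
print(f"mean difference per locus: {mean_diff_per_locus:.2f} bans")
```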
Affiliation(s)
- David J Balding
- UCL Genetics Institute, Darwin Building, Gower Street, London WC1E 6BT, UK.
|
25
|
Speed D, O'Brien TJ, Palotie A, Shkura K, Marson AG, Balding DJ, Johnson MR. Describing the genetic architecture of epilepsy through heritability analysis. Brain 2014; 137:2680-9. [PMID: 25063994 PMCID: PMC4163034 DOI: 10.1093/brain/awu206] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Indexed: 12/29/2022]
Abstract
Epilepsy is highly heritable, but its genetic architecture is poorly understood. Speed et al. estimate the number of susceptibility loci, show that common variants account for the majority of heritability, and demonstrate that epilepsy consists of genetically distinct subtypes. They conclude that gene-based prediction models may have clinical utility in first-seizure settings. Epilepsy is a disease with substantial missing heritability; despite its high genetic component, genetic association studies have had limited success detecting common variants which influence susceptibility. In this paper, we reassess the role of common variants on epilepsy using extensions of heritability analysis. Our data set consists of 1258 UK patients with epilepsy, of which 958 have focal epilepsy, and 5129 population control subjects, with genotypes recorded for over 4 million common single nucleotide polymorphisms. Firstly, we show that on the liability scale, common variants collectively explain at least 26% (standard deviation 5%) of phenotypic variation for all epilepsy and 27% (standard deviation 5%) for focal epilepsy. Secondly we provide a new method for estimating the number of causal variants for complex traits; when applied to epilepsy, our most optimistic estimate suggests that at least 400 variants influence disease susceptibility, with potentially many thousands. Thirdly, we use bivariate analysis to assess how similar the genetic architecture of focal epilepsy is to that of non-focal epilepsy; we demonstrate both significant differences (P = 0.004) and significant similarities (P = 0.01) between the two subtypes, indicating that although the clinical definition of focal epilepsy does identify a genetically distinct epilepsy subtype, there is also scope to improve the classification of epilepsy by incorporating genotypic information. 
Lastly, we investigate the potential value in using genetic data to diagnose epilepsy following a single epileptic seizure; we find that a prediction model explaining 10% of phenotypic variation could have clinical utility for deciding which single-seizure individuals are likely to benefit from immediate anti-epileptic drug therapy.
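The phrase "on the liability scale" refers to rescaling a heritability estimate from a case-control (observed 0/1) scale to an underlying continuous liability. The sketch below assumes the commonly used transformation that corrects for population prevalence and case-control ascertainment. The abstract does not say which transformation was applied, and both the prevalence and the observed-scale value used here are hypothetical placeholders.

```python
from statistics import NormalDist

def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale heritability to the liability scale.

    Assumption: h2_l = h2_o * K^2 (1-K)^2 / (z^2 * P (1-P)), where K is the
    population prevalence, P the fraction of cases in the sample, and z the
    standard normal density at the liability threshold.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))

# Sample sizes from the abstract: 1258 cases and 5129 controls.
case_fraction = 1258 / (1258 + 5129)
assumed_prevalence = 0.005   # hypothetical prevalence, not from the abstract
assumed_h2_observed = 0.3    # hypothetical observed-scale estimate
example = liability_scale_h2(assumed_h2_observed, assumed_prevalence, case_fraction)
print(f"Liability-scale heritability under these assumptions: {example:.3f}")
```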
Affiliation(s)
- Doug Speed
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Terence J O'Brien
- The Departments of Medicine and Neurology, The Royal Melbourne Hospital, The University of Melbourne, Australia
| | - Aarno Palotie
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland
- The Broad Institute of MIT and Harvard, Cambridge, USA
- Department of Medical Genetics, University of Helsinki, Finland
- University Central Hospital, Helsinki, Finland
| | - Kirill Shkura
- Division of Brain Sciences, Imperial College London, London W6 8RF, UK
- Medical Research Council (MRC) Clinical Sciences Centre, Faculty of Medicine, Imperial College London, UK
| | - Anthony G Marson
- Department of Molecular and Clinical Pharmacology, University of Liverpool, UK
| | - David J Balding
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Michael R Johnson
- Division of Brain Sciences, Imperial College London, London W6 8RF, UK
|
26
|
Steele CD, Greenhalgh M, Balding DJ. Verifying likelihoods for low template DNA profiles using multiple replicates. Forensic Sci Int Genet 2014; 13:82-9. [PMID: 25082140 PMCID: PMC4234080 DOI: 10.1016/j.fsigen.2014.06.018] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/19/2014] [Revised: 06/09/2014] [Accepted: 06/30/2014] [Indexed: 11/21/2022]
Abstract
The behaviour of multi-replicate LRs with respect to the inverse match probability is proposed as a method to validate forensic LR software. We perform lab-based and simulated experiments of one-, two- and three-contributor CSPs, as well as investigating a real-world CSP. LRs rise towards the IMP with additional replicates, while never exceeding it. Additionally, the LR from multiple low-template replicates can exceed that from a single good-quality sample. We validate likeLTD by demonstrating that it adheres to the expected behaviours.
To date there is no generally accepted method to test the validity of algorithms used to compute likelihood ratios (LR) evaluating forensic DNA profiles from low-template and/or degraded samples. An upper bound on the LR is provided by the inverse of the match probability, which is the usual measure of weight of evidence for standard DNA profiles not subject to the stochastic effects that are the hallmark of low-template profiles. However, even for low-template profiles the LR in favour of a true prosecution hypothesis should approach this bound as the number of profiling replicates increases, provided that the queried contributor is the major contributor. Moreover, for sufficiently many replicates the standard LR for mixtures is often surpassed by the low-template LR. It follows that multiple LTDNA replicates can provide stronger evidence for a contributor to a mixture than a standard analysis of a good-quality profile. Here, we examine the performance of the likeLTD software for up to eight replicate profiling runs. We consider simulated and laboratory-generated replicates as well as resampling replicates from a real crime case. We show that LRs generated by likeLTD usually do exceed the mixture LR given sufficient replicates, are bounded above by the inverse match probability and do approach this bound closely when this is expected. We also show good performance of likeLTD even when a large majority of alleles are designated as uncertain, and suggest that there can be advantages to using different profiling sensitivities for different replicates. Overall, our results support both the validity of the underlying mathematical model and its correct implementation in the likeLTD software.
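The claim that the LR rises towards the inverse match probability (IMP) with more replicates, without exceeding it, can be illustrated with a deliberately simple toy model. The sketch below is not the likeLTD model: it assumes a single contributor at one locus, Hardy-Weinberg genotype probabilities, independent per-allele dropout with a fixed probability, and no drop-in. All names, allele frequencies and replicate patterns are hypothetical.

```python
from itertools import combinations_with_replacement

def replicate_prob(observed, genotype, dropout):
    """P(observed allele set | genotype) under independent dropout, no drop-in."""
    alleles = set(genotype)
    if not observed <= alleles:
        return 0.0  # an observed allele not in the genotype would need drop-in
    prob = 1.0
    for allele in alleles:
        # A homozygous allele is lost only if both copies drop out.
        p_lost = dropout ** genotype.count(allele)
        prob *= (1.0 - p_lost) if allele in observed else p_lost
    return prob

def likelihood_ratio(replicates, q_genotype, freqs, dropout):
    """LR for 'Q is the contributor' versus 'an unknown person is the contributor'."""
    def likelihood(genotype):
        out = 1.0
        for observed in replicates:
            out *= replicate_prob(observed, genotype, dropout)
        return out

    denominator = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        genotype_prob = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        denominator += genotype_prob * likelihood((a, b))
    return likelihood(q_genotype) / denominator

freqs = {"a": 0.1, "b": 0.2, "c": 0.7}       # hypothetical allele frequencies
q = ("a", "b")                                # genotype of the queried contributor
imp = 1.0 / (2 * freqs["a"] * freqs["b"])     # inverse match probability
lr_one = likelihood_ratio([{"a"}], q, freqs, dropout=0.5)
lr_two = likelihood_ratio([{"a"}, {"b"}], q, freqs, dropout=0.5)
print(f"one replicate: {lr_one:.2f}, two replicates: {lr_two:.2f}, IMP: {imp:.2f}")
```

The bound holds in this toy model because the denominator always includes the term for Q's own genotype, so it can never be smaller than the genotype probability times the numerator.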
Affiliation(s)
| | - Matthew Greenhalgh
- Orchid Cellmark Ltd., Abingdon Business Park, Blacklands Way, Abingdon OX14 1YX, UK.
| | - David J Balding
- UCL Genetics Institute, Darwin Building, Gower Street, London WC1E 6BT, UK.
|
27
|
Abstract
BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient; for the largest data set, which includes 12,678 individuals and 1.5 M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK.
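The difference between a single random effect and multiple random effects can be written out in standard linear mixed model notation. The block below is a sketch in generic textbook notation rather than the paper's own: the symbols (K_m for the kinship matrix of SNP class m, sigma_m^2 for its variance component, M for the number of classes) are assumptions made for illustration.

```latex
% Single-random-effect BLUP model: one kinship matrix K, one variance component
y = \mu + g + e, \qquad g \sim N(0, \sigma_g^2 K), \qquad e \sim N(0, \sigma_e^2 I)

% Extension to M random effects, one per SNP class, each with its own kinship matrix
y = \mu + \sum_{m=1}^{M} g_m + e, \qquad g_m \sim N(0, \sigma_m^2 K_m)

% Implied phenotypic covariance and the resulting BLUP of each component
V = \sum_{m=1}^{M} \sigma_m^2 K_m + \sigma_e^2 I, \qquad
\hat{g}_m = \sigma_m^2 K_m V^{-1} (y - \hat{\mu})
```

With M = 1 the second model reduces to the first, which is why a single shared effect-size variance for all SNPs is implicit in ordinary BLUP.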
Affiliation(s)
- Doug Speed
- UCL Genetics Institute, University College London, London WC1E 6BT, United Kingdom
| | - David J Balding
- UCL Genetics Institute, University College London, London WC1E 6BT, United Kingdom
|
28
|
Amaral AFS, Ramasamy A, Castro-Giner F, Minelli C, Accordini S, Sørheim IC, Pin I, Kogevinas M, Jõgi R, Balding DJ, Norbäck D, Verlato G, Olivieri M, Probst-Hensch N, Janson C, Zock JP, Heinrich J, Jarvis DL. Interaction between gas cooking and GSTM1 null genotype in bronchial responsiveness: results from the European Community Respiratory Health Survey. Thorax 2014; 69:558-64. [PMID: 24613990 PMCID: PMC4033138 DOI: 10.1136/thoraxjnl-2013-204574] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/17/2022]
Abstract
Background: Increased bronchial responsiveness is characteristic of asthma. Gas cooking, which is a major indoor source of the highly oxidant nitrogen dioxide, has been associated with respiratory symptoms and reduced lung function. However, little is known about the effect of gas cooking on bronchial responsiveness and on how this relationship may be modified by variants in the genes GSTM1, GSTT1 and GSTP1, which influence antioxidant defences. Methods: The study was performed in subjects with forced expiratory volume in one second at least 70% of predicted who took part in the multicentre European Community Respiratory Health Survey, had bronchial responsiveness assessed by methacholine challenge and had been genotyped for GSTM1, GSTT1 and GSTP1-rs1695. Information on the use of gas for cooking was obtained from interviewer-led questionnaires. Effect modification by genotype on the association between the use of gas for cooking and bronchial responsiveness was assessed within each participating country, and estimates combined using meta-analysis. Results: Overall, gas cooking, as compared with cooking with electricity, was not associated with bronchial responsiveness (β=−0.08, 95% CI −0.40 to 0.25, p=0.648). However, GSTM1 significantly modified this effect (β for interaction=−0.75, 95% CI −1.16 to −0.33, p=4×10⁻⁴), with GSTM1 null subjects showing more responsiveness if they cooked with gas. No effect modification by GSTT1 or GSTP1-rs1695 genotypes was observed. Conclusions: Increased bronchial responsiveness was associated with gas cooking among subjects with the GSTM1 null genotype. This may reflect the oxidant effects on the bronchi of exposure to nitrogen dioxide.
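The reported interaction estimate, confidence interval and p-value can be cross-checked against each other. The sketch below assumes a symmetric Wald-type interval and a two-sided normal test, which the abstract does not confirm; the function name is hypothetical. Under those assumptions the interaction estimate gives a p-value on the order of 10⁻⁴, in line with the reported value.

```python
from statistics import NormalDist

def p_value_from_ci(beta, ci_low, ci_high, level=0.95):
    """Approximate two-sided p-value from an estimate and its confidence interval.

    Assumption: the interval is beta +/- z_crit * SE (normal approximation).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)
    standard_error = (ci_high - ci_low) / (2.0 * z_crit)
    z = beta / standard_error
    return 2.0 * std_normal.cdf(-abs(z))

# Interaction estimate and interval as reported in the abstract.
p_interaction = p_value_from_ci(-0.75, -1.16, -0.33)
print(f"Approximate p-value for the interaction: {p_interaction:.1e}")
```

Because the published estimate and interval are rounded, a check of this kind can only be expected to agree approximately.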
Affiliation(s)
- André F S Amaral
- Respiratory Epidemiology, Occupational Medicine and Public Health, National Heart and Lung Institute, Imperial College, London, UK
- MRC-PHE Centre for Environment & Health, London, UK
| | - Adaikalavan Ramasamy
- Respiratory Epidemiology, Occupational Medicine and Public Health, National Heart and Lung Institute, Imperial College, London, UK
| | - Francesc Castro-Giner
- Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
- Molecular and Population Genetics Laboratory, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Cosetta Minelli
- Respiratory Epidemiology, Occupational Medicine and Public Health, National Heart and Lung Institute, Imperial College, London, UK
| | - Simone Accordini
- Unit of Epidemiology and Medical Statistics, Department of Public Health and Community Medicine, University of Verona, Verona, Italy
| | | | - Isabelle Pin
- Pédiatrie, CHU de Grenoble, Institut Albert Bonniot, INSERM, Grenoble, France
- Université Joseph Fourier, Grenoble, France
| | - Manolis Kogevinas
- Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
| | - Rain Jõgi
- Tartu University Hospital, Lung Clinic, Tartu, Estonia
| | - David J Balding
- UCL Genetics Institute, University College London, London, UK
| | - Dan Norbäck
- Department of Medical Science, Occupational and Environmental Medicine, Uppsala University, Uppsala, Sweden
| | - Giuseppe Verlato
- Unit of Epidemiology & Medical Statistics, Dept. of Public Health & Community Medicine, University of Verona, Verona, Italy
| | - Mario Olivieri
- Unit of Occupational Medicine, University Hospital of Verona, Verona, Italy
| | - Nicole Probst-Hensch
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Christer Janson
- Department of Medical Sciences, Respiratory Medicine and Allergology, Uppsala University, Uppsala, Sweden
| | - Jan-Paul Zock
- Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
| | - Joachim Heinrich
- Helmholtz Zentrum München, German Research Centre for Environmental Health, Institute of Epidemiology I, Neuherberg, Germany
| | - Deborah L Jarvis
- Respiratory Epidemiology, Occupational Medicine and Public Health, National Heart and Lung Institute, Imperial College, London, UK
- MRC-PHE Centre for Environment & Health, London, UK
|
29
|
Gerbault P, Allaby RG, Boivin N, Rudzinski A, Grimaldi IM, Pires JC, Climer Vigueira C, Dobney K, Gremillion KJ, Barton L, Arroyo-Kalin M, Purugganan MD, Rubio de Casas R, Bollongino R, Burger J, Fuller DQ, Bradley DG, Balding DJ, Richerson PJ, Gilbert MTP, Larson G, Thomas MG. Storytelling and story testing in domestication. Proc Natl Acad Sci U S A 2014; 111:6159-64. [PMID: 24753572 PMCID: PMC4035932 DOI: 10.1073/pnas.1400425111] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Indexed: 11/18/2022] Open
Abstract
The domestication of plants and animals marks one of the most significant transitions in human, and indeed global, history. Traditionally, study of the domestication process was the exclusive domain of archaeologists and agricultural scientists; today it is an increasingly multidisciplinary enterprise that has come to involve the skills of evolutionary biologists and geneticists. Although the application of new information sources and methodologies has dramatically transformed our ability to study and understand domestication, it has also generated increasingly large and complex datasets, the interpretation of which is not straightforward. In particular, challenges of equifinality, evolutionary variance, and emergence of unexpected or counter-intuitive patterns all face researchers attempting to infer past processes directly from patterns in data. We argue that explicit modeling approaches, drawing upon emerging methodologies in statistics and population genetics, provide a powerful means of addressing these limitations. Modeling also offers an approach to analyzing datasets that avoids conclusions steered by implicit biases, and makes possible the formal integration of different data types. Here we outline some of the modeling approaches most relevant to current problems in domestication research, and demonstrate the ways in which simulation modeling is beginning to reshape our understanding of the domestication process.
Affiliation(s)
| | - Robin G. Allaby
- School of Life Sciences, Gibbet Hill Campus, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Nicole Boivin
- Research Laboratory for Archaeology and the History of Art, School of Archaeology, Oxford OX1 3QY, United Kingdom
| | - Anna Rudzinski
- Research Department of Genetics, Evolution, and Environment and
| | - Ilaria M. Grimaldi
- Research Laboratory for Archaeology and the History of Art, School of Archaeology, Oxford OX1 3QY, United Kingdom
| | - J. Chris Pires
- Division of Biological Sciences, University of Missouri, Columbia, MO 65211
| | | | - Keith Dobney
- Department of Archaeology, University of Aberdeen, Aberdeen AB24 3UF, United Kingdom
| | | | - Loukas Barton
- Department of Anthropology, Center for Comparative Archaeology, University of Pittsburgh, Pittsburgh, PA 15260
| | - Manuel Arroyo-Kalin
- Institute of Archaeology, University College London, London WC1H 0PY, United Kingdom
| | - Michael D. Purugganan
- Department of Biology, New York University, New York, NY 10003-6688
- Center for Genomics and Systems Biology, New York University Abu Dhabi Research Institute, Abu Dhabi, United Arab Emirates
| | | | - Ruth Bollongino
- Institute of Anthropology, Johannes Gutenberg University, D-55099 Mainz, Germany
| | - Joachim Burger
- Institute of Anthropology, Johannes Gutenberg University, D-55099 Mainz, Germany
| | - Dorian Q. Fuller
- Institute of Archaeology, University College London, London WC1H 0PY, United Kingdom
| | | | - David J. Balding
- University College London Genetics Institute, University College London, London WC1E 6BT, United Kingdom
| | - Peter J. Richerson
- Department of Environmental Science and Policy, University of California, Davis, CA 95616
| | - M. Thomas P. Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen, Denmark
| | - Greger Larson
- Durham Evolution and Ancient DNA, Department of Archaeology, Durham University, Durham DH1 3LE, United Kingdom
| | - Mark G. Thomas
- Research Department of Genetics, Evolution, and Environment and
|
30
|
Couto Alves A, Bruhn S, Ramasamy A, Wang H, Holloway JW, Hartikainen AL, Jarvelin MR, Benson M, Balding DJ, Coin LJM. Dysregulation of complement system and CD4+ T cell activation pathways implicated in allergic response. PLoS One 2013; 8:e74821. [PMID: 24116013 PMCID: PMC3792967 DOI: 10.1371/journal.pone.0074821] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/30/2013] [Accepted: 08/06/2013] [Indexed: 11/18/2022] Open
Abstract
Allergy is a complex disease that is likely to involve dysregulated CD4+ T cell activation. Here we propose a novel methodology to gain insight into how coordinated behaviour emerges between disease-dysregulated pathways in response to pathophysiological stimuli. Using peripheral blood mononuclear cells of allergic rhinitis patients and controls cultured with and without pollen allergens, we integrate CD4+ T cell gene expression from microarray data and genetic markers of allergic sensitisation from GWAS data at the pathway level using enrichment analysis, implicating the complement system in both cellular and systemic response to pollen allergens. We delineate a novel disease network linking T cell activation to the complement system that is significantly enriched for genes exhibiting correlated gene expression and protein-protein interactions, suggesting a tight biological coordination that is dysregulated in the disease state in response to pollen allergen but not to diluent. This novel disease network has high predictive power for the gene and protein expression of the Th2 cytokine profile (IL-4, IL-5, IL-10, IL-13) and of the Th2 master regulator (GATA3), suggesting its involvement in the early stages of CD4+ T cell differentiation. Dissection of the complement system gene expression identifies 7 genes specifically associated with atopic response to pollen, including C1QR1, CFD, CFP, ITGB2 and ITGAX, and confirms the role of C3AR1 and C5AR1. Two of these genes (ITGB2 and C3AR1) are also implicated in the network linking complement system to T cell activation, which comprises 6 differentially expressed genes. C3AR1 is also significantly associated with allergic sensitisation in GWAS data.
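Pathway-level enrichment analysis is often based on an over-representation test. The abstract does not say which enrichment statistic was used, so the following is only a sketch of one common choice, a one-sided hypergeometric test, with a hypothetical function name and made-up counts.

```python
from math import comb

def enrichment_p_value(universe, pathway_size, selected, overlap):
    """P(at least `overlap` pathway genes among `selected` genes drawn at random).

    Assumption: genes are drawn without replacement from `universe` genes,
    of which `pathway_size` belong to the pathway (hypergeometric model).
    """
    total = comb(universe, selected)
    upper = min(pathway_size, selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Made-up counts for illustration: 20000 genes, a 50-gene pathway,
# 200 differentially expressed genes, 6 of which fall in the pathway.
p_enrich = enrichment_p_value(20000, 50, 200, 6)
print(f"Hypergeometric enrichment p-value: {p_enrich:.2e}")
```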
MESH Headings
- Allergens/pharmacology
- CD4-Positive T-Lymphocytes/drug effects
- CD4-Positive T-Lymphocytes/immunology
- CD4-Positive T-Lymphocytes/metabolism
- Cell Differentiation/drug effects
- Cell Differentiation/genetics
- Cytokines/genetics
- Cytokines/metabolism
- GATA3 Transcription Factor/genetics
- GATA3 Transcription Factor/metabolism
- Gene Expression Profiling
- Humans
- Leukocytes, Mononuclear/drug effects
- Leukocytes, Mononuclear/immunology
- Leukocytes, Mononuclear/metabolism
- Lymphocyte Activation/drug effects
- Lymphocyte Activation/genetics
- Lymphocyte Activation/immunology
- Pollen
- Receptors, Complement/genetics
- Receptors, Complement/metabolism
- Rhinitis, Allergic, Seasonal/genetics
- Rhinitis, Allergic, Seasonal/immunology
- Rhinitis, Allergic, Seasonal/metabolism
Affiliation(s)
- Alexessander Couto Alves
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
| | - Sören Bruhn
- Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
| | - Adaikalavan Ramasamy
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
| | - Hui Wang
- Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Dept of Paediatrics, Gothenburg University, Gothenburg, Sweden
| | - John W. Holloway
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
| | - Anna-Liisa Hartikainen
- Department of Clinical Sciences, Obstetrics and Gynecology, Institute of Clinical Medicine, University of Oulu, Oulu, Finland
| | - Marjo-Riitta Jarvelin
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
- Institute of Health Sciences, University of Oulu, and Unit of General Practice, University Hospital of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
- National Institute of Health and Welfare, Oulu, Finland
| | - Mikael Benson
- Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
| | - David J. Balding
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
- Genetics Institute, University College London, United Kingdom
| | - Lachlan J. M. Coin
- Department of Genomics of Common Diseases, School of Public Health, Imperial College London, London, United Kingdom
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
|
31
|
Bauer P, Balding DJ, Klünemann HH, Linden DEJ, Ory DS, Pineda M, Priller J, Sedel F, Muller A, Chadha-Boreham H, Welford RWD, Strasser DS, Patterson MC. Genetic screening for Niemann-Pick disease type C in adults with neurological and psychiatric symptoms: findings from the ZOOM study. Hum Mol Genet 2013; 22:4349-56. [PMID: 23773996 PMCID: PMC3792693 DOI: 10.1093/hmg/ddt284] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Indexed: 12/21/2022] Open
Abstract
Niemann–Pick disease type C (NP-C) is a rare, autosomal-recessive, progressive neurological disease caused by mutations in either the NPC1 gene (in 95% of cases) or the NPC2 gene. This observational, multicentre genetic screening study evaluated the frequency and phenotypes of NP-C in consecutive adult patients with neurological and psychiatric symptoms. Diagnostic testing for NP-C involved NPC1 and NPC2 exonic gene sequencing and gene dosage analysis. When available, results of filipin staining, plasma cholestane-3β,5α,6β-triol assays and measurements of relevant sphingolipids were also collected. NPC1 and NPC2 gene sequencing was completed in 250/256 patients from 30 psychiatric and neurological reference centres across the EU and USA [median (range) age 38 (18–90) years]. Three patients had a confirmed diagnosis of NP-C: two based on gene sequencing alone (two known causal disease alleles) and one based on gene sequencing and positive filipin staining. A further 12 patients displayed either single mutant NP-C alleles (8 with NPC1 mutations and 3 with NPC2 mutations) or a known causal disease mutation and an unclassified NPC1 allele variant (1 patient). Notably, high plasma cholestane-3β,5α,6β-triol levels were observed for all NP-C cases (n = 3). Overall, the frequency of NP-C patients in this study [1.2% (95% CI: 0.3%, 3.5%)] suggests that there may be an underdiagnosed pool of NP-C patients among adults who share common neurological and psychiatric symptoms.
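The reported frequency corresponds to 3 confirmed cases among the 250 sequenced patients. As a rough check on the accompanying interval, the sketch below computes an exact (Clopper-Pearson) binomial confidence interval using only the standard library. The abstract does not state which interval method the authors used, so this is an assumption, and small differences from the published bounds would be expected if another method or rounding convention was applied.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def solve_by_bisection(f, lo=0.0, hi=1.0, iterations=100):
    """Find a root of f on [lo, hi], assuming f changes sign exactly once."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_binomial_ci(x, n, level=0.95):
    """Clopper-Pearson interval for a proportion with x successes in n trials."""
    alpha = 1.0 - level
    lower = 0.0 if x == 0 else solve_by_bisection(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    upper = 1.0 if x == n else solve_by_bisection(
        lambda p: binom_cdf(x, n, p) - alpha / 2.0)
    return lower, upper

lower, upper = exact_binomial_ci(3, 250)
print(f"Point estimate {3 / 250:.1%}, exact 95% CI {lower:.2%} to {upper:.2%}")
```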
Affiliation(s)
- Peter Bauer
- Institute of Medical Genetics and Applied Genomics, Tübingen University, Tübingen, Germany
|
32
|
Mead S, Uphill J, Beck J, Poulter M, Campbell T, Lowe J, Adamson G, Hummerich H, Klopp N, Rückert IM, Wichmann HE, Azazi D, Plagnol V, Pako WH, Whitfield J, Alpers MP, Whittaker J, Balding DJ, Zerr I, Kretzschmar H, Collinge J. Genome-wide association study in multiple human prion diseases suggests genetic risk factors additional to PRNP. Hum Mol Genet 2012; 21:1897-906. [PMID: 22210626 PMCID: PMC3313791 DOI: 10.1093/hmg/ddr607] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Received: 08/25/2011] [Revised: 12/04/2011] [Accepted: 12/16/2011] [Indexed: 11/14/2022] Open
Abstract
Prion diseases are fatal neurodegenerative diseases of humans and animals caused by the misfolding and aggregation of prion protein (PrP). Mammalian prion diseases are under strong genetic control but few risk factors are known aside from the PrP gene locus (PRNP). No genome-wide association study (GWAS) has been done aside from a small sample of variant Creutzfeldt-Jakob disease (CJD). We conducted GWAS of sporadic CJD (sCJD), variant CJD (vCJD), iatrogenic CJD, inherited prion disease, kuru and resistance to kuru despite attendance at mortuary feasts. After quality control, we analysed 2000 samples and 6015 control individuals (provided by the Wellcome Trust Case Control Consortium and KORA-gen) for 491032-511862 SNPs in the European study. Association studies were done in each geographical and aetiological group followed by several combined analyses. The PRNP locus was highly associated with risk in all geographical and aetiological groups. This association was driven by the known coding variation at rs1799990 (PRNP codon 129). No non-PRNP loci achieved genome-wide significance in the meta-analysis of all human prion disease. SNPs at the ZBTB38-RASA2 locus were associated with CJD in the UK (rs295301, P = 3.13 × 10⁻⁸; OR, 0.70) but these SNPs showed no replication evidence of association in German sCJD or in Papua New Guinea-based tests. A SNP in the CHN2 gene was associated with vCJD [P = 1.5 × 10⁻⁷; odds ratio (OR), 2.36], but not in UK sCJD (P = 0.049; OR, 1.24), in German sCJD or in PNG groups. In the overall meta-analysis of CJD, 14 SNPs were associated (P < 10⁻⁵; two at PRNP, three at ZBTB38-RASA2, nine at nine other independent non-PRNP loci), more than would be expected by chance. None of the loci recently identified as genome-wide significant in studies of other neurodegenerative diseases showed any clear evidence of association in prion diseases.
Concerning common genetic variation, it is likely that the PRNP locus contains the only strong risk factors that act universally across human prion diseases. Our data are most consistent with several other risk loci of modest overall effects which will require further genetic association studies to provide definitive evidence.
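The statement that 14 SNPs passing the 10⁻⁵ threshold is "more than would be expected by chance" can be illustrated with a back-of-envelope calculation. The sketch below assumes roughly 500,000 independent tests (near the middle of the 491032-511862 range quoted above) and a Poisson approximation for the number of null SNPs passing the threshold. Real SNPs are correlated through linkage disequilibrium and the abstract does not describe how the authors assessed this, so the numbers are only indicative.

```python
from math import exp, factorial

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

n_snps = 500_000            # assumed number of independent tests
threshold = 1e-5            # p-value threshold quoted in the abstract
expected_by_chance = n_snps * threshold
tail = poisson_upper_tail(14, expected_by_chance)
print(f"Expected by chance: about {expected_by_chance:.0f} SNPs; "
      f"P(14 or more) under the null is roughly {tail:.1e}")
```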
Affiliation(s)
- Simon Mead
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - James Uphill
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - John Beck
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - Mark Poulter
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - Tracy Campbell
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - Jessica Lowe
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - Gary Adamson
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - Holger Hummerich
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
| | - Norman Klopp
- KORA-gen, Helmholtz-Zentrum München, Institute for Epidemiology, Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
| | - Ina-Maria Rückert
- KORA-gen, Helmholtz-Zentrum München, Institute for Epidemiology, Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
| | - H-Erich Wichmann
- KORA-gen, Helmholtz-Zentrum München, Institute for Epidemiology, Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
| | - Dhoyazan Azazi
- Department of Statistics, Institute of Genetics, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Vincent Plagnol
- Department of Statistics, Institute of Genetics, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Wandagi H. Pako
- Papua New Guinea (PNG) Institute of Medical Research, Goroka, EHP, Papua New Guinea
| | - Jerome Whitfield
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Centre for International Health, Curtin University, Perth, Australia
| | - Michael P. Alpers
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Papua New Guinea (PNG) Institute of Medical Research, Goroka, EHP, Papua New Guinea
- Centre for International Health, Curtin University, Perth, Australia
| | - John Whittaker
- London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
| | - David J. Balding
- Department of Statistics, Institute of Genetics, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Inga Zerr
- Department of Neurology, Georg-August University Göttingen, Göttingen, Germany
| | - Hans Kretzschmar
- Center for Neuropathology and Prion Research, Ludwig-Maximilians-University Munich, Feodor-Lynen-Str. 23, D-81377 Munich, Germany
| | - John Collinge
- MRC Prion Unit and Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
|
33
|
Ramasamy A, Curjuric I, Coin LJ, Kumar A, McArdle WL, Imboden M, Leynaert B, Kogevinas M, Schmid-Grendelmeier P, Pekkanen J, Wjst M, Bircher AJ, Sovio U, Rochat T, Hartikainen AL, Balding DJ, Jarvelin MR, Probst-Hensch N, Strachan DP, Jarvis DL. A genome-wide meta-analysis of genetic variants associated with allergic rhinitis and grass sensitization and their interaction with birth order. J Allergy Clin Immunol 2011; 128:996-1005. [PMID: 22036096 DOI: 10.1016/j.jaci.2011.08.030] [Citation(s) in RCA: 145] [Impact Index Per Article: 11.2] [Received: 02/25/2011] [Revised: 08/22/2011] [Accepted: 08/29/2011] [Indexed: 01/03/2023]
Abstract
BACKGROUND Hay fever or seasonal allergic rhinitis (AR) is a chronic disorder associated with IgE sensitization to grass. The underlying genetic variants have not been studied comprehensively. There is overwhelming evidence that those who have older siblings have less AR, although the mechanism for this remains unclear. OBJECTIVE We sought to identify common genetic variant associations with prevalent AR and grass sensitization using existing genome-wide association study (GWAS) data and to determine whether genetic variants modify the protective effect of older siblings. METHOD Approximately 2.2 million genotyped or imputed single nucleotide polymorphisms were investigated in 4 large European adult cohorts for AR (3,933 self-reported cases vs 8,965 control subjects) and grass sensitization (2,315 cases vs 10,032 control subjects). RESULTS Three loci reached genome-wide significance for either phenotype. The HLA variant rs7775228, which cis-regulates HLA-DRB4, was strongly associated with grass sensitization and weakly with AR (P(grass) = 1.6 × 10⁻⁹; P(AR) = 8.0 × 10⁻³). Variants in a locus near chromosome 11 open reading frame 30 (C11orf30) and leucine-rich repeat containing 32 (LRRC32), which was previously associated with atopic dermatitis and eczema, were also strongly associated with both phenotypes (rs2155219; P(grass) = 9.4 × 10⁻⁹; P(AR) = 3.8 × 10⁻⁸). The third genome-wide significant variant was rs17513503 (P(grass) = 1.2 × 10⁻⁸; P(AR) = 7.4 × 10⁻⁷), which was located near transmembrane protein 232 (TMEM232) and solute carrier family 25, member 46 (SLC25A46). Twelve further loci with suggestive associations were also identified. Using a candidate gene approach, where we considered variants within 164 genes previously thought to be important, we found variants in 3 further genes that may be of interest: thymic stromal lymphopoietin (TSLP), Toll-like receptor 6 (TLR6) and nucleotide-binding oligomerization domain containing 1 (NOD1/CARD4). We found no evidence for variants that modified the effect of birth order on either phenotype. CONCLUSIONS This relatively large meta-analysis of GWASs identified few loci associated with AR and grass sensitization. No birth order interaction was identified in the current analyses.
Affiliation(s)
- Adaikalavan Ramasamy
- Respiratory Epidemiology and Public Health, Imperial College, London, United Kingdom
|
34
|
|
35
|
Abstract
Despite the success of genome-wide association studies (GWASs) in identifying loci associated with common diseases, a substantial proportion of the causality remains unexplained. Recent advances in genomic technologies have placed us in a position to initiate large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation. Such epigenome-wide association studies (EWASs) present novel opportunities but also create new challenges that are not encountered in GWASs. We discuss EWAS design, cohort and sample selections, statistical significance and power, confounding factors and follow-up studies. We also discuss how integration of EWASs with GWASs can help to dissect complex GWAS haplotypes for functional analysis.
Affiliation(s)
- Vardhman K Rakyan
- Blizard Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, Queen Mary, University of London, London, UK.
|
36
|
Chahal HS, Stals K, Unterländer M, Balding DJ, Thomas MG, Kumar AV, Besser GM, Atkinson AB, Morrison PJ, Howlett TA, Levy MJ, Orme SM, Akker SA, Abel RL, Grossman AB, Burger J, Ellard S, Korbonits M. AIP mutation in pituitary adenomas in the 18th century and today. N Engl J Med 2011; 364:43-50. [PMID: 21208107 DOI: 10.1056/nejmoa1008020] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Indexed: 11/19/2022]
Abstract
Gigantism results when a growth hormone-secreting pituitary adenoma is present before epiphyseal fusion. In 1909, when Harvey Cushing examined the skeleton of an Irish patient who lived from 1761 to 1783, he noted an enlarged pituitary fossa. We extracted DNA from the patient's teeth and identified a germline mutation in the aryl hydrocarbon-interacting protein gene (AIP). Four contemporary Northern Irish families who presented with gigantism, acromegaly, or prolactinoma have the same mutation and haplotype associated with the mutated gene. Using coalescent theory, we infer that these persons share a common ancestor who lived about 57 to 66 generations earlier.
Affiliation(s)
- Harvinder S Chahal
- Department of Endocrinology, Barts and the London School of Medicine, Queen Mary University of London, London, United Kingdom
|
37
|
Abstract
We conducted a two-stage genome-wide association study to identify common genetic variation altering risk of the metabolic syndrome and related phenotypes in Indian Asian men, who have a high prevalence of these conditions. In Stage 1, approximately 317,000 single nucleotide polymorphisms were genotyped in 2700 individuals, from which 1500 SNPs were selected to be genotyped in a further 2300 individuals. Selection for inclusion in Stage 1 was based on four metabolic syndrome component traits: HDL-cholesterol, plasma glucose and Type 2 diabetes, abdominal obesity measured by waist to hip ratio, and diastolic blood pressure. Association was tested with these four traits and a composite metabolic syndrome phenotype. Four SNPs reaching significance level p < 5×10⁻⁷ and with posterior probability of association >0.8 were found in genes CETP and LPL, associated with HDL-cholesterol. These associations have already been reported in Indian Asians and in Europeans. Five additional loci harboured SNPs significant at p < 10⁻⁶ and posterior probability >0.5 for HDL-cholesterol, type 2 diabetes or diastolic blood pressure. Our results suggest that the primary genetic determinants of metabolic syndrome are the same in Indian Asians as in other populations, despite the higher prevalence. Further, we found little evidence of a common genetic basis for metabolic syndrome traits in our sample of Indian Asian men.
Affiliation(s)
- Delilah Zabaneh
- Department of Epidemiology and Public Health, Imperial College London, London, UK.
|
38
|
Calboli FCF, Tozzi F, Galwey NW, Antoniades A, Mooser V, Preisig M, Vollenweider P, Waterworth D, Waeber G, Johnson MR, Muglia P, Balding DJ. A genome-wide association study of neuroticism in a population-based sample. PLoS One 2010; 5:e11504. [PMID: 20634892 PMCID: PMC2901337 DOI: 10.1371/journal.pone.0011504] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Received: 12/02/2009] [Accepted: 05/17/2010] [Indexed: 11/22/2022] Open
Abstract
Neuroticism is a moderately heritable personality trait considered to be a risk factor for developing major depression, anxiety disorders and dementia. We performed a genome-wide association study in 2,235 participants drawn from a population-based study of neuroticism, making this the largest association study for neuroticism to date. Neuroticism was measured by the Eysenck Personality Questionnaire. After quality control, we analysed 430,000 autosomal SNPs together with an additional 1.2 million SNPs imputed with high quality from the HapMap CEU samples. We found a very small effect of population stratification, corrected using one principal component, and some cryptic kinship that required no correction. NKAIN2 showed suggestive evidence of association with neuroticism as a main effect (p < 10⁻⁶) and GPC6 showed suggestive evidence for interaction with age (p ≈ 10⁻⁷). We found support for one previously-reported association (PDE4D), but failed to replicate other recent reports. These results suggest common SNP variation does not strongly influence neuroticism. Our study was powered to detect almost all SNPs explaining at least 2% of heritability, and so our results effectively exclude the existence of loci having a major effect on neuroticism.
Affiliation(s)
- Federico C F Calboli
- Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom.
|
39
|
Coin LJM, Asher JE, Walters RG, El-Sayed Moustafa JS, de Smith AJ, Sladek R, Balding DJ, Froguel P, Blakemore AIF. cnvHap: an integrative population and haplotype-based multiplatform model of SNPs and CNVs. Nat Methods 2010; 7:541-6. [DOI: 10.1038/nmeth.1466] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 03/01/2010] [Accepted: 05/05/2010] [Indexed: 11/09/2022]
|
40
|
Su SY, Asher JE, Jarvelin MR, Froguel P, Blakemore AIF, Balding DJ, Coin LJM. Inferring combined CNV/SNP haplotypes from genotype data. Bioinformatics 2010; 26:1437-45. [PMID: 20406911 DOI: 10.1093/bioinformatics/btq157] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 01/17/2023]
Abstract
MOTIVATION Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase and none of these are applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and for inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phases of CNVs and SNPs are inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate. RESULTS We generated diploid phase-known CNV-SNP genotype datasets by pairing male X chromosome CNV-SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset: a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets. AVAILABILITY Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin.
Affiliation(s)
- Shu-Yi Su
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, London W2 1PG, UK
|
41
|
Zabaneh D, Chambers JC, Elliott P, Scott J, Balding DJ, Kooner JS. Heritability and genetic correlations of insulin resistance and component phenotypes in Asian Indian families using a multivariate analysis. Diabetologia 2009; 52:2585-9. [PMID: 19763535 DOI: 10.1007/s00125-009-1504-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 03/20/2009] [Accepted: 08/03/2009] [Indexed: 11/25/2022]
Abstract
AIMS/HYPOTHESIS Insulin resistance and related metabolic disturbances are more common among Asian Indians than European whites. Little is known about the heritability of insulin resistance traits in Asian Indians. Our objective was to estimate heritabilities and genetic correlations in Asian Indian families. METHODS Phenotypic data were assembled for 181 UK Asian Indian probands with premature CHD, and their 1,454 first-, second- and third-degree relatives. We calculated (narrow-sense) heritabilities and genetic correlations for insulin resistance traits, and common environmental effects using all study participants and a multivariate model. The analysis was repeated in a subsample consisting of individuals not on drug therapy. RESULTS Heritability estimates (SE) for individuals not on drug therapy were: BMI 0.31 (0.04), WHR 0.27 (0.04), systolic BP 0.29 (0.03), triacylglycerol 0.40 (0.04), HDL-cholesterol 0.53 (0.04), glucose 0.37 (0.03), HOMA of insulin resistance (HOMA-IR) 0.22 (0.04), and HbA1c 0.60 (0.04). We observed many significant genetic correlations between the traits, in particular between HOMA-IR and BMI. Heritability estimates were lower for all phenotypes when analysed among all participants. CONCLUSIONS/INTERPRETATION Genetic factors contribute to a significant proportion of the total variance in insulin resistance and related metabolic disturbances in Asian Indian CHD families.
Affiliation(s)
- D Zabaneh
- Department of Epidemiology and Public Health, Imperial College London, St Mary's Campus, Norfolk Place, London, W2 1PG, UK.
|
42
|
|
43
|
Abstract
Bayesian statistical methods have recently made great inroads into many areas of science, and this advance is now extending to the assessment of association between genetic variants and disease or other phenotypes. We review these methods, focusing on single-SNP tests in genome-wide association studies. We discuss the advantages of the Bayesian approach over classical (frequentist) approaches in this setting and provide a tutorial on basic analysis steps, including practical guidelines for appropriate prior specification. We demonstrate the use of Bayesian methods for fine mapping in candidate regions, discuss meta-analyses and provide guidance for refereeing manuscripts that contain Bayesian analyses.
Affiliation(s)
- Matthew Stephens
- Departments of Statistics and Human Genetics, University of Chicago, Chicago, IL 60637, USA.
|
44
|
Vignal C, Bansal AT, Balding DJ, Binks MH, Dickson MC, Montgomery DS, Wilson AG. Genetic association of the major histocompatibility complex with rheumatoid arthritis implicates two non-DRB1 loci. Arthritis Rheum 2009; 60:53-62. [PMID: 19116923 DOI: 10.1002/art.24138] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 01/08/2023]
Abstract
OBJECTIVE The HLA-DRB1 locus within the major histocompatibility complex (MHC) at 6p21.3 has been identified as a susceptibility gene for rheumatoid arthritis (RA); however, there is increasing evidence of additional susceptibility genes in the MHC region. The aim of this study was to estimate their number and location. METHODS A case-control study was performed involving 977 control subjects and 855 RA patients. The HLA-DRB1 locus was genotyped together with 2,360 single-nucleotide polymorphisms in the MHC region. Logistic regression was used to detect DRB1-independent effects. RESULTS After adjusting for the effect of HLA-DRB1, 18 markers in 14 genes were strongly associated with RA (P < 10⁻⁴). Multivariate logistic regression analysis of these markers and DRB1 led to a model containing DRB1 plus the following 3 markers: rs4678, a nonsynonymous change in the VARS2L locus, approximately 1.7 Mb telomeric of DRB1; rs2442728, upstream of HLA-B, approximately 1.2 Mb telomeric of DRB1; and rs17499655, located in the 5'-untranslated region of DQA2, only 0.1 Mb centromeric of DRB1. In-depth investigation of the DQA2 association, however, suggested that it arose through cryptic linkage disequilibrium with an allele of DRB1. Two non-shared epitope alleles were also strongly associated with RA (P < 10⁻⁴): *0301 with anti-cyclic citrullinated peptide-negative RA and *0701 independently of autoantibody status. CONCLUSION These results confirm the polygenic contribution of the MHC to RA and implicate 2 additional non-DRB1 susceptibility loci. The role of the HLA-DQ locus in RA has been a subject of controversy, but in our data, it appears to be spurious.
|
45
|
Su SY, White J, Balding DJ, Coin LJM. Inference of haplotypic phase and missing genotypes in polyploid organisms and variable copy number genomic regions. BMC Bioinformatics 2008; 9:513. [PMID: 19046436 PMCID: PMC2647950 DOI: 10.1186/1471-2105-9-513] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 08/26/2008] [Accepted: 12/01/2008] [Indexed: 12/18/2022] Open
Abstract
Background The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference, is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21; Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. Results In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. Conclusion With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.
Affiliation(s)
- Shu-Yi Su
- Department of Epidemiology and Public Health, Imperial College, London, W2 1PG, UK.
|
46
|
Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 2008; 24:2713-9. [PMID: 18842597 PMCID: PMC2639274 DOI: 10.1093/bioinformatics/btn514] [Citation(s) in RCA: 453] [Impact Index Per Article: 28.3] [Indexed: 11/13/2022]
Abstract
Summary: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. Availability: The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc. Contact: j.cornuet@imperial.ac.uk Supplementary information: Supplementary data are also available at http://www.montpellier.inra.fr/CBGP/diyabc
Affiliation(s)
- Jean-Marie Cornuet
- Department of Epidemiology and Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK.
|
47
|
Chadeau-Hyam M, Hoggart CJ, O'Reilly PF, Whittaker JC, De Iorio M, Balding DJ. Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 2008; 9:364. [PMID: 18778480 PMCID: PMC2542380 DOI: 10.1186/1471-2105-9-364] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 04/18/2008] [Accepted: 09/08/2008] [Indexed: 01/28/2023] Open
Abstract
Background FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. We describe here the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets. Results We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation, and one with a complex pattern of selection. Conclusion FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. Its principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented. As such, it can be customised and the range of models extended.
Affiliation(s)
- Marc Chadeau-Hyam
- Department of Epidemiology and Public Health, Imperial College, St Mary's Campus, Norfolk Place, London, W2 1PG, UK.
|
48
|
Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 2008; 4:e1000130. [PMID: 18654633 PMCID: PMC2464715 DOI: 10.1371/journal.pgen.1000130] [Citation(s) in RCA: 258] [Impact Index Per Article: 16.1] [Received: 03/05/2008] [Accepted: 06/17/2008] [Indexed: 11/19/2022] Open
Abstract
Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
Affiliation(s)
- Clive J Hoggart
- Department of Epidemiology and Public Health, Imperial College, London, United Kingdom.
|
49
|
Clark TG, Andrew T, Cooper GM, Margulies EH, Mullikin JC, Balding DJ. Functional constraint and small insertions and deletions in the ENCODE regions of the human genome. Genome Biol 2008; 8:R180. [PMID: 17784950 PMCID: PMC2375018 DOI: 10.1186/gb-2007-8-9-r180] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Received: 11/15/2006] [Revised: 09/04/2007] [Accepted: 09/04/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND We describe the distribution of indels in the 44 Encyclopedia of DNA Elements (ENCODE) regions (about 1% of the human genome) and evaluate the potential contributions of small insertion and deletion polymorphisms (indels) to human genetic variation. We relate indels to known genomic annotation features and measures of evolutionary constraint. RESULTS Indel rates are observed to be reduced approximately 20-fold to 60-fold in exonic regions, 5-fold to 10-fold in sequence that exhibits high evolutionary constraint in mammals, and up to 2-fold in some classes of regulatory elements (for instance, formaldehyde assisted isolation of regulatory elements [FAIRE] and hypersensitive sites). In addition, some noncoding transcription and other chromatin mediated regulatory sites also have reduced indel rates. Overall indel rates for these data are estimated to be smaller than single nucleotide polymorphism (SNP) rates by a factor of approximately 2, with both rates measured as base pairs per 100 kilobases to facilitate comparison. CONCLUSION Indel rates exhibit a broadly similar distribution across genomic features compared with SNP density rates, with a reduction in rates in coding transcription and evolutionarily constrained sequence. However, unlike indels, SNP rates do not appear to be reduced in some noncoding functional sequences, such as pseudo-exons, and FAIRE and hypersensitive sites. We conclude that indel rates are greatly reduced in transcribed and evolutionarily constrained DNA, and discuss why indel (but not SNP) rates appear to be constrained at some regulatory sites.
Affiliation(s)
- Taane G Clark
- Department of Epidemiology and Public Health, Imperial College, Norfolk Place, London, W2 1PG, UK
| | - Toby Andrew
- Department of Epidemiology and Public Health, Imperial College, Norfolk Place, London, W2 1PG, UK
| | - Gregory M Cooper
- Department of Genetics, Stanford University, Stanford, California 94305, USA
| | - Elliott H Margulies
- National Human Genome Research Institute, National Institutes of Health, 9000 Rockville Pike, Bethesda, Maryland 20892, USA
| | - James C Mullikin
- National Human Genome Research Institute, National Institutes of Health, 9000 Rockville Pike, Bethesda, Maryland 20892, USA
| | - David J Balding
- Department of Epidemiology and Public Health, Imperial College, Norfolk Place, London, W2 1PG, UK
|
50
|
Abstract
The problem of multiple testing is an important aspect of genome-wide association studies, and will become more important as marker densities increase. The problem has been tackled with permutation and false discovery rate procedures and with Bayes factors, but each approach faces difficulties that we briefly review. In the current context of multiple studies on different genotyping platforms, we argue for the use of truly genome-wide significance thresholds, based on all polymorphisms whether or not typed in the study. We approximate genome-wide significance thresholds in contemporary West African, East Asian and European populations by simulating sequence data, based on all polymorphisms as well as for a range of single nucleotide polymorphism (SNP) selection criteria. Overall we find that significance thresholds vary by a factor of >20 over the SNP selection criteria and statistical tests that we consider and can be highly dependent on sample size. We compare our results for sequence data to those derived by the HapMap Consortium and find notable differences which may be due to the small sample sizes used in the HapMap estimate.
Affiliation(s)
- Clive J Hoggart
- Department of Epidemiology and Public Health, Imperial College London, Norfolk Place, London, UK.
|