1
Maude H, Lau W, Maniatis N, Andrew T. New Insights Into Mitochondrial Dysfunction at Disease Susceptibility Loci in the Development of Type 2 Diabetes. Front Endocrinol (Lausanne) 2021; 12:694893. [PMID: 34456865 PMCID: PMC8385132 DOI: 10.3389/fendo.2021.694893] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/13/2021] [Accepted: 07/08/2021] [Indexed: 12/25/2022] Open
Abstract
This study investigated the potential genetic mechanisms which underlie adipose tissue mitochondrial dysfunction in Type 2 diabetes (T2D), by systematically identifying nuclear-encoded mitochondrial genes (NEMGs) among the genes regulated by T2D-associated genetic loci. The target genes of these 'disease loci' were identified by mapping genetic loci associated with both disease and gene expression levels (expression quantitative trait loci, eQTL) using high resolution genetic maps, with independent estimates co-locating to within a small genetic distance. These co-locating signals were defined as T2D-eQTL and the target genes as T2D cis-genes. In total, 763 cis-genes were associated with T2D-eQTL, of which 50 were NEMGs. Independent gene expression datasets for T2D and insulin resistant cases and controls confirmed that the cis-genes and cis-NEMGs were enriched for differential expression in cases, providing independent validation that genetic maps can identify informative functional genes. Two additional results were consistent with a potential role of T2D-eQTL in regulating the 50 identified cis-NEMGs in the context of T2D risk: (1) the 50 cis-NEMGs showed greater differential expression compared to other NEMGs and (2) other NEMGs showed a trend towards significantly decreased expression if their expression levels correlated more highly with the subset of 50 cis-NEMGs. These 50 cis-NEMGs, which are differentially expressed and associated with mapped T2D disease loci, encode proteins acting within key mitochondrial pathways, including some of current therapeutic interest such as the metabolism of branched-chain amino acids, GABA and biotin.
Affiliation(s)
- Hannah Maude
  - Department of Metabolism, Digestion & Reproduction, Imperial College, London, United Kingdom
- Winston Lau
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Nikolas Maniatis
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Toby Andrew
  - Department of Metabolism, Digestion & Reproduction, Imperial College, London, United Kingdom
  - *Correspondence: Toby Andrew,
2
Bocher O, Génin E. Rare variant association testing in the non-coding genome. Hum Genet 2020; 139:1345-1362. [PMID: 32500240 DOI: 10.1007/s00439-020-02190-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/22/2020] [Accepted: 05/29/2020] [Indexed: 12/25/2022]
Abstract
The development of next-generation sequencing technologies has opened up new possibilities to explore the contribution of genetic variants to human diseases, in particular that of rare variants. Statistical methods have been developed to test for association with rare variants that require the definition of testing units and, in these testing units, the selection of qualifying variants to include in the test. In the coding regions of the genome, testing units are usually the different genes and qualifying variants are selected based on their functional effects on the encoded proteins. Extending these tests to the non-coding regions of the genome is challenging. Testing units are difficult to define as the organisation of the non-coding genome is still largely unknown. Qualifying variants are difficult to select as the functional impact of non-coding variants on gene expression is hard to predict. These difficulties could explain why very few investigators so far have analysed the non-coding parts of their whole genome sequencing data. Yet these non-coding parts represent the vast majority of the genome, and some studies suggest that they could play a major role in disease susceptibility. In this review, we discuss recent experimental and statistical developments to gain knowledge on the non-coding genome and how this knowledge could be used to include rare non-coding variants in association tests. We describe the few studies that have considered variants from the non-coding genome in association tests and how they managed to define testing units and select qualifying variants.
Affiliation(s)
- Ozvan Bocher
  - Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Bâtiment E-IBRBS 2ieme étage, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France.
- Emmanuelle Génin
  - Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Bâtiment E-IBRBS 2ieme étage, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France.
  - CHU Brest, Brest, France.
3
Andrade ACB, Viana JMS, Pereira HD, Pinto VB, Fonseca e Silva F. Linkage disequilibrium and haplotype block patterns in popcorn populations. PLoS One 2019; 14:e0219417. [PMID: 31553737 PMCID: PMC6760792 DOI: 10.1371/journal.pone.0219417] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/21/2019] [Accepted: 09/12/2019] [Indexed: 12/17/2022] Open
Abstract
Linkage disequilibrium (LD) analysis provides information on the evolutionary aspects of populations. Recently, haplotype blocks have been used to increase the power of quantitative trait loci detection in genome-wide association studies and the prediction accuracy of genomic selection. Our objectives were as follows: to compare the degree of LD, LD decay, and LD decay extent in popcorn populations; to characterize the number and length of haplotype blocks in the populations; and to determine whether maize chromosomes also have a pattern of interspaced regions of high and low rates of recombination. We used a biparental population, a synthetic, and a breeding population, genotyped for approximately 75,000 single nucleotide polymorphisms (SNPs). The sample size ranged from 190 to 192 plants. For the whole-genome LD and haplotype block analyses, we assumed a window of 500 kb. To characterize the block and step patterns of LD in the populations, we constructed LD maps by chromosome, defining a cold spot as a chromosome segment including SNPs with the same LDU position. The LD and haplotype block analyses were also performed at the intragenic level, selecting 12 genes related to zein, starch, cellulose, and fatty acid biosynthesis. The populations with the higher and lower frequencies of |D'| values greater than 0.75 were the biparental (65–74%) and the breeding population (26–58%), respectively. There were slight differences between the populations regarding the average distance for SNPs with |D'| values greater than 0.75 (in the range of approximately 207 to 229 kb). The level of LD expressed by the r2 values was low in the populations (0.02, 0.04, and 0.04, on average) but comparable to some non-isolated human populations. The frequency of r2 values greater than 0.75 was lower in the biparental population (0.2–0.5%) and higher in the other populations (0.2–1.6%). 
The average distance for SNPs with r2 values greater than 0.75 was much higher in the biparental population (approximately 80 to 126 kb). In the other populations, the ranges were approximately 6 to 19 and 6 to 35 kb. The heatmaps for the regions covered by the first 100 SNPs in each chromosome, in each population (1 to 3.3 Mb, approximately), provided evidence that the comparatively few high r2 values (close to 1.0) occurred only for SNPs in close proximity, especially in the synthetic and breeding populations. Because the haplotype blocks in the populations contain few SNPs (2 to 3), no advantage is expected from a haplotype-based association study or from genomic selection over generations. The results concerning LD decay (rapid decay after 5–10 kb) and LD decay extent (along up to 300 kb) are in the range observed with maize inbred line panels. The LD maps indicate that maize chromosomes had a pattern of regions of extensive LD interspaced with regions of low LD. However, our simulated LD map provides evidence that this pattern can reflect regions with differences in allele frequencies and LD levels (expressed by |D'|) and not regions with high and low rates of recombination.
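The contrast this abstract draws between frequent high |D'| values and rare high r2 values follows from how the two measures are normalised. A minimal sketch, assuming two biallelic loci with known haplotype and allele frequencies; the function name `ld_stats` and the input numbers are illustrative and not taken from the paper:

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected under independence.
    d = p_ab - p_a * p_b
    # D' scales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Made-up frequencies: a rare allele A found only on B haplotypes.
d, d_prime, r2 = ld_stats(p_ab=0.1, p_a=0.1, p_b=0.5)
print(round(d, 4), round(d_prime, 4), round(r2, 4))
```

With these made-up inputs |D'| reaches its maximum while r2 stays low, because r2 also depends on how well the allele frequencies match.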
Affiliation(s)
- Vitor Batista Pinto
  - Federal University of Viçosa, Department of General Biology, Viçosa, MG, Brazil
4
Berihulay H, Islam R, Jiang L, Ma Y. Genome-Wide Linkage Disequilibrium and the Extent of Effective Population Sizes in Six Chinese Goat Populations Using a 50K Single Nucleotide Polymorphism Panel. Animals (Basel) 2019; 9:ani9060350. [PMID: 31200540 PMCID: PMC6617254 DOI: 10.3390/ani9060350] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/26/2019] [Revised: 05/20/2019] [Accepted: 05/30/2019] [Indexed: 12/25/2022] Open
Abstract
Simple Summary: Information on linkage disequilibrium (LD) and the extent of effective population size (Ne) has important implications for exploring the degree of biological diversity, for predicting underlying selection pressure, and for designing animal breeding programs. In this study, we assessed LD, Ne, and the distribution of minor allele frequency in six goat populations. Accordingly, the results of LD and Ne using a single nucleotide polymorphism (SNP) panel (Caprine SNP 50K BeadChip, Lincoln, NE, USA) are helpful for the sustainable conservation, proper management, and utilization of Chinese goat populations.
Abstract: Genome-wide linkage disequilibrium is a useful parameter to study quantitative trait locus (QTL) mapping and genetic selection. In many genomic methodologies, effective population size is an important genetic parameter because of its relationship to the loss of genetic variation, increases in inbreeding, the accumulation of mutations, and the effectiveness of selection. In this study, a total of 193 individuals were genotyped to assess the extent of LD and Ne in six Chinese goat populations using the SNP 50K BeadChip. Across the determined autosomal chromosomes, we found an average of 0.02 and 0.23 for r2 and D’ values, respectively. The average r2 between all the populations varied little and ranged from 0.055 r2 for the Jining Grey to 0.128 r2 for the Guangfeng, with an overall mean of 0.083. Across the 29 autosomal chromosomes, minor allele frequency (MAF) was highest on chromosome 1 (0.321) and lowest on chromosome 25 (0.309), with an average MAF of 0.317, and showing the lowest (25.5% for Louping) and highest (28.8% for Qingeda) SNP proportions at MAF values > 0.3. The inbreeding coefficient ranged from 0.064 to 0.085, with a mean of 0.075 for all the autosomes. The Jining Grey and Qingeda populations showed higher Ne estimates, highlighting that these animals could have been influenced by artificial selection. Furthermore, a declining recent Ne was distinguished for the Arbas Cashmere and Guangfeng populations, and their estimated values were closer to 64 and 95, respectively, 13 generations ago, which indicates that these breeds were exposed to strong selection. This study provides an insight into valuable genetic information and will open up the opportunity for further genomic selection analysis of Chinese goat populations.
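The abstract does not say which estimator links LD to Ne. One widely used approximation, shown here purely as an assumption, is Sved's expectation E[r2] ≈ 1/(1 + 4·Ne·c) for markers separated by recombination distance c (in Morgans), with such markers commonly taken to reflect Ne roughly 1/(2c) generations ago. The function names and the input values below are hypothetical and are not the study's data:

```python
def ne_from_r2(mean_r2, c_morgans):
    """Estimate effective population size from mean r2 at distance c.

    Assumes Sved's approximation E[r2] = 1 / (1 + 4 * Ne * c); this is an
    illustrative choice, not necessarily the estimator used in the study.
    """
    if not 0.0 < mean_r2 < 1.0:
        raise ValueError("mean_r2 must lie strictly between 0 and 1")
    if c_morgans <= 0.0:
        raise ValueError("c_morgans must be positive")
    return (1.0 / mean_r2 - 1.0) / (4.0 * c_morgans)


def generations_ago(c_morgans):
    """Approximate number of generations in the past that distance c reflects."""
    return 1.0 / (2.0 * c_morgans)


# Hypothetical inputs: mean r2 of 0.2 for marker pairs 0.01 Morgans apart.
print(round(ne_from_r2(0.2, 0.01), 2), round(generations_ago(0.01), 2))
```

Binning marker pairs by distance and applying the same formula per bin gives an Ne trajectory over past generations, which is the kind of output the abstract describes for the Arbas Cashmere and Guangfeng populations.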
Affiliation(s)
- Haile Berihulay
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China.
- Rabiul Islam
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China.
- Lin Jiang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China.
- Yuehui Ma
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China.
5
Vergara-Lope A, Ennis S, Vorechovsky I, Pengelly RJ, Collins A. Heterogeneity in the extent of linkage disequilibrium among exonic, intronic, non-coding RNA and intergenic chromosome regions. Eur J Hum Genet 2019; 27:1436-1444. [PMID: 31053778 DOI: 10.1038/s41431-019-0419-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/13/2018] [Revised: 03/04/2019] [Accepted: 04/16/2019] [Indexed: 11/09/2022] Open
Abstract
Whole-genome sequence data enable construction of high-resolution linkage disequilibrium (LD) maps revealing the LD structure of functional elements within genic and subgenic sequences. The Malecot-Morton model defines LD map distances in linkage disequilibrium units (LDUs), analogous to the centimorgan scale of linkage maps. For whole-genome sequence-derived LD maps, we introduce the ratio of corresponding map lengths kilobases/LDU to describe the extent of LD within genome components. The extent of LD is highly variable across the genome ranging from ~38 kb for intergenic sequences to ~858 kb for centromeric regions. LD is ~16% more extensive in genic, compared with intergenic sequences, reflecting relatively increased selection and/or reduced recombination in genes. The LD profile across 18,268 autosomal genes reveals reduced extent of LD, consistent with elevated recombination, in exonic regions near the 5' end of genes but more extensive LD, compared with intronic sequences, across more centrally located exons. Genes classified as essential and genes linked to Mendelian phenotypes show more extensive LD compared with genes associated with complex traits, perhaps reflecting differences in selective pressure. Significant differences between exonic, intronic and intergenic components demonstrate that fine-scale LD structure provides important insights into genome function, which cannot be revealed by LD analysis of much lower resolution array-based genotyping and conventional linkage maps.
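The kilobases/LDU ratio described above can be illustrated with a small sketch. It assumes the usual LD-map construction in which each marker interval contributes its physical length multiplied by a fitted decay parameter ε to the cumulative LDU scale; the function names, the ε values and the positions are made up for illustration:

```python
from itertools import accumulate


def ldu_map(positions_kb, epsilons):
    """Cumulative LDU position of each marker.

    positions_kb: sorted physical positions of the markers (kb)
    epsilons: one LD decay parameter per interval between adjacent markers
              (per kb); assumed already fitted, e.g. under the Malecot model
    """
    if len(epsilons) != len(positions_kb) - 1:
        raise ValueError("need exactly one epsilon per marker interval")
    # Each interval adds (epsilon * physical length) LDUs.
    steps = [e * (b - a) for e, a, b in zip(epsilons, positions_kb, positions_kb[1:])]
    return [0.0] + list(accumulate(steps))


def kb_per_ldu(positions_kb, ldu_positions):
    """Ratio of physical map length to LD map length (extent of LD)."""
    kb_length = positions_kb[-1] - positions_kb[0]
    ldu_length = ldu_positions[-1] - ldu_positions[0]
    if ldu_length <= 0:
        raise ValueError("LD map length must be positive")
    return kb_length / ldu_length


# Made-up region: the first interval has no LD breakdown (epsilon = 0).
positions = [0.0, 10.0, 30.0, 40.0]
ldus = ldu_map(positions, [0.0, 0.05, 0.1])
print([round(x, 3) for x in ldus], round(kb_per_ldu(positions, ldus), 3))
```

A larger kb/LDU value means LD extends over more physical sequence, which is how the abstract compares intergenic, genic and centromeric regions.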
Affiliation(s)
- Alejandra Vergara-Lope
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Sarah Ennis
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Igor Vorechovsky
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Reuben J Pengelly
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Andrew Collins
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK.
6
Sherman SL, Rao D, Keats BJ, Yee S, Spence MA, Hassold TJ, Chakravarti A, Elston RC, Crolla JA, Ennis S, Risch N. Newton E. Morton (1929-2018). Am J Hum Genet 2018; 102:1011-1017. [PMID: 33220219 DOI: 10.1016/j.ajhg.2018.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2018] [Accepted: 05/15/2018] [Indexed: 10/14/2022] Open
7
Lau W, Andrew T, Maniatis N. High-Resolution Genetic Maps Identify Multiple Type 2 Diabetes Loci at Regulatory Hotspots in African Americans and Europeans. Am J Hum Genet 2017; 100:803-816. [PMID: 28475862 DOI: 10.1016/j.ajhg.2017.04.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/12/2016] [Accepted: 04/11/2017] [Indexed: 10/19/2022] Open
Abstract
Interpretation of results from genome-wide association studies for T2D is challenging. Very few loci have been replicated in African ancestry populations, and the implicated functional genes remain largely undefined. We used genetic maps that capture detailed linkage disequilibrium information in European and African Americans and applied these to large T2D case-control samples in order to estimate locations for putative functional variants in both populations. Replicated T2D locations were tested for evidence of being regulatory hotspots using adipose expression. We validated a sample of our co-location intervals using next generation sequencing and functional annotation, including enhancers, transcription, and chromatin modifications. We identified 111 additional disease-susceptibility locations, 93 of which are cosmopolitan and 18 of which are European specific. We show that many previously known signals are also risk loci in African Americans. The majority of the disease locations appear to confer risk of T2D via the regulation of expression levels for a large number (266) of cis-regulated genes, the majority of which are not the nearest genes to the disease loci. Sequencing three cosmopolitan locations provided candidate functional variants that precisely co-locate with cell-specific chromatin domains and pancreatic islet enhancers. These variants have large effect sizes and are common across populations. Results show that disease-associated loci in different populations, gene expression, and cell-specific regulatory annotation can be effectively integrated by localizing these effects on high-resolution genetic maps. The cis-regulated genes provide insights into the complex molecular pathways involved and can be used as targets for sequencing and functional molecular studies.
8
Pengelly RJ, Gheyas AA, Kuo R, Mossotto E, Seaby EG, Burt DW, Ennis S, Collins A. Commercial chicken breeds exhibit highly divergent patterns of linkage disequilibrium. Heredity (Edinb) 2016; 117:375-382. [PMID: 27381324 DOI: 10.1038/hdy.2016.47] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/13/2016] [Revised: 05/10/2016] [Accepted: 05/19/2016] [Indexed: 02/06/2023] Open
Abstract
The analysis of linkage disequilibrium (LD) underpins the development of effective genotyping technologies, trait mapping and understanding of biological mechanisms such as those driving recombination and the impact of selection. We apply the Malécot-Morton model of LD to create additive LD maps that describe the high-resolution LD landscape of commercial chickens. We investigated LD in chickens (Gallus gallus) at the highest resolution to date for broiler, white egg and brown egg layer commercial lines. There is minimal concordance of fine-scale LD patterns between breeds (correlation coefficient <0.21), and even between discrete broiler lines. Regions of LD breakdown, which may align with recombination hot spots, are enriched near CpG islands and transcription start sites (P < 2.2 × 10⁻¹⁶), consistent with recent evidence described in finches, but concordance in hot spot locations between commercial breeds is only marginally greater than random. As in other birds, functional elements in the chicken genome are associated with recombination but, unlike evidence from other bird species, the LD landscape is not stable in the populations studied. The development of optimal genotyping panels for genome-led selection programmes will depend on careful analysis of the LD structure of each line of interest. Further study is required to fully elucidate the mechanisms underlying highly divergent LD patterns found in commercial chickens.
Affiliation(s)
- R J Pengelly
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- A A Gheyas
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- R Kuo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- E Mossotto
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- E G Seaby
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- D W Burt
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- S Ennis
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- A Collins
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
9
Pengelly RJ, Tapper W, Gibson J, Knut M, Tearle R, Collins A, Ennis S. Whole genome sequences are required to fully resolve the linkage disequilibrium structure of human populations. BMC Genomics 2015; 16:666. [PMID: 26335686 PMCID: PMC4558963 DOI: 10.1186/s12864-015-1854-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 03/25/2015] [Accepted: 08/17/2015] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND: An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution. RESULTS: We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure. CONCLUSIONS: WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.
Affiliation(s)
- Reuben J Pengelly
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- William Tapper
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- Jane Gibson
  - Centre for Biological Sciences, Faculty of Natural & Environmental Sciences, University of Southampton, Southampton, UK.
- Marcin Knut
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- Rick Tearle
  - Complete Genomics, Inc., Mountain View, CA, USA.
- Andrew Collins
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- Sarah Ennis
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
10
Jeffares DC, Rallis C, Rieux A, Speed D, Převorovský M, Mourier T, Marsellach FX, Iqbal Z, Lau W, Cheng TM, Pracana R, Mülleder M, Lawson JL, Chessel A, Bala S, Hellenthal G, O’Fallon B, Keane T, Simpson JT, Bischof L, Tomiczek B, Bitton DA, Sideri T, Codlin S, Hellberg JE, van Trigt L, Jeffery L, Li JJ, Atkinson S, Thodberg M, Febrer M, McLay K, Drou N, Brown W, Hayles J, Carazo Salas RE, Ralser M, Maniatis N, Balding DJ, Balloux F, Durbin R, Bähler J. The genomic and phenotypic diversity of Schizosaccharomyces pombe. Nat Genet 2015; 47:235-41. [PMID: 25665008 PMCID: PMC4645456 DOI: 10.1038/ng.3215] [Citation(s) in RCA: 124] [Impact Index Per Article: 13.8] [Received: 07/08/2014] [Accepted: 01/14/2015] [Indexed: 12/14/2022]
Abstract
Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the usefulness of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, finding moderate genetic diversity (π = 3 × 10⁻³ substitutions/site) and weak global population structure. We estimate that dispersal of S. pombe began during human antiquity (∼340 BCE), and ancestors of these strains reached the Americas at ∼1623 CE. We quantified 74 traits, finding substantial heritable phenotypic diversity. We conducted 223 genome-wide association studies, with 89 traits showing at least one association. The most significant variant for each trait explained 22% of the phenotypic variance on average, with indels having larger effects than SNPs. This analysis represents a rich resource to examine genotype-phenotype relationships in a tractable model.
Affiliation(s)
- Daniel C. Jeffares
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Charalampos Rallis
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Adrien Rieux
  - Department of Genetics, Evolution & Environment, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Doug Speed
  - Department of Genetics, Evolution & Environment, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Martin Převorovský
  - Department of Cell Biology, Charles University in Prague, Prague, Czech Republic
- Tobias Mourier
  - Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Zamin Iqbal
  - Wellcome Trust Centre for Human Genetics, Oxford, UK
- Winston Lau
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Tammy M.K. Cheng
  - Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Rodrigo Pracana
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Michael Mülleder
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Jonathan L.D. Lawson
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - The Gurdon Institute, University of Cambridge, Cambridge, UK
- Anatole Chessel
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Sendu Bala
  - Wellcome Trust Sanger Institute, Cambridge, UK
- Garrett Hellenthal
  - Department of Genetics, Evolution & Environment, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Leanne Bischof
  - CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- Bartlomiej Tomiczek
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Danny A. Bitton
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Theodora Sideri
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Sandra Codlin
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Laurent van Trigt
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Linda Jeffery
  - Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Juan-Juan Li
  - Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Sophie Atkinson
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- Malte Thodberg
  - Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Melanie Febrer
  - CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- Kirsten McLay
  - CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- Nizar Drou
  - CSIRO Mathematics, Informatics and Statistics, North Ryde, Australia; The Genome Analysis Centre, Norwich, UK
- William Brown
  - Centre for Genetics and Genomics, The University of Nottingham, Nottingham, UK
- Jacqueline Hayles
  - Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
- Rafael E. Carazo Salas
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - The Gurdon Institute, University of Cambridge, Cambridge, UK
- Markus Ralser
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
  - Division of Physiology and Metabolism, MRC National Institute for Medical Research, London, UK
- Nikolas Maniatis
  - Department of Genetics, Evolution & Environment, University College London, London, UK
- David J. Balding
  - Department of Genetics, Evolution & Environment, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Francois Balloux
  - Department of Genetics, Evolution & Environment, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Jürg Bähler
  - Department of Genetics, Evolution & Environment, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
11
Kemppainen P, Knight CG, Sarma DK, Hlaing T, Prakash A, Maung Maung YN, Somboon P, Mahanta J, Walton C. Linkage disequilibrium network analysis (LDna) gives a global view of chromosomal inversions, local adaptation and geographic structure. Mol Ecol Resour 2015; 15:1031-45. [PMID: 25573196 PMCID: PMC4681347 DOI: 10.1111/1755-0998.12369] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 08/13/2014] [Revised: 12/15/2014] [Accepted: 12/29/2014] [Indexed: 12/21/2022]
Abstract
Recent advances in sequencing allow population-genomic data to be generated for virtually any species. However, approaches to analyse such data lag behind the ability to generate it, particularly in nonmodel species. Linkage disequilibrium (LD, the nonrandom association of alleles from different loci) is a highly sensitive indicator of many evolutionary phenomena including chromosomal inversions, local adaptation and geographical structure. Here, we present linkage disequilibrium network analysis (LDna), which accesses information on LD shared between multiple loci genomewide. In LD networks, vertices represent loci, and connections between vertices represent the LD between them. We analysed such networks in two test cases: a new restriction-site-associated DNA sequence (RAD-seq) data set for Anopheles baimaii, a Southeast Asian malaria vector; and a well-characterized single nucleotide polymorphism (SNP) data set from 21 three-spined stickleback individuals. In each case, we readily identified five distinct LD network clusters (single-outlier clusters, SOCs), each comprising many loci connected by high LD. In A. baimaii, further population-genetic analyses supported the inference that each SOC corresponds to a large inversion, consistent with previous cytological studies. For sticklebacks, we inferred that each SOC was associated with a distinct evolutionary phenomenon: two chromosomal inversions, local adaptation, population-demographic history and geographic structure. LDna is thus a useful exploratory tool, able to give a global overview of LD associated with diverse evolutionary phenomena and identify loci potentially involved. LDna does not require a linkage map or reference genome, so it is applicable to any population-genomic data set, making it especially valuable for nonmodel species.
Affiliation(s)
- Petri Kemppainen
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK; Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, Brno, Czech Republic
- Christopher G Knight
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK
- Devojit K Sarma
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK; Regional Medical Research Centre, NE (ICMR), Dibrugarh, 786 001, India
- Thaung Hlaing
  - Department of Medical Research (Lower Myanmar), Medical Entomology Research Division, 5 Ziwaka Road, Dagon P.O., Yangon, 11191, Myanmar
- Anil Prakash
  - Regional Medical Research Centre, NE (ICMR), Dibrugarh, 786 001, India
- Yan Naung Maung Maung
  - Department of Medical Research (Lower Myanmar), Medical Entomology Research Division, 5 Ziwaka Road, Dagon P.O., Yangon, 11191, Myanmar
- Pradya Somboon
  - Department of Parasitology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
- Jagadish Mahanta
  - Regional Medical Research Centre, NE (ICMR), Dibrugarh, 786 001, India
- Catherine Walton
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK
12
Direk K, Lau W, Small KS, Maniatis N, Andrew T. ABCC5 transporter is a novel type 2 diabetes susceptibility gene in European and African American populations. Ann Hum Genet 2014; 78:333-44. [PMID: 25117150 PMCID: PMC4173130 DOI: 10.1111/ahg.12072] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 02/05/2014] [Accepted: 04/29/2014] [Indexed: 12/17/2022]
Abstract
Numerous functional studies have implicated PARL in relation to type 2 diabetes (T2D). We hypothesised that conflicting human association studies may be due to neighbouring causal variants being in linkage disequilibrium (LD) with PARL. We conducted a comprehensive candidate gene study of the extended LD genomic region that includes PARL and transporter ABCC5 using three data sets (two European and one African American), in relation to healthy glycaemic variation, visceral fat accumulation and T2D disease. We observed no evidence for previously reported T2D association with Val262Leu or PARL using array and fine-map genomic and expression data. By contrast, we observed strong evidence of T2D association with ABCC5 (intron 26) for European and African American samples (P = 3E-07) and with ABCC5 adipose expression in Europeans [odds ratio (OR) = 3.8, P = 2E-04]. The genomic location estimate for the ABCC5 functional variant, associated with all phenotypes and expression data (P = 1E-11), was identical for all samples (at Chr3q 185,136 kb B36), indicating that the risk variant is an expression quantitative trait locus (eQTL) with increased expression conferring risk of disease. That the association with T2D is observed in populations of disparate ancestry suggests the variant is a ubiquitous risk factor for T2D.
Affiliation(s)
- Kenan Direk
  - Department of Twin Research and Genetic Epidemiology, King's College London, School of Medicine, London, UK
- Winston Lau
  - Department of Genetics, Evolution and Environment, University College London, London, UK
- Kerrin S Small
  - Department of Twin Research and Genetic Epidemiology, King's College London, School of Medicine, London, UK
- Nikolas Maniatis
  - Department of Genetics, Evolution and Environment, University College London, London, UK
- Toby Andrew
  - Department of Genomics of Common Disease, Imperial College, London, UK
13
Corbin LJ, Kranis A, Blott SC, Swinburne JE, Vaudin M, Bishop SC, Woolliams JA. The utility of low-density genotyping for imputation in the Thoroughbred horse. Genet Sel Evol 2014; 46:9. [PMID: 24495673 PMCID: PMC3930001 DOI: 10.1186/1297-9686-46-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 03/15/2013] [Accepted: 12/20/2013] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem.
RESULTS: Using the haplotype phasing and imputation program BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels suggests that a 2K SNP panel would represent good value for money.
CONCLUSIONS: Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low-cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy.
Affiliation(s)
- John A Woolliams
  - Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
14
Yang T, Deng HW, Niu T. Critical assessment of coalescent simulators in modeling recombination hotspots in genomic sequences. BMC Bioinformatics 2014; 15:3. [PMID: 24387001 PMCID: PMC3890628 DOI: 10.1186/1471-2105-15-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 09/09/2013] [Accepted: 12/30/2013] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND: Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies for DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging.
RESULTS: We extensively compared the performance of five widely used coalescent simulators (Hudson’s ms, msHOT, MaCS, Simcoal2, and fastsimcoal) to provide a practical guide considering three crucial factors: 1) speed, 2) scalability and 3) recombination hotspot position and intensity accuracy. Although ms represents a popular standard coalescent simulator, it lacks the ability to simulate sequences with recombination hotspots. An extended program, msHOT, has compensated for the deficiency of ms by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but remains limited in simulating long stretches of DNA sequences. Simcoal2, based on a discrete generation-by-generation approach, could simulate more complex demographic scenarios, but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms to approximate the standard coalescent, are much more efficient whilst keeping salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they are more advantageous than other programs for a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance based on both real and simulated data.
CONCLUSIONS: While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating cross-over hotspots.
Affiliation(s)
- Tianhua Niu
  - Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, Suite 2001, New Orleans, LA 70112, USA
15
Söderman J, Norén E, Christiansson M, Bragde H, Thiébaut R, Hugot JP, Tysk C, O’Morain CA, Gassull M, Finkel Y, Colombel JF, Lémann M, Almer S. Analysis of single nucleotide polymorphisms in the region of CLDN2-MORC4 in relation to inflammatory bowel disease. World J Gastroenterol 2013; 19:4935-4943. [PMID: 23946598 PMCID: PMC3740423 DOI: 10.3748/wjg.v19.i30.4935] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/20/2012] [Revised: 04/04/2013] [Accepted: 06/06/2013] [Indexed: 02/06/2023] Open
Abstract
AIM: To investigate a possible genetic influence of claudin (CLDN)1, CLDN2 and CLDN4 in the etiology of inflammatory bowel disease.
METHODS: Allelic association between genetic regions of CLDN1, CLDN2 or CLDN4 and patients with inflammatory bowel disease, Crohn’s disease (CD) or ulcerative colitis was investigated using both a case-control study approach (one case randomly selected from each of 191 Swedish inflammatory bowel disease families and 333 controls) and a family-based study (463 non-Swedish European inflammatory bowel disease families). A nonsynonymous coding single nucleotide polymorphism in MORC4, located on the same linkage block as CLDN2, was investigated for association, as were two novel CLDN2 single nucleotide polymorphism markers, identified by resequencing.
RESULTS: A single nucleotide polymorphism marker (rs12014762) located in the genetic region of CLDN2 was significantly associated with CD (case-control allelic OR = 1.98, 95%CI: 1.17-3.35, P = 0.007). MORC4 was present on the same linkage block as this CD marker. Using the case-control approach, a significant association (case-control allelic OR = 1.61, 95%CI: 1.08-2.41, P = 0.018) was found between CD and a nonsynonymous coding single nucleotide polymorphism (rs6622126) in MORC4. The association between the CLDN2 marker and CD was not replicated in the family-based study. Ulcerative colitis was not associated with any of the single nucleotide polymorphism markers.
CONCLUSION: These findings suggest that a variant of the CLDN2-MORC4 region predisposes to CD in a Swedish population.
16
Similarity in recombination rate and linkage disequilibrium at CYP2C and CYP2D cytochrome P450 gene regions among Europeans indicates signs of selection and no advantage of using tagSNPs in population isolates. Pharmacogenet Genomics 2013; 22:846-57. [PMID: 23089684 DOI: 10.1097/fpc.0b013e32835a3a6d] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
OBJECTIVE: Linkage disequilibrium (LD) and recombination rates are known to vary considerably between human genome regions and populations, mostly because of the combined effects of mutation, recombination, and demographic history. Thus, the pattern of LD is key to disentangling variants associated with complex traits. Here, we aim to describe the haplotype structure and LD variation at the pharmacogenetically relevant cytochrome P450 CYP2C and CYP2D gene regions among European populations.
METHODS: To assess the haplotype structure, LD pattern, and recombination rate variations in the clinically significant CYP2C and CYP2D regions, we genotyped 143 single-nucleotide polymorphisms (SNPs) across these two genome regions in a diverse set of 11 European population samples and one sub-Saharan African sample.
RESULTS: Our results showed extended patterns of LD and in general a low rate of recombination at these loci, and a low degree of allele differentiation for the two cytochrome P450 regions among Europeans, with the exception of the Sami and the Finns as European outliers. The Sami sample showed reduced haplotype diversity and higher LD for the two cytochrome P450 regions than the other Europeans, a feature that is proposed to enhance the LD mapping of underlying common complex traits. However, recombination hotspots and LD blocks at these two regions showed highly consistent structures across Europeans including Finns and Sami. Moreover, we showed that the CEPH sample has significantly higher tag transferability among Europeans and a more efficient tagging of both the rare CYP2C9 and the common CYP2C19 functional variants than the Sami. Our data set included the CYP2C9*3 (rs1057910) and CYP2C19*2 (rs4244285) enzyme activity-altering variants, associated in a recent genome-wide study with acenocoumarol-induced and warfarin-induced anticoagulation or with the antiplatelet effect of clopidogrel, respectively. Including these known activity-altering variants, we showed the haplotype variation and high derived allele frequencies of novel recently identified acenocoumarol genome-wide associated SNPs at CYP2C9 (rs4086116) and CYP2C18 (rs12772169, rs1998591, rs2104543, rs1042194) loci in a comprehensive set of 11 European populations. Furthermore, a significant frequency difference of a CYP2C19*2 gene mutation causing variable drug reactions was observed among Europeans.
CONCLUSION: The CEPH sample representing the general European population as such in the HapMap project seems to be the optimal population sample for the LD mapping of common complex traits among Europeans. Nevertheless, it is still argued that the unique pattern of LD in the Sami may offer an advantage for further association mapping, especially if multiple rare variants play a role in disease etiology. However, besides the activity-altering CYP2C9*3 (rs1057910) and CYP2C19*2 (rs4244285) variants, the high derived allele frequencies of novel recently identified acenocoumarol genome-wide associated SNPs at the CYP2C9 (rs4086116) and CYP2C18 (rs12772169, rs1998591, rs2104543, rs1042194) loci indicated that the CYP2C region may have been influenced by selection. Thus, this fine-scale haplotype map of the CYP2C and CYP2D regions may help to choose markers for further association mapping of complex pharmacogenetic traits at these loci.
17
Lam TH, Shen M, Chia JM, Chan SH, Ren EC. Population-specific recombination sites within the human MHC region. Heredity (Edinb) 2013; 111:131-8. [PMID: 23715014 DOI: 10.1038/hdy.2013.27] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 09/30/2012] [Revised: 01/22/2013] [Accepted: 03/06/2013] [Indexed: 01/04/2023] Open
Abstract
Genetic rearrangement by recombination is one of the major driving forces for genome evolution, and recombination is known to occur in non-random, discrete recombination sites within the genome. Mapping of recombination sites has proved to be difficult, particularly in the human MHC region, which is complicated by both population variation and highly polymorphic HLA genes. To overcome these problems, HLA-typed individuals from three representative populations (Asian, European and African) were used to generate phased HLA haplotypes. Extended haplotype homozygosity (EHH) plots constructed from the phased haplotype data revealed discrete EHH drops corresponding to recombination events, and these signatures were observed to be different for each population. Surprisingly, the majority of recombination sites detected are unique to each population, rather than being common. Unique recombination sites account for 56.8% (21/37 of total sites) in the Asian cohort, 50.0% (15/30 sites) in Europeans and 63.2% (24/38 sites) in Africans. Validation carried out at a known sperm typing recombination site of 45 kb (HLA-F-telomeric) showed that EHH was an efficient method to narrow the recombination region to 826 bp, and this was further refined to 660 bp by resequencing. This approach significantly enhanced mapping of the genomic architecture within the human MHC, and will be useful in studies to identify disease risk genes.
Affiliation(s)
- T H Lam
  - Singapore Immunology Network, A*STAR, Singapore
18
Elding H, Lau W, Swallow D, Maniatis N. Refinement in localization and identification of gene regions associated with Crohn disease. Am J Hum Genet 2013; 92:107-13. [PMID: 23246291 PMCID: PMC3542460 DOI: 10.1016/j.ajhg.2012.11.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 07/16/2012] [Revised: 08/03/2012] [Accepted: 11/05/2012] [Indexed: 12/13/2022] Open
Abstract
The risk of Crohn disease (CD) has a large genetic component. A recent meta-analysis of 6 genome-wide association studies reported 71 chromosomal intervals but does not account for all of the known genetic contribution. Here, we refine localization of the previously reported intervals and also identify additional CD susceptibility genes using a mapping approach that localizes causal variants based on genetic maps in linkage disequilibrium units (LDU maps). Using 2 of the 6 cohorts, 66 of the 71 previously reported loci are confirmed and more precise location estimates for these intervals are given. We identify 78 additional gene regions that pass genome-wide significance, providing strong evidence for 144 genes. Additionally, 56 nominally significant signals, but with more stringent and precise colocalization, are identified. In total, we provide evidence for 200 gene regions confirming that CD is truly multifactorial and complex in nature. Many identified genes have functions that are compatible with involvement in immune/inflammatory processes and seem to have a large effect in individuals with extra ileal as well as ileal inflammation. The precise locations and the evidence that some genes reflect phenotypic subgroups will help identify functional variants and will lead to greater insight of CD etiology.
Affiliation(s)
- Heather Elding
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK
- Winston Lau
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK
- Dallas M. Swallow
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK
- Nikolas Maniatis
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK
19
Gibson J, Tapper W, Ennis S, Collins A. Exome-based linkage disequilibrium maps of individual genes: functional clustering and relationship to disease. Hum Genet 2012; 132:233-43. [PMID: 23124193 DOI: 10.1007/s00439-012-1243-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 08/03/2012] [Accepted: 10/20/2012] [Indexed: 11/26/2022]
Abstract
Exome sequencing identifies thousands of DNA variants and a proportion of these are involved in disease. Genotypes derived from exome sequences provide particularly high-resolution coverage enabling study of the linkage disequilibrium structure of individual genes. The extent and strength of linkage disequilibrium reflects the combined influences of mutation, recombination, selection and population history. By constructing linkage disequilibrium maps of individual genes, we show that genes containing OMIM-listed disease variants are significantly under-represented amongst genes with complete or very strong linkage disequilibrium (P = 0.0004). In contrast, genes with disease variants are significantly over-represented amongst genes with levels of linkage disequilibrium close to the average for genes not known to contain disease variants (P = 0.0038). Functional clustering reveals, amongst genes with particularly strong linkage disequilibrium, significant enrichment of essential biological functions (e.g. phosphorylation, cell division, cellular transport and metabolic processes). Strong linkage disequilibrium, corresponding to reduced haplotype diversity, may reflect selection in utero against deleterious mutations which have profound impact on the function of essential genes. Genes with very weak linkage disequilibrium show enrichment of functions requiring greater allelic diversity (e.g. sensory perception and immune response). This category is not enriched for genes containing disease variation. In contrast, there is significant enrichment of genes containing disease variants amongst genes with more average levels of linkage disequilibrium. Mutations in these genes may less likely lead to in utero lethality and be subject to less intense selection.
Affiliation(s)
- Jane Gibson
  - Genetic Epidemiology and Genomic Informatics Group, Human Genetics, University of Southampton, Southampton General Hospital, Southampton, UK
20
Burkhardt J, Kirsten H, Wolfram G, Quente E, Ahnert P. Differential allelic expression of IL13 and CSF2 genes associated with asthma. Genet Mol Biol 2012; 35:567-74. [PMID: 23055793 PMCID: PMC3459404 DOI: 10.1590/s1415-47572012005000055] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/12/2012] [Accepted: 05/11/2012] [Indexed: 11/21/2022] Open
Abstract
An important area of genetic research is the identification of functional mechanisms in polymorphisms associated with diseases. A highly relevant functional mechanism is the influence of polymorphisms on gene expression levels (differential allelic expression, DAE). The coding single nucleotide polymorphisms (SNPs) CSF2rs25882 and IL13rs20541 have been associated with asthma. In this work, we investigated whether the mRNA expression levels of CSF2 or IL13 were correlated with these SNPs. Samples were analyzed by mass spectrometry-based quantification of gene expression. Both SNPs influenced gene expression levels (CSF2rs25882: p_overall = 0.008 and p_DAE samples = 0.00006; IL13rs20541: p_overall = 0.059 and p_DAE samples = 0.036). For CSF2, the expression level was increased by 27.4% (95% CI: 18.5%–35.4%) in samples with significant DAE in the presence of one copy of risk variant CSF2rs25882-T. The average expression level of IL13 was increased by 29.8% (95% CI: 3.1%–63.4%) in samples with significant DAE in the presence of one copy of risk variant IL13rs20541-A. Enhanced expression of CSF2 could stimulate macrophages and neutrophils during inflammation and may be related to the etiology of asthma. For IL-13, higher expression could enhance the functional activity of the asthma-associated isoform. Overall, the analysis of DAE provides an efficient approach for identifying possible functional mechanisms that link disease-associated variants with altered gene expression levels.
Affiliation(s)
- Jana Burkhardt
  - Translational Centre for Regenerative Medicine, Universität Leipzig, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
21
Abstract
Maximum likelihood methods for the estimation of linkage disequilibrium between biallelic DNA-markers in half-sib families (half-sib method) are developed for single and multifamily situations. Monte Carlo computer simulations were carried out for a variety of scenarios regarding sire genotypes, linkage disequilibrium, recombination fraction, family size, and number of families. A double heterozygote sire was simulated with recombination fraction of 0.00, linkage disequilibrium among dams of δ=0.10, and alleles at both markers segregating at intermediate frequencies for a family size of 500. The average estimates of δ were 0.17, 0.25, and 0.10 for Excoffier and Slatkin (1995), maternal informative haplotypes, and the half-sib method, respectively. A multifamily EM algorithm was tested at intermediate frequencies by computer simulation. The range of the absolute difference between estimated and simulated δ was between 0.000 and 0.008. A cattle half-sib family was genotyped with the Illumina 50K BeadChip. There were 314,730 SNP pairs for which the sire was a homo-heterozygote with average estimates of r² of 0.115, 0.067, and 0.111 for half-sib, Excoffier and Slatkin (1995), and maternal informative haplotypes methods, respectively. There were 208,872 SNP pairs for which the sire was double heterozygote with average estimates of r² across the genome of 0.100, 0.267, and 0.925 for half-sib, Excoffier and Slatkin (1995), and maternal informative haplotypes methods, respectively. Genome analyses for all possible sire genotypes with 829,042 tests showed that ignoring half-sib family structure leads to upward biased estimates of linkage disequilibrium. Published inferences on population structure and evolution of cattle should be revisited after accommodating existing half-sib family structure in the estimation of linkage disequilibrium.
22
Sarbajna S, Denniff M, Jeffreys AJ, Neumann R, Soler Artigas M, Veselis A, May CA. A major recombination hotspot in the XqYq pseudoautosomal region gives new insight into processing of human gene conversion events. Hum Mol Genet 2012; 21:2029-38. [PMID: 22291443 DOI: 10.1093/hmg/dds019] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
Abstract
Recombination plays a fundamental role in meiosis. Non-exchange gene conversion (non-crossover, NCO) may facilitate homologue pairing, while reciprocal crossover (CO) physically connects homologues so they orientate appropriately on the meiotic spindle. In males, X-Y homologous pairing and exchange occurs within the two pseudoautosomal regions (PARs) together comprising <5% of the human sex chromosomes. Successful meiosis depends on an obligatory CO within PAR1, while the nature and role of exchange within PAR2 is unclear. Here, we describe the identification and characterization of a typical ~1 kb wide recombination hotspot within PAR2. We find that both COs and NCOs are strongly modulated in trans by the presumed chromatin remodelling protein PRDM9, and in cis by a single nucleotide polymorphism (SNP) located at the hotspot centre that appears to influence recombination initiation and which causes biased gene conversion in SNP heterozygotes. This, the largest survey to date of human NCOs reveals for the first time substantial inter-individual variation in the NCO:CO ratio. Although the extent of biased transmission at the central marker in COs is similar across men, it is highly variable among NCO recombinants. This suggests that cis-effects are mediated not only through recombination initiation frequencies varying between haplotypes but also through subsequent processing, with the potential to significantly intensify meiotic drive of hotspot-suppressing alleles. The NCO:CO ratio and extent of transmission distortion among NCOs appear to be inter-related, suggesting the existence of two NCO pathways in humans.
23
Elding H, Lau W, Swallow D, Maniatis N. Dissecting the genetics of complex inheritance: linkage disequilibrium mapping provides insight into Crohn disease. Am J Hum Genet 2011; 89:798-805. [PMID: 22152681 PMCID: PMC3234369 DOI: 10.1016/j.ajhg.2011.11.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 09/19/2011] [Revised: 10/24/2011] [Accepted: 11/08/2011] [Indexed: 12/21/2022] Open
Abstract
Family studies for Crohn disease (CD) report extensive linkage on chromosome 16q and pinpoint NOD2 as a possible causative locus. However, linkage is also observed in families that do not bear the most frequent NOD2 causative mutations, but no other signals on 16q have been found so far in published genome-wide association studies. Our aim is to identify this missing genetic contribution. We apply a powerful genetic mapping approach to the Wellcome Trust Case-Control Consortium and the National Institute of Diabetes and Digestive and Kidney Diseases genome-wide association data on CD. This method takes into account the underlying structure of linkage disequilibrium (LD) by using genetic distances from LD maps and provides a location for the causal agent. We find genetic heterogeneity within the NOD2 locus and also show an independent and unsuspected involvement of the neighboring gene, CYLD. We find associations with the IRF8 region and the region containing CDH1 and CDH3, as well as substantial phenotypic and genetic heterogeneity for CD itself. The genes are known to be involved in inflammation and immune dysregulation. These findings provide insight into the genetics of CD and suggest promising directions for understanding disease heterogeneity. The application of this method thus paves the way for understanding complex inheritance in general, leading to the dissection of different pathways and ultimately, personalized treatment.
|
24
|
Variants of the protein PRDM9 differentially regulate a set of human meiotic recombination hotspots highly active in African populations. Proc Natl Acad Sci U S A 2011; 108:12378-83. [PMID: 21750151 DOI: 10.1073/pnas.1109531108] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.5] [Indexed: 11/18/2022]
Abstract
PRDM9 is a major specifier of human meiotic recombination hotspots, probably via binding of its zinc-finger repeat array to a DNA sequence motif associated with hotspots. However, our view of PRDM9 regulation, in terms of motifs defined and hotspots studied, has a strong bias toward the PRDM9 A variant particularly common in Europeans. We show that population diversity can reveal a second class of hotspots specifically activated by PRDM9 variants common in Africans but rare in Europeans. These African-enhanced hotspots nevertheless share very similar properties with their counterparts activated by the A variant. The specificity of hotspot activation is such that individuals with differing PRDM9 genotypes, even within the same population, can use substantially if not completely different sets of hotspots. Each African-enhanced hotspot is activated by a distinct spectrum of PRDM9 variants, despite the fact that all are predicted to bind the same sequence motif. This differential activation points to complex interactions between the zinc-finger array and hotspots and identifies features of the array that might be important in controlling hotspot activity.
|
25
|
Politopoulos I, Gibson J, Tapper W, Ennis S, Eccles D, Collins A. Composite likelihood-based meta-analysis of breast cancer association studies. J Hum Genet 2011; 56:377-82. [DOI: 10.1038/jhg.2011.23] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/16/2022]
|
26
|
Collins A, Tapper WJ. Genome variation: a review of Web resources. Methods Mol Biol 2011; 713:129-139. [PMID: 21153616 DOI: 10.1007/978-1-60327-416-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
An enormous number of high-quality Web-based resources are now available to facilitate research into genome variation. Although identification of the most appropriate and informative resources can be challenging, a number of key sites provide links to more specialized resources that may be useful to follow up. Given ongoing research, focussing on the sequencing of many different genomes, we can expect sequence databases and their associated polymorphism-based resources to greatly increase in depth and complexity in a relatively short period of time. However, databases and tools developed to date, and described here, provide a sound basis for accommodating this next generation of genomic data. As well as sequence-oriented resources this review presents databases providing genotypic and common disease phenotype data, copy number variation, genetic maps, cytogenetic data, and gives an overview of key software tools, with the emphasis on analysis of the genetic basis of common disease.
Affiliation(s)
- Andrew Collins
- Human Genetics Research Division, University of Southampton, Southampton, UK
|
27
|
Abstract
Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients were proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated such structure. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging. Results: We present a more practical method to build GMs that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. The results obtained in public data from the HapMap database showed that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional measure of LD D′. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful to tell about the conservability of the genetic markers and to guide the selection of a subset of representative markers. Availability: The implementation of the method is available upon request by email. Contact: maciel@sc.usp.br
Affiliation(s)
- Edwin Villanueva
- Electrical Engineering Department, Sao Carlos School of Engineering, University of Sao Paulo, Sao Carlos, Sao Paulo, Brazil
|
28
|
Politopoulos I, Gibson J, Tapper W, Ennis S, Eccles D, Collins A. Genome-wide association of breast cancer: composite likelihood with imputed genotypes. Eur J Hum Genet 2010; 19:194-9. [PMID: 20959865 DOI: 10.1038/ejhg.2010.157] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
Abstract
We describe composite likelihood-based analysis of a genome-wide breast cancer case-control sample from the Cancer Genetic Markers of Susceptibility project. We determine 14 380 genome regions of fixed size on a linkage disequilibrium (LD) map, which delimit comparable levels of LD. Although the numbers of single-nucleotide polymorphisms (SNPs) are highly variable, each region contains an average of ∼35 SNPs and an average of ∼69 after imputation of missing genotypes. Composite likelihood association mapping yields a single P-value for each region, established by a permutation test, along with a maximum likelihood disease location, SE and information weight. For single SNP analysis, the nominal P-value for the most significant SNP (msSNP) requires substantial correction given the number of SNPs in the region. Therefore, imputing genotypes may not always be advantageous for the msSNP test, in contrast to composite likelihood. For the region containing FGFR2 (a known breast cancer gene) the largest χ² is obtained under composite likelihood with imputed genotypes (χ²₂ increases from 20.6 to 22.7), and compares with a single SNP-based χ²₂ of 19.9 after correction. Imputation of additional genotypes in this region reduces the size of the 95% confidence interval for location of the disease gene by ∼40%. Among the highest ranked regions, SNPs in the NTSR1 gene would be worthy of examination in additional samples. Meta-analysis, which combines weighted evidence from composite likelihood in different samples, and refines putative disease locations, is facilitated through defining fixed regions on an underlying LD map.
Affiliation(s)
- Ioannis Politopoulos
- Genetic Epidemiology and Bioinformatics Research Group, Human Genetics Research Division, University of Southampton, School of Medicine, Southampton General Hospital, Hants, UK
|
29
|
Population structure and genome-wide patterns of variation in Ireland and Britain. Eur J Hum Genet 2010; 18:1248-54. [PMID: 20571510 DOI: 10.1038/ejhg.2010.87] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 12/20/2022]
Abstract
Located off the northwestern coast of the European mainland, Britain and Ireland were among the last regions of Europe to be colonized by modern humans after the last glacial maximum. Further, the geographical location of Britain, and in particular of Ireland, is such that the impact of historical migration has been minimal. Genetic diversity studies applying the Y chromosome and mitochondrial systems have indicated reduced diversity and an increased population structure across Britain and Ireland relative to the European mainland. Such characteristics would have implications for genetic mapping studies of complex disease. We set out to further our understanding of the genetic architecture of the region from the perspective of (i) population structure, (ii) linkage disequilibrium (LD), (iii) homozygosity and (iv) haplotype diversity (HD). Analysis was conducted on 3654 individuals from Ireland, Britain (with regional sampling in Scotland), Bulgaria, Portugal, Sweden and the Utah HapMap collection. Our results indicate a subtle but clear genetic structure across Britain and Ireland, although levels of structure were reduced in comparison with average cross-European structure. We observed slightly elevated levels of LD and homozygosity in the Irish population compared with neighbouring European populations. We also report on a cline of HD across Europe with greatest levels in southern populations and lowest levels in Ireland and Scotland. These results are consistent with our understanding of the population history of Europe and promote Ireland and Scotland as relatively homogenous resources for genetic mapping of rare variants.
|
30
|
Zou F, Lee S, Knowles MR, Wright FA. Quantification of population structure using correlated SNPs by shrinkage principal components. Hum Hered 2010; 70:9-22. [PMID: 20413978 DOI: 10.1159/000288706] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 10/20/2009] [Accepted: 02/10/2010] [Indexed: 11/19/2022]
Abstract
BACKGROUND/AIMS: Association studies using unrelated individuals have become the most popular design for mapping complex traits. One of the major challenges of association mapping is avoiding spurious association due to population stratification. Principal component analysis (PCA) on genome-wide marker genotypes is one of the most popular population stratification control methods. It implicitly assumes that the markers are in linkage equilibrium, a condition that is rarely satisfied and that we plan to relax. METHODS: We carefully examined the impact of linkage disequilibrium (LD) on PCA, and proposed a simple modification of the standard PCA to automatically adjust for the correlations among markers. RESULTS: We demonstrated that LD patterns in genome-wide association datasets can distort the techniques for stratification control, showing 'subpopulations' reflecting localized LD phenomena rather than plausible population structure. We showed that the proposed method effectively removes the artifactual effect of LD patterns, and successfully recovers underlying population structure that is not apparent from standard PCA. CONCLUSION: PCA is highly influenced by sets of SNPs with high LD, obscuring the true population substructure. Our shrinkage PCA applies to all available markers, regardless of the LD patterns. The proposed method is easier to implement than most existing LD adjusted PCA methods.
Affiliation(s)
- Fei Zou
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA. fzou @ bios.unc.edu
|
31
|
Stapley J, Birkhead TR, Burke T, Slate J. Pronounced inter- and intrachromosomal variation in linkage disequilibrium across the zebra finch genome. Genome Res 2010; 20:496-502. [PMID: 20357051 DOI: 10.1101/gr.102095.109] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 01/30/2023]
Abstract
The extent of nonrandom association of alleles at two or more loci, termed linkage disequilibrium (LD), can reveal much about population demography, selection, and recombination rate, and is a key consideration when designing association mapping studies. Here, we describe a genome-wide analysis of LD in the zebra finch (Taeniopygia guttata) using 838 single nucleotide polymorphisms and present LD maps for all assembled chromosomes. We found that LD declined with physical distance approximately five times faster on the microchromosomes compared to macrochromosomes. The distribution of LD across individual macrochromosomes also varied in a distinct pattern. In the center of the macrochromosomes there were large blocks of markers, sometimes spanning tens of megabases, in strong LD whereas on the ends of macrochromosomes LD declined more rapidly. Regions of high LD were not simply the result of suppressed recombination around the centromere and this pattern has not been observed previously in other taxa. We also found evidence that this pattern of LD has remained stable across many generations. The variability in LD between and within chromosomes has important implications for genome-wide association studies in birds and for our understanding of the distribution of recombination events and the processes that govern them.
Affiliation(s)
- Jessica Stapley
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK.
|
32
|
Abstract
The 46/1 JAK2 haplotype predisposes to V617F-positive myeloproliferative neoplasms, but the underlying mechanism is obscure. We analyzed essential thrombocythemia patients entered into the PT-1 studies and, as expected, found that 46/1 was overrepresented in V617F-positive cases (n = 404) versus controls (n = 1492, P = 3.9 × 10⁻¹¹). The 46/1 haplotype was also overrepresented in cases without V617F (n = 347, P = .009), with an excess seen for both MPL exon 10 mutated and V617F, MPL exon 10 nonmutated cases. Analysis of further MPL-positive, V617F-negative cases confirmed an excess of 46/1 (n = 176, P = .002), but no association between MPL mutations and MPL haplotype was seen. An excess of 46/1 was also seen in JAK2 exon 12 mutated cases (n = 69, P = .002), and these mutations preferentially arose on the 46/1 chromosome (P = .029). No association between 46/1 and clinical or laboratory features was seen in the PT-1 cohort either with or without V617F. The excess of 46/1 in JAK2 exon 12 cases is compatible with both the "hypermutability" and "fertile ground" hypotheses, but the excess in MPL-mutated cases argues against the former. No difference in sequence, splicing, or expression of JAK2 was found on 46/1 compared with other haplotypes, suggesting that any functional difference of JAK2 on 46/1, if it exists, must be relatively subtle.
|
33
|
Jasinska A, Service S, Jawaheer D, DeYoung J, Levinson M, Zhang Z, Kremeyer B, Muller H, Aldana I, Garcia J, Restrepo G, Lopez C, Palacio C, Duque C, Parra M, Vega J, Ortiz D, Bedoya G, Mathews C, Davanzo P, Fournier E, Bejarano J, Ramirez M, Ortiz CA, Araya X, Molina J, Sabatti C, Reus V, Ospina J, Macaya G, Ruiz-Linares A, Freimer N. A narrow and highly significant linkage signal for severe bipolar disorder in the chromosome 5q33 region in Latin American pedigrees. Am J Med Genet B Neuropsychiatr Genet 2009; 150B:998-1006. [PMID: 19319892 PMCID: PMC4815924 DOI: 10.1002/ajmg.b.30956] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
We previously reported linkage of bipolar disorder to 5q33-q34 in families from two closely related population isolates, the Central Valley of Costa Rica (CVCR) and Antioquia, Colombia (CO). Here we present follow up results from fine-scale mapping in large CVCR and CO families segregating severe bipolar disorder, BP-I, and in 343 population trios/duos from CVCR and CO. Employing densely spaced SNPs to fine map the prior linkage peak region increases linkage evidence and clarifies the position of the putative BP-I locus. We performed two-point linkage analysis with 1134 SNPs in an approximately 9 Mb region between markers D5S410 and D5S422. Combining pedigrees from CVCR and CO yields a LOD score of 4.9 at SNP rs10035961. Two other SNPs (rs7721142 and rs1422795) within the same 94 kb region also displayed LOD scores greater than 4. This linkage peak coincides with our prior microsatellite results and suggests a narrowed BP-I susceptibility region in these families. To investigate if the locus implicated in the familial form of BP-I also contributes to disease risk in the population, we followed up the family results with association analysis in duo and trio samples, obtaining signals within 2 Mb of the peak linkage signal in the pedigrees; rs12523547 and rs267015 (P = 0.00004 and 0.00016, respectively) in the CO sample and rs244960 in the CVCR sample and the combined sample, with P = 0.00032 and 0.00016, respectively. It remains unclear whether these association results reflect the same locus contributing to BP susceptibility within the extended pedigrees.
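As a reading aid for the LOD scores quoted in this abstract, the following minimal Python sketch computes a two-point LOD score in the simplest textbook setting: fully informative, phase-known meioses. This is an illustrative assumption, not the pedigree-based analysis used in the study, and the function name `two_point_lod` and the example counts are hypothetical.

```python
import math


def two_point_lod(theta: float, n_meioses: int, n_recombinants: int) -> float:
    """LOD score for a recombination fraction theta versus no linkage (theta = 0.5).

    Assumes phase-known, fully informative meioses (a textbook simplification).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k = n_recombinants
    # Binomial likelihood kernel under linkage at recombination fraction theta.
    likelihood_theta = theta ** k * (1.0 - theta) ** (n_meioses - k)
    # Likelihood under free recombination (no linkage).
    likelihood_null = 0.5 ** n_meioses
    if likelihood_theta == 0.0:
        return float("-inf")
    return math.log10(likelihood_theta / likelihood_null)


# Hypothetical data: 10 informative meioses with no recombinants.
lod_at_zero = two_point_lod(0.0, 10, 0)
print(f"LOD at theta = 0: {lod_at_zero:.2f}")
```

Under this simplification each non-recombinant meiosis adds log10(2), about 0.3, to the LOD at theta = 0, which gives a sense of how much consistent co-segregation a LOD score above 4 represents.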
Affiliation(s)
- A.J. Jasinska
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
- S. Service
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
- D. Jawaheer
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
- J. DeYoung
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
- M. Levinson
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
- Z. Zhang
- Department of Statistics, University of California, Los Angeles, California
- B. Kremeyer
- Galton Laboratory, Department of Biology, University College London, London, United Kingdom
- H. Muller
- Galton Laboratory, Department of Biology, University College London, London, United Kingdom
- I. Aldana
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
- J. Garcia
- Departamento de Psiquiatria, Universidad de Antioquia, Medellin, Colombia
- G. Restrepo
- Departamento de Psiquiatria, Universidad de Antioquia, Medellin, Colombia
- C. Lopez
- Departamento de Psiquiatria, Universidad de Antioquia, Medellin, Colombia
- C. Palacio
- Departamento de Psiquiatria, Universidad de Antioquia, Medellin, Colombia
- C. Duque
- Laboratorio de Genetica Molecular, Universidad de Antioquia, Medellin, Colombia
- M. Parra
- Laboratorio de Genetica Molecular, Universidad de Antioquia, Medellin, Colombia
- J. Vega
- Laboratorio de Genetica Molecular, Universidad de Antioquia, Medellin, Colombia
- D. Ortiz
- Laboratorio de Genetica Molecular, Universidad de Antioquia, Medellin, Colombia
- G. Bedoya
- Laboratorio de Genetica Molecular, Universidad de Antioquia, Medellin, Colombia
- C. Mathews
- Department of Psychiatry, University of California, San Francisco, California
- P. Davanzo
- Department of Psychiatry and Behavioral Sciences, School of Medicine, University of California, Los Angeles, California
- E. Fournier
- Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, Costa Rica
- J. Bejarano
- Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, Costa Rica
- M. Ramirez
- Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, Costa Rica
- C. Araya Ortiz
- Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, Costa Rica
- X. Araya
- Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, Costa Rica
- J. Molina
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
- C. Sabatti
- Department of Statistics, University of California, Los Angeles, California
- Department of Statistics and Department of Human Genetics, University of California, Los Angeles, California
- V. Reus
- Department of Psychiatry, University of California, San Francisco, California
- J. Ospina
- Departamento de Psiquiatria, Universidad de Antioquia, Medellin, Colombia
- G. Macaya
- Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, Costa Rica
- A. Ruiz-Linares
- Galton Laboratory, Department of Biology, University College London, London, United Kingdom
- N.B. Freimer
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California
|
34
|
Abstract
Although previous studies have revealed a great deal about the genetic basis of susceptibility and resistance to parasite infection, there is now an opportunity to considerably enhance understanding through genome-wide association mapping. The application of association mapping to complex inheritance has recently become achievable given reduced costs, sophisticated genotyping platforms and powerful statistical methods which build upon increased knowledge of the linkage disequilibrium structure of the human genome. Linkage mapping and related approaches remain useful for the localization of the rarer genetic variants and candidate region association studies can be a very cost-effective route to progress. However, genome-wide association offers the greatest promise, despite the challenges posed by phenotype complexity, ensuring genotype coverage/quality and robust statistical analysis. The available approaches for mapping genes underlying susceptibility are reviewed here, emphasizing their relative merits and drawbacks and highlighting specific software tools and resources that enable successful mapping.
Affiliation(s)
- A Collins
- Human Genetics Division, School of Medicine, Southampton General Hospital, University of Southampton, Southampton, UK.
|
35
|
Terwilliger JD, Hiekkalinna T. An utter refutation of the "fundamental theorem of the HapMap". Eur J Hum Genet 2009; 14:426-37. [PMID: 16479260 DOI: 10.1038/sj.ejhg.5201583] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Indexed: 12/12/2022]
Abstract
The International HapMap Project was proposed in order to quantify linkage disequilibrium (LD) relationships among human DNA polymorphisms in an assortment of populations, in order to facilitate the process of selecting a minimal set of markers that could capture most of the signal from the untyped markers in a genome-wide association study. The central dogma can be summarized by the argument that if a marker is in tight LD with a polymorphism that directly impacts disease risk, as measured by the metric r², then one would be able to detect an association between the marker and disease with sample size that was increased by a factor of 1/r² over that needed to detect the effect of the functional variant directly. This "fundamental theorem" holds, however, only if one assumes that the LD between loci and the etiological effect of the functional variant are independent of each other, that they are statistically independent of all other etiological factors (in exposure and action), that sampling is prospective, and that the estimates of r² are accurate. None of these are standard operating assumptions, however. We describe the ramifications of these implicit assumptions, and provide simple examples in which the effects of a functional variant could be unequivocally detected if it were directly genotyped, even as markers in high LD with the functional variant would never show association with disease, even in infinite sample sizes. Both theoretical and empirical refutation of the central dogma of genome-wide association studies is thus presented.
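The 1/r² rule that this abstract sets out to challenge can be made concrete with a short Python sketch. It uses the standard definition of r² for two biallelic loci and applies the inflation factor exactly as the "fundamental theorem" states it, so it holds only under the assumptions the authors go on to criticize. The function names and the example frequencies are hypothetical, chosen purely for illustration.

```python
import math


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic loci.

    p_ab is the frequency of the haplotype carrying alleles A and B;
    p_a and p_b are the frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def marker_sample_size(n_direct: int, r2: float) -> int:
    """Sample size at the marker implied by the 1/r^2 rule.

    n_direct is the sample size assumed sufficient when the functional
    variant itself is genotyped.
    """
    return math.ceil(n_direct / r2)


# Hypothetical frequencies: both alleles at 0.5, AB haplotype at 0.45.
r2 = r_squared(p_ab=0.45, p_a=0.5, p_b=0.5)
n_marker = marker_sample_size(1000, r2)
print(f"r^2 = {r2:.2f}, marker sample size = {n_marker}")
```

With these illustrative numbers r² is 0.64, so the rule would call for roughly 1.56 times as many samples at the marker as at the functional variant.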
|
36
|
Facheris MF, Schneider NK, Lesnick TG, de Andrade M, Cunningham JM, Rocca WA, Maraganore DM. Coffee, caffeine-related genes, and Parkinson's disease: a case-control study. Mov Disord 2009; 23:2033-40. [PMID: 18759349 DOI: 10.1002/mds.22247] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
An inverse association between coffee and Parkinson's disease (PD) has been reported. However, it remains uncertain why some but not all coffee drinkers are less susceptible to PD. We considered the possibility of a pharmacogenetic effect. In our study, we included 1,208 subjects (446 case-unaffected sibling pairs and 158 case-unrelated control pairs) recruited from an ongoing study of the molecular epidemiology of PD in the Upper Midwest (USA). We collected information on lifetime coffee drinking and we studied two genes: ADORA2A, which encodes the major receptor activity of caffeine in the brain (variants rs5751876 and rs3032740), and CYP1A2, which encodes the major rate-limiting step of caffeine metabolism (variants rs35694136 and rs762551). We did not observe significant associations of coffee drinking or of the genetic variants with PD susceptibility, either independently or jointly, in the sample overall and in most strata. Our study neither supports the hypothesis that coffee protects against PD nor provides evidence for a pharmacogenetic effect.
|
37
|
Jeffreys AJ, Neumann R. The rise and fall of a human recombination hot spot. Nat Genet 2009; 41:625-9. [PMID: 19349985 PMCID: PMC2678279 DOI: 10.1038/ng.346] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 11/04/2008] [Accepted: 01/16/2009] [Indexed: 12/26/2022]
Abstract
Human meiotic crossovers mainly cluster into narrow hot spots that profoundly influence patterns of haplotype diversity and that may also affect genome instability and sequence evolution. Hot spots also seem to be ephemeral, but processes of hot-spot activation and their subsequent evolutionary dynamics remain unknown. We now analyze the life cycle of a recombination hot spot. Sperm typing revealed a polymorphic hot spot that was activated in cis by a single base change, providing evidence for a primary sequence determinant necessary, though not sufficient, to activate recombination. This activating mutation occurred roughly 70,000 y ago and has persisted to the present, most likely fortuitously through genetic drift despite its systematic elimination by biased gene conversion. Nonetheless, this self-destructive conversion will eventually lead to hot-spot extinction. These findings define a subclass of highly transient hot spots and highlight the importance of understanding hot-spot turnover and how it influences haplotype diversity.
|
38
|
Pistis G, Piras I, Pirastu N, Persico I, Sassu A, Picciau A, Prodi D, Fraumene C, Mocci E, Manias MT, Atzeni R, Cosso M, Pirastu M, Angius A. High differentiation among eight villages in a secluded area of Sardinia revealed by genome-wide high density SNPs analysis. PLoS One 2009; 4:e4654. [PMID: 19247500 PMCID: PMC2646134 DOI: 10.1371/journal.pone.0004654] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 10/21/2008] [Accepted: 01/29/2009] [Indexed: 01/21/2023]
Abstract
To better design association studies for complex traits in isolated populations it's important to understand how history and isolation moulded the genetic features of different communities. Population isolates should not “a priori” be considered homogeneous, even if the communities are not distant and part of a small region. We studied a particular area of Sardinia called Ogliastra, characterized by the presence of several distinct villages that display different history, immigration events and population size. Cultural and geographic isolation characterized the history of these communities. We determined LD parameters in 8 villages and defined population structure through high density SNPs (about 360 K) on 360 unrelated people (45 selected samples from each village). These isolates showed differences in LD values and LD map length. Five of these villages show high LD values probably due to their reduced population size and extreme isolation. High genetic differentiation among villages was detected. Moreover population structure analysis revealed a high correlation between genetic and geographic distances. Our study indicates that history, geography and biodemography have influenced the genetic features of Ogliastra communities producing differences in LD and population structure. All these data demonstrate that we can consider each village an isolate with specific characteristics. We suggest that, in order to optimize the study design of complex traits, a thorough characterization of genetic features is useful to identify the presence of sub-populations and stratification within genetic isolates.
Affiliation(s)
- Giorgio Pistis
- Istituto di Genetica delle Popolazioni, CNR, Alghero, Sassari, Italy
- Ivana Persico
- Istituto di Genetica delle Popolazioni, CNR, Alghero, Sassari, Italy
- Mario Pirastu
- Istituto di Genetica delle Popolazioni, CNR, Alghero, Sassari, Italy
- Shardna Life Sciences, Pula, Cagliari, Italy
- Andrea Angius
- Istituto di Genetica delle Popolazioni, CNR, Alghero, Sassari, Italy
- Shardna Life Sciences, Pula, Cagliari, Italy
|
39
|
Abstract
Linkage disequilibrium was estimated using 7119 single nucleotide polymorphism markers across the genome and 200 animals from the North American Holstein cattle population. The analysis of maternally inherited haplotypes revealed strong linkage disequilibrium (r(2) > 0.8) in genomic regions of approximately 50 kb or less. While linkage disequilibrium decays as a function of genomic distance, genomic regions within genes showed greater linkage disequilibrium and greater variation in linkage disequilibrium compared with intergenic regions. Identification of haplotype blocks could characterize the most common haplotypes. Although maximum haplotype block size was over 1 Mb, mean block size was 26-113 kb by various definitions, which was larger than that observed in humans ( approximately 10 kb). Effective population size of the dairy cattle population was estimated from linkage disequilibrium between single nucleotide polymorphism marker pairs in various haplotype ranges. Rapid reduction of effective population size of dairy cattle was inferred from linkage disequilibrium in recent generations. This result implies a loss of genetic diversity because of the high rate of inbreeding and high selection intensity in dairy cattle. The pattern observed in this study indicated linkage disequilibrium in the current dairy cattle population could be exploited to refine mapping resolution. Changes in effective population size during past generations imply a necessity of plans to maintain polymorphism in the Holstein population.
Affiliation(s)
- E-S Kim
- Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA
|
40
|
Mutations in phospholipase C epsilon 1 are not sufficient to cause diffuse mesangial sclerosis. Kidney Int 2009; 75:415-9. [DOI: 10.1038/ki.2008.573] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
|
41
|
Sabatti C, Service SK, Hartikainen AL, Pouta A, Ripatti S, Brodsky J, Jones CG, Zaitlen NA, Varilo T, Kaakinen M, Sovio U, Ruokonen A, Laitinen J, Jakkula E, Coin L, Hoggart C, Collins A, Turunen H, Gabriel S, Elliot P, McCarthy MI, Daly MJ, Järvelin MR, Freimer NB, Peltonen L. Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet 2008; 41:35-46. [PMID: 19060910 DOI: 10.1038/ng.271] [Citation(s) in RCA: 552] [Impact Index Per Article: 34.5] [Received: 06/18/2008] [Accepted: 10/03/2008] [Indexed: 02/06/2023]
Abstract
Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
Affiliation(s)
- Chiara Sabatti
- Department of Human Genetics and Los Angeles, Los Angeles, California 90095, USA
|
42
|
Jakkula E, Rehnström K, Varilo T, Pietiläinen OP, Paunio T, Pedersen NL, deFaire U, Järvelin MR, Saharinen J, Freimer N, Ripatti S, Purcell S, Collins A, Daly MJ, Palotie A, Peltonen L. The genome-wide patterns of variation expose significant substructure in a founder population. Am J Hum Genet 2008; 83:787-94. [PMID: 19061986 DOI: 10.1016/j.ajhg.2008.11.005] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.3] [Received: 10/06/2008] [Revised: 11/10/2008] [Accepted: 11/11/2008] [Indexed: 02/06/2023] Open
Abstract
Although high-density SNP genotyping platforms generate a momentum for detailed genome-wide association (GWA) studies, an offshoot is a new insight into population genetics. Here, we present an example in one of the best-known founder populations by scrutinizing ten distinct Finnish early- and late-settlement subpopulations. By determining genetic distances, homozygosity, and patterns of linkage disequilibrium, we demonstrate that population substructure, and even individual ancestry, is detectable at a very high resolution and supports the concept of multiple historical bottlenecks resulting from consecutive founder effects. Given that genetic studies are currently aiming at identifying smaller and smaller genetic effects, recognizing and controlling for population substructure even at this fine level becomes imperative to avoid confounding and spurious associations. This study provides an example of the power of GWA data sets to demonstrate stratification caused by population history even within a seemingly homogeneous population, like the Finns. Further, the results provide interesting lessons concerning the impact of population history on the genome landscape of humans, as well as approaches to identify rare variants enriched in these subpopulations.
|
43
|
Brenner EV, Smagulova FO, Morozov IV. Independent origin of rare Y168H mutation of human phenylalanine hydroxylase gene in Russia. RUSS J GENET+ 2008. [DOI: 10.1134/s1022795408100165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
44
|
Abstract
Herein, we investigate whether single-nucleotide polymorphisms (SNPs) across the PARK10 locus are associated with susceptibility to Parkinson's disease (PD) or age at onset (AAO) of disease. One hundred and eighty-eight SNPs were genotyped across the PARK10 locus in 180 PD patients and 180 controls from central Norway (stage 1). We then used the linkage disequilibrium (LD) structure from stage 1 to select 75 SNPs for genotyping in 186 patients and 186 controls from Ireland (stage 2). Nineteen SNPs were selected from this and previous studies for follow-up in an extended Norwegian series (530 patients and 1142 controls), the Irish series and a US series (221 patients and 221 controls) (stage 3). After correction for multiple testing, markers within ubiquitin specific peptidase 24 (USP24) are significantly associated with PD within Norwegian, Irish, and US series combined (rs13312: odds ratio (OR) 0.78, P<0.001; rs487230: OR 0.80, P=0.001). Independently, the association for rs13312 is strongest in the extended Norwegian series (OR 0.76, P=0.005), although not significant after correction for multiple testing (P≤0.003 is considered significant). ORs in the Irish series are almost identical, and a similar but weaker effect was observed for the US series. No marker showed consistent association with AAO. Our data indicate that genetic variability in USP24 is associated with PD. Although our work extends and confirms a previous report, the observed effect size does not explain the PARK10 linkage peak.
|
45
|
Identification and replication of three novel myopia common susceptibility gene loci on chromosome 3q26 using linkage and linkage disequilibrium mapping. PLoS Genet 2008; 4:e1000220. [PMID: 18846214 PMCID: PMC2556391 DOI: 10.1371/journal.pgen.1000220] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 06/23/2008] [Accepted: 09/10/2008] [Indexed: 12/22/2022] Open
Abstract
Refractive error is a highly heritable quantitative trait responsible for considerable morbidity. Following an initial genome-wide linkage study using microsatellite markers, we confirmed evidence for linkage to chromosome 3q26 and then conducted fine-scale association mapping using high-resolution linkage disequilibrium unit (LDU) maps. We used a preliminary discovery marker set across the 30-Mb region with an average SNP density of 1 SNP/15 kb (Map 1). Map 1 was divided into 51 LDU windows and additional SNPs were genotyped for six regions (Map 2) that showed preliminary evidence of multi-marker association using composite likelihood. A total of 575 cases and controls selected from the tails of the trait distribution were genotyped for the discovery sample. Malecot model estimates indicate three loci with putative common functional variants centred on MFN1 (180,566 kb; 95% confidence interval 180,505–180,655 kb), approximately 156 kb upstream from alternate-splicing SOX2OT (182,595 kb; 95% CI 182,533–182,688 kb) and PSARL (184,386 kb; 95% CI 184,356–184,411 kb), with the loci showing modest to strong evidence of association for the Map 2 discovery samples (p < 10⁻⁷, p < 10⁻¹⁰, and p = 0.01, respectively). Using an unselected independent sample of 1,430 individuals, results replicated for the MFN1 (p = 0.006), SOX2OT (p = 0.0002), and PSARL (p = 0.0005) gene regions. MFN1 and PSARL both interact with OPA1 to regulate mitochondrial fusion and the inhibition of mitochondrial-led apoptosis, respectively. That two mitochondrial regulatory processes in the retina are implicated in the aetiology of myopia is surprising and is likely to provide novel insight into the molecular genetic basis of common myopia. Successful gene mapping strategies for common disease continue to require careful consideration of basic study design with the advent of genome-wide association studies.
Here, we take advantage of prior information that the heritability of the quantitative trait myopia in the general population is high and shows evidence of replicated linkage to chromosome 3q26. Based on this, we conducted a fine map linkage disequilibrium association study for the region, using a high-resolution genetic map derived from population-based HapMap Phase II data. For analysis, we used efficient multi-locus tests of association using single nucleotide polymorphism markers genotyped for our sample data and placed on the genetic map measured in linkage disequilibrium units. We followed up preliminary evidence of association for the discovery samples with further genotyping in the same samples to improve the model location estimates for the common functional variants we identified. Three locations were replicated using an independent sample. Two of the identified genes are likely to play an unexpected role in myopia with both pivotal in the healthy housekeeping metabolism of retinal mitochondria. Both proteins interact with OPA1, with nonsynonymous OPA1 mutations causing the unrelated Mendelian disease Autosomal Dominant Optic Atrophy (ADOA) by triggering mitochondrial-led retinal ganglia cell apoptosis.
|
46
|
Kristiansson K, Naukkarinen J, Peltonen L. Isolated populations and complex disease gene identification. Genome Biol 2008; 9:109. [PMID: 18771588 PMCID: PMC2575505 DOI: 10.1186/gb-2008-9-8-109] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Indexed: 01/19/2023] Open
Abstract
Isolated populations can be useful for the identification of genes underlying common complex diseases. The utility of genetically isolated populations (population isolates) in the mapping and identification of genes is not only limited to the study of rare diseases; isolated populations also provide a useful resource for studies aimed at improved understanding of the biology underlying common diseases and their component traits. Well characterized human populations provide excellent study samples for many different genetic investigations, ranging from genome-wide association studies to the characterization of interactions between genes and the environment.
Affiliation(s)
- Kati Kristiansson
- National Public Health Institute and FIMM, Institute for Molecular Medicine Finland, Helsinki 00300, Finland
|
47
|
Sperm cross-over activity in regions of the human genome showing extreme breakdown of marker association. Proc Natl Acad Sci U S A 2008; 105:10471-6. [PMID: 18650392 DOI: 10.1073/pnas.0804933105] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Indexed: 01/26/2023] Open
Abstract
Population diversity data have recently provided profound, albeit inferential, insights into meiotic recombination across the human genome, revealing a landscape dominated by thousands of cross-over hotspots. However, very few of these putative hotspots have been directly analyzed for cross-over activity. We now describe a search for very active hotspots, by using extreme breakdown of marker association as a guide for high-resolution sperm cross-over analysis. This strategy has led to the isolation of the most active cross-over hotspots yet described. Their morphology, sequence attributes, and cross-over processes are very similar to those seen at less active hotspots, but their activity in sperm is poorly predicted from population diversity information. Several of these hotspots showed evidence for biased gene conversion accompanying cross-over, in some cases associated with variation between men in cross-over activity and with two hotspots showing complete presence/absence polymorphism in different men. Hotspot polymorphism is very common at less active hotspots but curiously was not seen at any of the most active hotspots. This contrasts with the prediction that extreme hotspots should be the most vulnerable to attenuation by meiotic drive in favor of mutations that suppress recombination and should therefore show rapid rate evolution and thus variation in activity between men. Finally, these very intense hotspots provide a valuable resource for dissecting meiotic recombination processes and pathways in humans.
|
48
|
Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JAL, Barris W, Schnabel RD, Taylor JF, Raadsma HW. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC Genomics 2008; 9:187. [PMID: 18435834 PMCID: PMC2386485 DOI: 10.1186/1471-2164-9-187] [Citation(s) in RCA: 160] [Impact Index Per Article: 10.0] [Received: 01/16/2008] [Accepted: 04/24/2008] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND The extent of linkage disequilibrium (LD) within a population determines the number of markers that will be required for successful association mapping and marker-assisted selection. Most studies on LD in cattle reported to date are based on microsatellite markers or small numbers of single nucleotide polymorphisms (SNPs) covering one or only a few chromosomes. This is the first comprehensive study on the extent of LD in cattle by analyzing data on 1,546 Holstein-Friesian bulls genotyped for 15,036 SNP markers covering all regions of all autosomes. Furthermore, most studies in cattle have used relatively small sample sizes and, consequently, may have had biased estimates of measures commonly used to describe LD. We examine minimum sample sizes required to estimate LD without bias and loss in accuracy. Finally, relatively little information is available on comparative LD structures including other mammalian species such as human and mouse, and we compare LD structure in cattle with public-domain data from both human and mouse. RESULTS We computed three LD estimates, D', Dvol and r², for 1,566,890 syntenic SNP pairs and a sample of 365,400 non-syntenic pairs. Mean D' is 0.189 among syntenic SNPs, and 0.105 among non-syntenic SNPs; mean r² is 0.024 among syntenic SNPs and 0.0032 among non-syntenic SNPs. All three measures of LD for syntenic pairs decline with distance; the decline is much steeper for r² than for D' and Dvol. The values of D' and Dvol are quite similar. Significant LD in cattle extends to 40 kb (when estimated as r²) and 8.2 Mb (when estimated as D'). The mean values for LD at large physical distances are close to those for non-syntenic SNPs. Minor allelic frequency threshold affects the distribution and extent of LD. For unbiased and accurate estimates of LD across marker intervals spanning < 1 kb to > 50 Mb, minimum sample sizes of 400 (for D') and 75 (for r²) are required. The bias due to small sample sizes increases with inter-marker interval. LD in cattle is much less extensive than in a mouse population created from crossing inbred lines, and more extensive than in humans. CONCLUSION For association mapping in Holstein-Friesian cattle, for a given design, at least one SNP is required for each 40 kb, giving a total requirement of at least 75,000 SNPs for a low power whole-genome scan (median r² > 0.19) and up to 300,000 markers at 10 kb intervals for a high power genome scan (median r² > 0.62). For estimation of LD by D' and Dvol with sufficient precision, a sample size of at least 400 is required, whereas for r² a minimum sample of 75 is adequate.
Affiliation(s)
- Mehar S Khatkar
- Centre for Advanced Technologies in Animal Genetics and Reproduction (ReproGen), University of Sydney, Camden, NSW 2570, Australia.
|
49
|
Tapper W, Gibson J, Morton NE, Collins A. A comparison of methods to detect recombination hotspots. Hum Hered 2008; 66:157-69. [PMID: 18408383 DOI: 10.1159/000126050] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/17/2007] [Accepted: 09/06/2007] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVE A number of linkage disequilibrium (LD)-based methods have been developed to describe recombination and infer hotspots. We determine the correspondence between LDMAP and LDhat, and between LDMAP and LDhot by comparison with linkage maps and hotspots that have been verified by sperm typing. METHODS Regression and variance analyses were used to compare LDMAP and LDhat with linkage maps. The location and intensity of hotspots inferred by LDMAP and LDhot were compared with fifteen verified hotspots. RESULTS Despite different methodologies and assumptions, LDMAP, LDhat, and linkage maps are highly concordant. Closer inspection shows that LDMAP corresponds more closely with linkage maps across the genome and on sixteen chromosomes compared with LDhat. LDhot identified fourteen and ten of the verified hotspots using high and low density maps. In comparison, LDMAP identified all fifteen hotspots at high and low density. However, some significant discrepancies between sperm and LD-based recombination rates remain. CONCLUSIONS Combining information from linkage and LDMAP to construct sex-specific high resolution linkage maps suggests that some of these discrepancies may be due to female recombination while others may relate to the age of hotspots. LDMAP-based estimates suggest between approximately 68,000 and approximately 112,000 hotspots in the genome, with mean widths less than 4 kb.
Affiliation(s)
- William Tapper
- Human Genetics Division, School of Medicine, Southampton General Hospital, University of Southampton, Southampton, UK.
|
50
|
A multimetric approach to analysis of genome-wide association by single markers and composite likelihood. Proc Natl Acad Sci U S A 2008; 105:2592-7. [PMID: 18268331 DOI: 10.1073/pnas.0711903105] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
Two case/control studies with different phenotypes, marker densities, and microarrays were examined for the most significant single markers in defined regions. They show a pronounced bias toward exaggerated significance that increases with the number of observed markers and would increase further with imputed markers. This bias is eliminated by Bonferroni adjustment, thereby allowing combination by principal component analysis with a Malecot model composite likelihood evaluated by a permutation procedure to allow for multiple dependent markers. This intermediate value identifies the only demonstrated causal locus as most significant even in the preliminary analysis and clearly recognizes the strongest candidate in the other sample. Because the three metrics (most significant single marker, composite likelihood, and their principal component) are correlated, choice of the n smallest P values by each test gives <3n regions for follow-up in the next stage. In this way, methods with different response to marker selection and density are given approximately equal weight and economically compared, without expressing an untested prejudice or sacrificing the most significant results for any of them. Large numbers of cases, controls, and markers are by themselves insufficient to control type 1 and 2 errors, and so efficient use of multiple metrics with Bonferroni adjustment promises to be valuable in identifying causal variants and optimal design simultaneously.
|