1
Nazarian A, Cook B, Morado M, Kulminski AM. Interaction Analysis Reveals Complex Genetic Associations with Alzheimer's Disease in the CLU and ABCA7 Gene Regions. Genes (Basel) 2023; 14:1666. [PMID: 37761806 PMCID: PMC10531324 DOI: 10.3390/genes14091666] [Received: 06/26/2023] [Revised: 08/12/2023] [Accepted: 08/18/2023] [Indexed: 09/29/2023]
Abstract
Sporadic Alzheimer's disease (AD) is a polygenic neurodegenerative disorder. Single-nucleotide polymorphisms (SNPs) in multiple genes (e.g., CLU and ABCA7) have been associated with AD. However, none of them has been characterized as a causal variant, which points to the complex genetic architecture of AD, likely shaped by individual variants and their interactions. We performed a meta-analysis of four independent cohorts to examine associations of 32 CLU and 50 ABCA7 polymorphisms, as well as their 496 and 1225 pair-wise interactions, with AD. The single-SNP analyses revealed that six CLU and five ABCA7 SNPs were associated with AD. Ten of them had not been reported previously. The interaction analyses identified AD-associated compound genotypes for 25 CLU and 24 ABCA7 SNP pairs whose constituent SNPs were not associated with AD individually. Three additional CLU pairs and one additional ABCA7 pair composed of AD-associated SNPs showed partial interactions, as the minor allele effect of one SNP in each pair was intensified in the absence of the minor allele of the other SNP. The interactions identified here may modulate associations of the CLU and ABCA7 variants with AD. Our analyses highlight the importance of combinations of genetic variants in AD risk assessment.
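The pair-wise interaction counts quoted in the abstract follow directly from the number of SNPs in each region. As a small worked check (assuming, as the numbers suggest, that every unordered pair of SNPs within a gene region was tested once), the counts are "n choose 2":

```python
from math import comb

# SNP counts taken from the abstract.
n_clu, n_abca7 = 32, 50

# Unordered SNP pairs within each gene region: n * (n - 1) / 2.
pairs_clu = comb(n_clu, 2)
pairs_abca7 = comb(n_abca7, 2)

print(pairs_clu, pairs_abca7)  # 496 1225
```

Both values match the 496 CLU and 1225 ABCA7 pair-wise interactions reported above.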
Affiliation(s)
- Alireza Nazarian
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA (M.M.)
- Alexander M. Kulminski
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA (M.M.)
2
Nazarian A, Loika Y, He L, Culminskaya I, Kulminski AM. Genome-wide analysis identified abundant genetic modulators of contributions of the apolipoprotein E alleles to Alzheimer's disease risk. Alzheimers Dement 2022; 18:2067-2078. [PMID: 34978151 PMCID: PMC9250541 DOI: 10.1002/alz.12540] [Received: 03/05/2021] [Revised: 08/31/2021] [Accepted: 10/25/2021] [Indexed: 01/31/2023]
Abstract
INTRODUCTION The apolipoprotein E (APOE) ε2 and ε4 alleles have beneficial and adverse impacts on Alzheimer's disease (AD), respectively, with incomplete penetrance, which may be modulated by other genetic variants. METHODS We examined whether the associations of the APOE alleles with other polymorphisms in the genome can be sensitive to AD-affection status. RESULTS We identified associations of the ε2 and ε4 alleles with 314 and 232 polymorphisms, respectively. Of them, 35 and 31 polymorphisms had significantly different effects in AD-affected and -unaffected groups, suggesting their potential involvement in the AD pathogenesis by modulating the effects of the ε2 and ε4 alleles, respectively. Our survival-type analysis of the AD risk supported modulating roles of multiple group-specific polymorphisms. Our functional analysis identified gene enrichment in multiple immune-related biological processes, for example, B cell function. DISCUSSION These findings suggest involvement of local and inter-chromosomal modulators of the effects of the APOE alleles on the AD risk.
Affiliation(s)
- Alireza Nazarian
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Yury Loika
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Liang He
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Irina Culminskaya
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Alexander M. Kulminski
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
3
Nazarian A, Loiko E, Yassine HN, Finch CE, Kulminski AM. APOE alleles modulate associations of plasma metabolites with variants from multiple genes on chromosome 19q13.3. Front Aging Neurosci 2022; 14:1023493. [PMID: 36389057 PMCID: PMC9650319 DOI: 10.3389/fnagi.2022.1023493] [Received: 08/19/2022] [Accepted: 10/07/2022] [Indexed: 11/13/2022]
Abstract
The APOE ε2, ε3, and ε4 alleles differentially impact various complex diseases and traits. We examined whether these alleles modulated associations of 94 single-nucleotide polymorphisms (SNPs) harbored by 26 genes in 19q13.3 region with 217 plasma metabolites using Framingham Heart Study data. The analyses were performed in the E2 (ε2ε2 or ε2ε3 genotype), E3 (ε3ε3 genotype), and E4 (ε3ε4 or ε4ε4 genotype) groups separately. We identified 31, 17, and 22 polymorphism-metabolite associations in the E2, E3, and E4 groups, respectively, at a false discovery rate P FDR < 0.05. These entailed 51 and 19 associations with 20 lipid and 12 polar analytes. Contrasting the effect sizes between the analyzed groups showed 20 associations with group-specific effects at Bonferroni-adjusted P < 7.14E-04. Three associations with glutamic acid or dimethylglycine had significantly larger effects in the E2 than E3 group and 12 associations with triacylglycerol 56:5, lysophosphatidylethanolamines 16:0, 18:0, 20:4, or phosphatidylcholine 38:6 had significantly larger effects in the E2 than E4 group. Two associations with isocitrate or propionate and three associations with phosphatidylcholines 32:0, 32:1, or 34:0 had significantly larger effects in the E4 than E3 group. Nine of 70 SNP-metabolite associations identified in either E2, E3, or E4 groups attained P FDR < 0.05 in the pooled sample of these groups. However, none of them were among the 20 group-specific associations. Consistent with the evolutionary history of the APOE alleles, plasma metabolites showed higher APOE-cluster-related variations in the E4 than E2 and E3 groups. Pathway enrichment mainly highlighted lipids and amino acids metabolism and citrate cycle, which can be differentially impacted by the APOE alleles. These novel findings expand insights into the genetic heterogeneity of plasma metabolites and highlight the importance of the APOE-allele-stratified genetic analyses of the APOE-related diseases and traits.
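The Bonferroni threshold quoted in the abstract can be reproduced with simple arithmetic. The sketch below assumes that the correction divides a nominal level of 0.05 by the total number of group-level associations (31 + 17 + 22 = 70); the abstract does not state the divisor explicitly, so this is an inference that happens to match the reported value.

```python
# Associations identified per APOE group, as reported in the abstract.
associations = {"E2": 31, "E3": 17, "E4": 22}

n_tests = sum(associations.values())  # total number of contrasted associations
alpha = 0.05                          # assumed nominal significance level

# Bonferroni adjustment: divide the nominal level by the number of tests.
threshold = alpha / n_tests

print(n_tests, f"{threshold:.2E}")  # 70 7.14E-04
```

The same total also agrees with the split into 51 lipid and 19 polar-analyte associations (51 + 19 = 70).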
Affiliation(s)
- Alireza Nazarian
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, United States
- Elena Loiko
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, United States
- Hussein N. Yassine
- Departments of Medicine and Neurology, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States
- Caleb E. Finch
- Andrus Gerontology Center, University of Southern California, Los Angeles, CA, United States
- Alexander M. Kulminski
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, United States
4
Nazarian A, Philipp I, Culminskaya I, He L, Kulminski AM. Inter- and intra-chromosomal modulators of the APOE ɛ2 and ɛ4 effects on the Alzheimer's disease risk. GeroScience 2022; 45:233-247. [PMID: 35809216 PMCID: PMC9886755 DOI: 10.1007/s11357-022-00617-0] [Received: 03/15/2022] [Accepted: 06/24/2022] [Indexed: 02/03/2023]
Abstract
The mechanisms of incomplete penetrance of risk-modifying impacts of apolipoprotein E (APOE) ε2 and ε4 alleles on Alzheimer's disease (AD) have not been fully understood. We performed genome-wide analysis of differences in linkage disequilibrium (LD) patterns between 6,136 AD-affected and 10,555 AD-unaffected subjects from five independent studies to explore whether the association of the APOE ε2 allele (encoded by rs7412 polymorphism) and ε4 allele (encoded by rs429358 polymorphism) with AD was modulated by autosomal polymorphisms. The LD analysis identified 24 (mostly inter-chromosomal) and 57 (primarily intra-chromosomal) autosomal polymorphisms with significant differences in LD with either rs7412 or rs429358, respectively, between AD-affected and AD-unaffected subjects, indicating their potential modulatory roles. Our Cox regression analysis showed that minor alleles of four inter-chromosomal and ten intra-chromosomal polymorphisms exerted significant modulating effects on the ε2- and ε4-associated AD risks, respectively, and identified ε2-independent (rs2884183 polymorphism, 11q22.3) and ε4-independent (rs483082 polymorphism, 19q13.32) associations with AD. Our functional analysis highlighted ε2- and/or ε4-linked processes affecting the lipid and lipoprotein metabolism and cell junction organization which may contribute to AD pathogenesis. These findings provide insights into the ε2- and ε4-associated mechanisms of AD pathogenesis, underlying their incomplete penetrance.
Affiliation(s)
- Alireza Nazarian
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC, 27705, USA.
- Ian Philipp
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC 27705 USA
- Irina Culminskaya
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC 27705 USA
- Liang He
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC 27705 USA
- Alexander M. Kulminski
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC 27705 USA
5
Liu L, Chandrashekar P, Zeng B, Sanderford MD, Kumar S, Gibson G. TreeMap: a structured approach to fine mapping of eQTL variants. Bioinformatics 2021; 37:1125-1134. [PMID: 33135051 PMCID: PMC8150140 DOI: 10.1093/bioinformatics/btaa927] [Received: 01/20/2020] [Revised: 10/01/2020] [Accepted: 10/20/2020] [Indexed: 11/14/2022]
Abstract
Motivation Expression quantitative trait loci (eQTL) harbor genetic variants modulating gene transcription. Fine mapping of regulatory variants at these loci is a daunting task due to the juxtaposition of causal and linked variants at a locus as well as the likelihood of interactions among multiple variants. This problem is exacerbated in genes with multiple cis-acting eQTL, where superimposed effects of adjacent loci further distort the association signals. Results We developed a novel algorithm, TreeMap, that identifies putative causal variants in cis-eQTL accounting for multisite effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants. Via extensive simulations, we show that TreeMap detects co-regulating variants more accurately than current methods. Furthermore, its high computational efficiency enables genome-wide analysis of long-range eQTL. We applied TreeMap to GTEx data of brain hippocampus samples and transverse colon samples to search for eQTL in gene bodies and in 4 Mbps gene-flanking regions, discovering numerous distal eQTL. Furthermore, we found concordant distal eQTL that were present in both brain and colon samples, implying long-range regulation of gene expression. Availability and implementation TreeMap is available as an R package enabled for parallel processing at https://github.com/liliulab/treemap. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li Liu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Pramod Chandrashekar
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Biao Zeng
- Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Maxwell D Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA; Department of Biology, Temple University, Philadelphia, PA 19122, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Greg Gibson
- Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
6
Sehgal D, Mondal S, Crespo-Herrera L, Velu G, Juliana P, Huerta-Espino J, Shrestha S, Poland J, Singh R, Dreisigacker S. Haplotype-Based, Genome-Wide Association Study Reveals Stable Genomic Regions for Grain Yield in CIMMYT Spring Bread Wheat. Front Genet 2020; 11:589490. [PMID: 33335539 PMCID: PMC7737720 DOI: 10.3389/fgene.2020.589490] [Received: 07/30/2020] [Accepted: 10/21/2020] [Indexed: 01/16/2023]
Abstract
We untangled key regions of the genetic architecture of grain yield (GY) in CIMMYT spring bread wheat by conducting a haplotype-based, genome-wide association study (GWAS), together with an investigation of epistatic interactions, using seven large sets of elite yield trials (EYTs) comprising a total of 6,461 advanced breeding lines. These lines were phenotyped under irrigated and stress environments in seven growing seasons (2011-2018) and genotyped with genotyping-by-sequencing markers. A total of 519 genome-wide haplotype blocks were constructed using a linkage disequilibrium-based approach, covering 14,036 Mb of the wheat genome. Haplotype-based GWAS identified 7, 4, 10, and 15 stable (significant in three or more EYTs) associations in irrigated (I), mild drought (MD), severe drought (SD), and heat stress (HS) testing environments, respectively. Considering all EYTs and the four testing environments together, 30 stable associations were deciphered, with seven hotspots identified on chromosomes 1A, 1B, 2B, 4A, 5B, 6B, and 7B, where multiple haplotype blocks were associated with GY. Epistatic interactions contributed significantly to the genetic architecture of GY, explaining 3.5-21.1%, 3.7-14.7%, 3.5-20.6%, and 4.4-23.1% of the variation in I, MD, SD, and HS environments, respectively. Our results revealed the intricate genetic architecture of GY, controlled by both main and epistatic effects. The importance of these results for practical applications in the CIMMYT breeding program is discussed.
Affiliation(s)
- Deepmala Sehgal
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Suchismita Mondal
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Govindan Velu
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Philomin Juliana
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Jesse Poland
- Kansas State University, Manhattan, KS, United States
- Ravi Singh
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
7
Sehgal D, Dreisigacker S. Haplotypes-based genetic analysis: benefits and challenges. Vavilovskii Zhurnal Genet Selektsii 2019. [DOI: 10.18699/vj19.37-o] [Indexed: 11/19/2022]
Abstract
The increasing availability of single-nucleotide polymorphisms (SNPs) discovered by next-generation sequencing will enable a range of new genetic analyses in crops that were not possible before. Concomitantly, researchers will face the challenge of handling large data sets at the whole-genome level. By grouping thousands of SNPs into a few hundred haplotype blocks, the complexity of the data can be reduced, with fewer statistical tests and a lower probability of spurious associations. Owing to the strong genome structure present in breeding lines of most crops, the deployment of haplotypes could be a powerful complement to improve the efficiency of marker-assisted and genomic selection. This review briefly describes the commonly used approaches for constructing haplotype blocks and cites examples in animals and crops where haplotype-based dissection of traits has proven beneficial. Some important considerations for working with haplotypes in crops are reviewed at the end.
Affiliation(s)
- D. Sehgal
- International Center for Maize and Wheat Improvement (CIMMYT)
- S. Dreisigacker
- International Center for Maize and Wheat Improvement (CIMMYT)
8
Ridge PG, Wadsworth ME, Miller JB, Saykin AJ, Green RC, Kauwe JSK. Assembly of 809 whole mitochondrial genomes with clinical, imaging, and fluid biomarker phenotyping. Alzheimers Dement 2018; 14:514-519. [PMID: 29306584 PMCID: PMC5961720 DOI: 10.1016/j.jalz.2017.11.013] [Received: 07/18/2017] [Revised: 11/03/2017] [Accepted: 11/28/2017] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Mitochondrial genetics are an important but largely neglected area of research in Alzheimer's disease. A major impediment is the lack of data sets. METHODS We used an innovative, rigorous approach, combining several existing tools with our own, to accurately assemble and call variants in 809 whole mitochondrial genomes. RESULTS To help address this impediment, we prepared a data set that consists of 809 complete and annotated mitochondrial genomes with samples from the Alzheimer's Disease Neuroimaging Initiative. These whole mitochondrial genomes include rich phenotyping, such as clinical, fluid biomarker, and imaging data, all of which is available through the Alzheimer's Disease Neuroimaging Initiative website. Genomes are cleaned, annotated, and prepared for analysis. DISCUSSION These data provide an important resource for investigating the impact of mitochondrial genetic variation on risk for Alzheimer's disease and other phenotypes that have been measured in the Alzheimer's Disease Neuroimaging Initiative samples.
Affiliation(s)
- Perry G Ridge
- Department of Biology, Brigham Young University, Provo, UT, USA
- Justin B Miller
- Department of Biology, Brigham Young University, Provo, UT, USA
- Andrew J Saykin
- Radiology and Imaging Sciences, Medical and Molecular Genetics and the Indiana Alzheimer's Disease Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Robert C Green
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Partners HealthCare Personalized Medicine, The Broad Institute and Harvard Medical School, Boston, MA, USA
- John S K Kauwe
- Department of Biology, Brigham Young University, Provo, UT, USA; Department of Neuroscience, Brigham Young University, Provo, UT, USA.
9
Ridge PG, Kauwe JSK. Mitochondria and Alzheimer's Disease: the Role of Mitochondrial Genetic Variation. Current Genetic Medicine Reports 2018; 6:1-10. [PMID: 29564191 PMCID: PMC5842281 DOI: 10.1007/s40142-018-0132-2] [Indexed: 12/02/2022]
Abstract
Purpose of Review: Alzheimer’s disease (AD) is the most common form of dementia, affects an increasing number of people worldwide, has a rapidly increasing incidence, and is fatal. In the past several years, significant progress has been made towards resolving the genetic architecture of AD, but our understanding remains incomplete and has not led to treatments that either cure or slow the disease. There is substantial evidence that mitochondria are involved in AD: mitochondrial function declines, expression of mitochondrially encoded genes changes, mitochondria are morphologically different, and mitochondrial fusion and fission are modified. While a majority of mitochondrial proteins are nuclear encoded and could lead to malfunction in mitochondria, the mitochondrial genome encodes numerous proteins important for the electron transport chain, which, if damaged, could lead to the mitochondrial changes observed in AD. Here, we review publications that describe a relationship between the mitochondrial genome and AD and make suggestions for analysis approaches and data acquisition, from existing datasets, to study the mitochondrial genetics of AD. Recent Findings: Numerous mitochondrial haplogroups and SNPs have been reported to influence risk for AD, but the majority of these have been neither replicated nor experimentally validated. Summary: The role of the mitochondrial genome in AD remains elusive, and several impediments stand in the way of fully understanding the relationship between the mitochondrial genome and AD. Yet, by leveraging existing datasets and implementing appropriate analysis approaches, determining the role of mitochondrial genetics in risk for AD is possible.
Affiliation(s)
- Perry G. Ridge
- Department of Biology, Brigham Young University, 4102 LSB, Provo, UT 84602 USA
- John S. K. Kauwe
- Department of Biology, Brigham Young University, 4102 LSB, Provo, UT 84602 USA
10
Pérez-Losada M, Castel AD, Lewis B, Kharfen M, Cartwright CP, Huang B, Maxwell T, Greenberg AE, Crandall KA. Characterization of HIV diversity, phylodynamics and drug resistance in Washington, DC. PLoS One 2017; 12:e0185644. [PMID: 28961263 PMCID: PMC5621693 DOI: 10.1371/journal.pone.0185644] [Received: 04/07/2017] [Accepted: 09/16/2017] [Indexed: 01/11/2023]
Abstract
BACKGROUND Washington DC has a high burden of HIV with a 2.0% HIV prevalence. The city is a national and international hub potentially containing a broad diversity of HIV variants; yet few sequences from DC are available on GenBank to assess the evolutionary history of HIV in the US capital. Towards this general goal, here we analyze extensive sequence data and investigate HIV diversity, phylodynamics, and drug resistant mutations (DRM) in DC. METHODS Molecular HIV-1 sequences were collected from participants infected through 2015 as part of the DC Cohort, a longitudinal observational study of HIV+ patients receiving care at 13 DC clinics. Sequences were paired with Cohort demographic, risk, and clinical data and analyzed using maximum likelihood, Bayesian and coalescent approaches of phylogenetic, network and population genetic inference. We analyzed 601 sequences from 223 participants for int (~864 bp) and 2,810 sequences from 1,659 participants for PR/RT (~1497 bp). RESULTS Ninety-nine and 94% of the int and PR/RT sequences, respectively, were identified as subtype B, with 14 non-B subtypes also detected. Phylodynamic analyses of US born infected individuals showed that HIV population size varied little over time with no significant decline in diversity. Phylogenetic analyses grouped 13.5% of the int sequences into 14 clusters of 2 or 3 sequences, and 39.0% of the PR/RT sequences into 203 clusters of 2-32 sequences. Network analyses grouped 3.6% of the int sequences into 4 clusters of 2 sequences, and 10.6% of the PR/RT sequences into 76 clusters of 2-7 sequences. All network clusters were detected in our phylogenetic analyses. Higher proportions of clustered sequences were found in zip codes where HIV prevalence is highest (r = 0.607; P<0.00001). We detected a high prevalence of DRM for both int (17.1%) and PR/RT (39.1%), but only 8 int and 12 PR/RT amino acids were identified as under adaptive selection. 
We observed a significant (P<0.0001) association between main risk factors (men who have sex with men and heterosexuals) and genotypes in the five well-supported clusters with sufficient sample size for testing. DISCUSSION Pairing molecular data with clinical and demographic data provided novel insights into HIV population dynamics in Washington, DC. Identification of populations and geographic locations where clustering occurs can inform and complement active surveillance efforts to interrupt HIV transmission.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Ashburn, VA, United States of America
- CIBIO-InBIO, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Milken Institute School of Public Health, Department of Epidemiology and Biostatistics, The George Washington University, Washington, DC, United States of America
- Amanda D. Castel
- Milken Institute School of Public Health, Department of Epidemiology and Biostatistics, The George Washington University, Washington, DC, United States of America
- Brittany Lewis
- Milken Institute School of Public Health, Department of Epidemiology and Biostatistics, The George Washington University, Washington, DC, United States of America
- Michael Kharfen
- District of Columbia Department of Health, Washington, DC, United States of America
- Bruce Huang
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Ashburn, VA, United States of America
- Taylor Maxwell
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Ashburn, VA, United States of America
- Alan E. Greenberg
- Milken Institute School of Public Health, Department of Epidemiology and Biostatistics, The George Washington University, Washington, DC, United States of America
- Keith A. Crandall
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Ashburn, VA, United States of America
11
N’Diaye A, Haile JK, Cory AT, Clarke FR, Clarke JM, Knox RE, Pozniak CJ. Single Marker and Haplotype-Based Association Analysis of Semolina and Pasta Colour in Elite Durum Wheat Breeding Lines Using a High-Density Consensus Map. PLoS One 2017; 12:e0170941. [PMID: 28135299 PMCID: PMC5279799 DOI: 10.1371/journal.pone.0170941] [Received: 11/03/2016] [Accepted: 01/12/2017] [Indexed: 12/30/2022]
Abstract
Association mapping is usually performed by testing the correlation between a single marker and phenotypes. However, because patterns of variation within genomes are inherited as blocks, clustering markers into haplotypes for genome-wide scans could be a worthwhile approach to improve statistical power to detect associations. The availability of high-density molecular data allows the possibility to assess the potential of both approaches to identify marker-trait associations in durum wheat. In the present study, we used single marker- and haplotype-based approaches to identify loci associated with semolina and pasta colour in durum wheat, the main objective being to evaluate the potential benefits of haplotype-based analysis for identifying quantitative trait loci. One hundred sixty-nine durum lines were genotyped using the Illumina 90K Infinium iSelect assay, and 12,234 polymorphic single nucleotide polymorphism (SNP) markers were generated and used to assess the population structure and the linkage disequilibrium (LD) patterns. A total of 8,581 SNPs previously localized to a high-density consensus map were clustered into 406 haplotype blocks based on the average LD distance of 5.3 cM. Combining multiple SNPs into haplotype blocks increased the average polymorphism information content (PIC) from 0.27 per SNP to 0.50 per haplotype. The haplotype-based analysis identified 12 loci associated with grain pigment colour traits, including the five loci identified by the single marker-based analysis. Furthermore, the haplotype-based analysis resulted in an increase of the phenotypic variance explained (50.4% on average) and the allelic effect (33.7% on average) when compared to single marker analysis. The presence of multiple allelic combinations within each haplotype locus offers potential for screening the most favorable haplotype series and may facilitate marker-assisted selection of grain pigment colour in durum wheat. 
These results suggest a benefit of haplotype-based analysis over single marker analysis to detect loci associated with colour traits in durum wheat.
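The reported rise in polymorphism information content (PIC) from 0.27 per SNP to 0.50 per haplotype is easier to interpret with the formula in hand. The sketch below uses the commonly cited definition, PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ²; the abstract does not state which formula the authors applied, so this is an assumption, and the allele frequencies are invented for illustration. The function name `pic` is hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """PIC from allele (or haplotype) frequencies that sum to 1."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Illustrative frequencies only; not taken from the study.
snp_pic = pic([0.5, 0.5])                # biallelic SNP at its most informative
hap_pic = pic([0.25, 0.25, 0.25, 0.25])  # haplotype block with four equally common alleles

print(snp_pic)  # 0.375
print(hap_pic)  # 0.703125
```

Under this definition a biallelic SNP can never exceed a PIC of 0.375, so an average of 0.50 per haplotype block is only attainable because blocks carry more than two alleles.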
Affiliation(s)
- Amidou N’Diaye
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Jemanesh K. Haile
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Aron T. Cory
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Fran R. Clarke
- Semiarid Prairie Agricultural Research Centre, Agriculture and Agri-Food Canada, Swift Current, Saskatchewan, Canada
- John M. Clarke
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Ron E. Knox
- Semiarid Prairie Agricultural Research Centre, Agriculture and Agri-Food Canada, Swift Current, Saskatchewan, Canada
- Curtis J. Pozniak
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
12
Duchemin SI, Glantz M, de Koning DJ, Paulsson M, Fikse WF. Identification of QTL on Chromosome 18 Associated with Non-Coagulating Milk in Swedish Red Cows. Front Genet 2016; 7:57. [PMID: 27148354 PMCID: PMC4832587 DOI: 10.3389/fgene.2016.00057] [Received: 11/17/2015] [Accepted: 03/25/2016] [Indexed: 11/19/2022]
Abstract
Non-coagulating (NC) milk, defined as milk not coagulating within 40 min after rennet addition, can have a negative influence on cheese production. Its prevalence is estimated at 18% in the Swedish Red (SR) cow population. Our study aimed to identify genomic regions and causal variants associated with NC milk in SR cows by performing a GWAS using 777k SNP genotypes and using imputed sequences to fine map the most promising genomic region. Phenotypes were available from 382 SR cows belonging to 21 herds in the south of Sweden, from which individual morning milk was sampled. NC milk was treated as a binary trait, receiving a score of one in case of non-coagulation within 40 min. For all 382 SR cows, 777k SNP genotypes were available as well as the combined genotypes of the genetic variants of αs1-β-κ-caseins. In addition, whole-genome sequences from the 1000 Bull Genome Consortium (Run 3) were available for 429 animals of 15 different breeds. Of these sequences, 33 belonged to SR and Finnish Ayrshire bulls with a large impact on the SR cow population. Single-marker analyses were run in ASReml using an animal model. After fitting the casein loci, 14 associations at -Log10(P-value) > 6 identified a promising region located on BTA18. We imputed sequences to the 382 genotyped SR cows using Beagle 4 for half of BTA18, and ran a region-wide association study with imputed sequences. In a seven-megabase-pair region on BTA18, our strongest association with NC milk explained almost 34% of the genetic variation in NC milk. Since it is possible that multiple QTL are in strong LD in this region, 59 haplotypes were built, genetically differentiated by means of a phylogenetic tree, and tested in phenotype-genotype association studies. Haplotype analyses support the existence of one QTL underlying NC milk in SR cows. A candidate gene of interest is the VPS35 gene, for which one of our strongest associations is an intron SNP in this gene. 
The VPS35 gene belongs to the mammary gene sets of pre-parturient and of lactating cows.
Affiliation(s)
- Sandrine I. Duchemin
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, Netherlands
| | - Maria Glantz
- Department of Food Technology, Engineering and Nutrition, Lund University, Lund, Sweden
| | - Dirk-Jan de Koning
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Marie Paulsson
- Department of Food Technology, Engineering and Nutrition, Lund University, Lund, Sweden
| | - Willem F. Fikse
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
|
13
|
Zhang Q, Abel H, Wells A, Lenzini P, Gomez F, Province MA, Templeton AA, Weinstock GM, Salzman NH, Borecki IB. Selection of models for the analysis of risk-factor trees: leveraging biological knowledge to mine large sets of risk factors with application to microbiome data. Bioinformatics 2015; 31:1607-13. [PMID: 25568281 DOI: 10.1093/bioinformatics/btu855] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 08/26/2014] [Accepted: 12/23/2014] [Indexed: 12/29/2022]
Abstract
MOTIVATION Establishment of a statistical association between microbiome features and clinical outcomes is of growing interest because of the potential for yielding insights into biological mechanisms and pathogenesis. Extracting microbiome features that are relevant for a disease is challenging, and existing variable selection methods are limited by the large number of risk factor variables in microbiome sequence data and their complex biological structure. RESULTS We propose a tree-based scanning method, Selection of Models for the Analysis of Risk factor Trees (referred to as SMART-scan), for identifying taxonomic groups that are associated with a disease or trait. SMART-scan is a model selection technique that uses a predefined taxonomy to organize the large pool of possible predictors into optimized groups, and hierarchically searches and determines variable groups for association testing. We investigate the statistical properties of SMART-scan through simulations, in comparison to a regular single-variable analysis and three commonly used variable selection methods: stepwise regression, least absolute shrinkage and selection operator (LASSO), and classification and regression tree (CART). When there are taxonomic group effects in the data, SMART-scan can significantly increase power by using bacterial taxonomic information to split large numbers of variables into groups. Through an application to microbiome data from a vervet monkey diet experiment, we demonstrate that SMART-scan can identify important phenotype-associated taxonomic features missed by single-variable analysis, stepwise regression, LASSO and CART.
Affiliation(s)
- Qunyuan Zhang
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Haley Abel
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Alan Wells
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Petra Lenzini
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Felicia Gomez
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Michael A Province
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Alan A Templeton
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - George M Weinstock
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Nita H Salzman
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Ingrid B Borecki
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, Department of Biology, Washington University, St. Louis, MO, USA, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA
|
14
|
Allele-specific network reveals combinatorial interaction that transcends small effects in psoriasis GWAS. PLoS Comput Biol 2014; 10:e1003766. [PMID: 25233071 PMCID: PMC4168982 DOI: 10.1371/journal.pcbi.1003766] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/12/2013] [Accepted: 05/20/2014] [Indexed: 12/20/2022] Open
Abstract
Hundreds of genetic markers have shown associations with various complex diseases, yet the “missing heritability” remains alarmingly elusive. Combinatorial interactions may account for a substantial portion of this missing heritability, but their discoveries have been impeded by computational complexity and genetic heterogeneity. We present BlocBuster, a novel systems-level approach that efficiently constructs genome-wide, allele-specific networks that accurately segregate homogeneous combinations of genetic factors, tests the associations of these combinations with the given phenotype, and rigorously validates the results using a series of unbiased validation methods. BlocBuster employs a correlation measure that is customized for single nucleotide polymorphisms and returns a multi-faceted collection of values that captures genetic heterogeneity. We applied BlocBuster to analyze psoriasis, discovering a combinatorial pattern with an odds ratio of 3.64 and Bonferroni-corrected p-value of 5.01×10^−16. This pattern was replicated in independent data, reflecting robustness of the method. In addition to improving prediction of disease susceptibility and broadening our understanding of the pathogenesis underlying psoriasis, these results demonstrate BlocBuster's potential for discovering combinatorial genetic associations within heterogeneous genome-wide data, thereby transcending the limiting “small effects” produced by individual markers examined in isolation.
Most complex diseases arise due to combinations of genetic factors, yet current genome-wide association studies (GWAS) typically examine individual genetic markers in isolation because of the complexity of considering a prohibitively large number of marker combinations. Another complication for GWAS stems from genetic heterogeneity, in which different subsets of individuals develop a given disease due to different sets of genetic factors. We present BlocBuster, a network-based method that addresses these challenges and extracts inter-correlated genetic markers that manifest significant associations with complex diseases. Our analysis of psoriasis GWAS data revealed a significant combinatorial genetic pattern, which was validated using stringent computational tests and replication in independent data. This pattern is more significant than other previously identified markers. We also compared Pearson's correlation coefficient and observed that it introduced more type I errors and produced a less structured network than BlocBuster; the former also broke the combinatorial pattern into pieces. In addition to improving prediction of disease susceptibility and broadening our understanding of the pathogenesis underlying psoriasis, these results demonstrate BlocBuster's effectiveness for discovering combinatorial genetic associations within heterogeneous backgrounds, thereby transcending the limiting “small effects” produced by individual markers examined in isolation.
|
15
|
Ridge PG, Maxwell TJ, Foutz SJ, Bailey MH, Corcoran CD, Tschanz JT, Norton MC, Munger RG, O'Brien E, Kerber RA, Cawthon RM, Kauwe JSK. Mitochondrial genomic variation associated with higher mitochondrial copy number: the Cache County Study on Memory Health and Aging. BMC Bioinformatics 2014; 15 Suppl 7:S6. [PMID: 25077862 PMCID: PMC4110732 DOI: 10.1186/1471-2105-15-s7-s6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/19/2022] Open
Abstract
Background The mitochondria are essential organelles and are the location of cellular respiration, which is responsible for the majority of ATP production. Each cell contains multiple mitochondria, and each mitochondrion contains multiple copies of its own circular genome. The ratio of mitochondrial genomes to nuclear genomes is referred to as mitochondrial copy number. Decreases in mitochondrial copy number are known to occur in many tissues as people age, and in certain diseases. The regulation of mitochondrial copy number by nuclear genes has been studied extensively. While mitochondrial variation has been associated with longevity and some of the diseases known to have reduced mitochondrial copy number, the role that the mitochondrial genome itself has in regulating mitochondrial copy number remains poorly understood. Results We analyzed the complete mitochondrial genomes from 1007 individuals randomly selected from the Cache County Study on Memory Health and Aging utilizing the inferred evolutionary history of the mitochondrial haplotypes present in our dataset to identify sequence variation and mitochondrial haplotypes associated with changes in mitochondrial copy number. Three variants belonging to mitochondrial haplogroups U5A1 and T2 were significantly associated with higher mitochondrial copy number in our dataset. Conclusions We identified three variants associated with higher mitochondrial copy number and suggest several hypotheses for how these variants influence mitochondrial copy number by interacting with known regulators of mitochondrial copy number. Our results are the first to report sequence variation in the mitochondrial genome that causes changes in mitochondrial copy number. The identification of these variants that increase mtDNA copy number has important implications in understanding the pathological processes that underlie these phenotypes.
|
16
|
APOE modulates the correlation between triglycerides, cholesterol, and CHD through pleiotropy, and gene-by-gene interactions. Genetics 2013; 195:1397-405. [PMID: 24097412 DOI: 10.1534/genetics.113.157719] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 02/04/2023] Open
Abstract
Relationship loci (rQTL) exist when the correlation between multiple traits varies by genotype. rQTL often occur due to gene-by-gene (G × G) or gene-by-environmental interactions, making them a powerful tool for detecting G × G. Here we present an empirical analysis of apolipoprotein E (APOE) with respect to lipid traits and incident CHD leading to the discovery of loci that interact with APOE to affect these traits. We found that the relationship between total cholesterol (TC) and triglycerides (ln TG) varies by APOE isoform genotype in African-American (AA) and European-American (EA) populations. The e2 allele is associated with strong correlation between ln TG and TC while the e4 allele leads to little or no correlation. This led to a priori hypotheses that APOE genotypes affect the relationship of TC and/or ln TG with incident CHD. We found that APOE*TC was significant (P = 0.016) for AA but not EA while APOE*ln TG was significant for EA (P = 0.027) but not AA. In both cases, e2e2 and e2e3 had strong relationships between TC and ln TG with CHD while e2e4 and e4e4 result in little or no relationship between TC and ln TG with CHD. Using ARIC GWAS data, scans for loci that significantly interact with APOE produced four loci for African Americans (one CHD, one TC, and two HDL). These interactions contribute to the rQTL pattern. rQTL are a powerful tool to identify loci that modify the relationship between risk factors and disease and substantially increase statistical power for detecting G × G.
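The defining property of a relationship locus, a trait-trait correlation that differs by genotype, can be sketched by computing a Pearson correlation within each genotype group. This is only an illustration of the idea, not the authors' analysis: the helper names are hypothetical and the toy values are synthetic, chosen so that one genotype shows perfect correlation and the other none.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_by_genotype(records):
    """Group (genotype, trait_a, trait_b) records and correlate within groups."""
    groups = defaultdict(lambda: ([], []))
    for genotype, trait_a, trait_b in records:
        groups[genotype][0].append(trait_a)
        groups[genotype][1].append(trait_b)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}


# Synthetic toy records (genotype, trait A, trait B); not real lipid data.
toy = [
    ("e2e3", 1.0, 2.0), ("e2e3", 2.0, 4.0), ("e2e3", 3.0, 6.0), ("e2e3", 4.0, 8.0),
    ("e4e4", 1.0, 1.0), ("e4e4", 2.0, -1.0), ("e4e4", 3.0, -1.0), ("e4e4", 4.0, 1.0),
]
r_by_genotype = correlation_by_genotype(toy)
print(r_by_genotype)
```

A formal rQTL test would compare these group-wise correlations statistically (for example through a genotype-by-trait interaction term); the sketch only shows the quantity that varies.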
|
17
|
Ridge PG, Koop A, Maxwell TJ, Bailey MH, Swerdlow RH, Kauwe JSK, Honea RA. Mitochondrial haplotypes associated with biomarkers for Alzheimer's disease. PLoS One 2013; 8:e74158. [PMID: 24040196 PMCID: PMC3770576 DOI: 10.1371/journal.pone.0074158] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 04/05/2013] [Accepted: 07/28/2013] [Indexed: 01/30/2023] Open
Abstract
Various studies have suggested that the mitochondrial genome plays a role in late-onset Alzheimer's disease, although results are mixed. We used an endophenotype-based approach to further characterize mitochondrial genetic variation and its relationship to risk markers for Alzheimer's disease. We analyzed longitudinal data from non-demented, mild cognitive impairment, and late-onset Alzheimer's disease participants in the Alzheimer's Disease Neuroimaging Initiative with genetic, brain imaging, and behavioral data. We assessed the relationship of structural MRI and cognitive biomarkers with mitochondrial genome variation using TreeScanning, a haplotype-based approach that concentrates statistical power by analyzing evolutionarily meaningful groups (or clades) of haplotypes together for association with a phenotype. Four clades were associated with three different endophenotypes: whole brain volume, percent change in temporal pole thickness, and left hippocampal atrophy over two years. This is the first study of its kind to identify mitochondrial variation associated with brain imaging endophenotypes of Alzheimer's disease. Our results provide additional evidence that the mitochondrial genome plays a role in risk for Alzheimer's disease.
Affiliation(s)
- Perry G. Ridge
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, Utah, United States of America
| | - Andre Koop
- Kansas University Alzheimer’s Disease Center, Department of Neurology, University of Kansas School of Medicine, Kansas City, Kansas, United States of America
| | - Taylor J. Maxwell
- Human Genetics Center, University of Texas School of Public Health, Houston, Texas, United States of America
| | - Matthew H. Bailey
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Russell H. Swerdlow
- Kansas University Alzheimer’s Disease Center, Department of Neurology, University of Kansas School of Medicine, Kansas City, Kansas, United States of America
| | - John S. K. Kauwe
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Robyn A. Honea
- Kansas University Alzheimer’s Disease Center, Department of Neurology, University of Kansas School of Medicine, Kansas City, Kansas, United States of America
|
18
|
Ridge PG, Maxwell TJ, Corcoran CD, Norton MC, Tschanz JT, O’Brien E, Kerber RA, Cawthon RM, Munger RG, Kauwe JSK. Mitochondrial genomic analysis of late onset Alzheimer's disease reveals protective haplogroups H6A1A/H6A1B: the Cache County Study on Memory in Aging. PLoS One 2012; 7:e45134. [PMID: 23028804 PMCID: PMC3444479 DOI: 10.1371/journal.pone.0045134] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 04/16/2012] [Accepted: 08/14/2012] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Alzheimer's disease (AD) is the most common cause of dementia and AD risk clusters within families. Part of the familial aggregation of AD is accounted for by excess maternal vs. paternal inheritance, a pattern consistent with mitochondrial inheritance. The role of specific mitochondrial DNA (mtDNA) variants and haplogroups in AD risk is uncertain. METHODOLOGY/PRINCIPAL FINDINGS We determined the complete mitochondrial genome sequence of 1007 participants in the Cache County Study on Memory in Aging, a population-based prospective cohort study of dementia in northern Utah. AD diagnoses were made with a multi-stage protocol that included clinical examination and review by a panel of clinical experts. We used TreeScanning, a statistically robust approach based on haplotype networks, to analyze the mtDNA sequence data. Participants with major mitochondrial haplotypes H6A1A and H6A1B showed a reduced risk of AD (p=0.017, corrected for multiple comparisons). The protective haplotypes were defined by three variants: m.3915G>A, m.4727A>G, and m.9380G>A. These three variants characterize two different major haplogroups. Together m.4727A>G and m.9380G>A define H6A1, and it has been suggested that m.3915G>A defines H6A. Additional variants differentiate H6A1A and H6A1B; however, none of these variants had a significant relationship with AD case-control status. CONCLUSIONS/SIGNIFICANCE Our findings provide evidence of a reduced risk of AD for individuals with mtDNA haplotypes H6A1A and H6A1B. These findings are the results of the largest study to date with complete mtDNA genome sequence data, yet the functional significance of the associated haplotypes remains unknown and replication in other studies is necessary.
Affiliation(s)
- Perry G. Ridge
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, Utah, United States of America
| | - Taylor J. Maxwell
- Human Genetics Center, University of Texas School of Public Health, Houston, Texas, United States of America
| | - Christopher D. Corcoran
- Department of Mathematics and Statistics, Utah State University, Logan, Utah, United States of America
- Center for Epidemiologic Studies, Utah State University, Logan, Utah, United States of America
| | - Maria C. Norton
- Center for Epidemiologic Studies, Utah State University, Logan, Utah, United States of America
- Department of Family Consumer and Human Development, Utah State University, Logan, Utah, United States of America
- Department of Psychology, Utah State University, Logan, Utah, United States of America
| | - JoAnn T. Tschanz
- Center for Epidemiologic Studies, Utah State University, Logan, Utah, United States of America
- Department of Psychology, Utah State University, Logan, Utah, United States of America
| | - Elizabeth O’Brien
- Department of Epidemiology and Population Health, University of Louisville, Louisville, Kentucky, United States of America
| | - Richard A. Kerber
- Department of Epidemiology and Population Health, University of Louisville, Louisville, Kentucky, United States of America
| | - Richard M. Cawthon
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
| | - Ronald G. Munger
- Center for Epidemiologic Studies, Utah State University, Logan, Utah, United States of America
- Department of Nutrition, Dietetics, and Food Sciences, Utah State University, Logan, Utah, United States of America
| | - John S. K. Kauwe
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
|
19
|
Liu Y, Yang SX, Ji PZ, Gao LZ. Phylogeography of Camellia taliensis (Theaceae) inferred from chloroplast and nuclear DNA: insights into evolutionary history and conservation. BMC Evol Biol 2012; 12:92. [PMID: 22716114 PMCID: PMC3495649 DOI: 10.1186/1471-2148-12-92] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 07/08/2011] [Accepted: 06/21/2012] [Indexed: 12/01/2022] Open
Abstract
Background As one of the most important but seriously endangered wild relatives of the cultivated tea, Camellia taliensis harbors valuable gene resources for tea tree improvement in the future. The knowledge of genetic variation and population structure may provide insights into evolutionary history and germplasm conservation of the species. Results Here, we sampled 21 natural populations from the species' range in China and performed a phylogeographic analysis of C. taliensis by using the nuclear PAL gene fragment and chloroplast rpl32-trnL intergenic spacer. Levels of haplotype diversity and nucleotide diversity detected at rpl32-trnL (h = 0.841; π = 0.00314) were almost as high as at PAL (h = 0.836; π = 0.00417). Significant chloroplast DNA population subdivision was detected (GST = 0.988; NST = 0.989), suggesting fairly high genetic differentiation and low levels of recurrent gene flow through seeds among populations. Nested clade phylogeographic analysis of chlorotypes suggests that population genetic structure in C. taliensis has been affected by habitat fragmentation in the past. However, the detection of a moderate nrDNA population subdivision (GST = 0.222; NST = 0.301) provided evidence of efficient pollen-mediated gene flow among populations and significant phylogeographical structure (NST > GST; P < 0.01). The analysis of PAL haplotypes indicates that the phylogeographical pattern of nrDNA haplotypes might be caused by restricted gene flow with isolation by distance, which was also supported by Mantel’s test of nrDNA haplotypes (r = 0.234, P < 0.001). We found that chlorotype C1 was fixed in seven populations of the Lancang River Region, implying that the Lancang River might have provided a corridor for the long-distance dispersal of the species. Conclusions We found that C. taliensis showed fairly high genetic differentiation resulting from restricted gene flow and habitat fragmentation. 
This phylogeographical study gives us deep insights into population structure of the species and conservation strategies for germplasm sampling and developing in situ conservation of natural populations.
Affiliation(s)
- Yang Liu
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650204, China
|
20
|
Auzanneau J, Huyghe C, Escobar-Gutiérrez AJ, Julier B, Gastal F, Barre P. Association study between the gibberellic acid insensitive gene and leaf length in a Lolium perenne L. synthetic variety. BMC Plant Biol 2011; 11:183. [PMID: 22204490 PMCID: PMC3292539 DOI: 10.1186/1471-2229-11-183] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/07/2011] [Accepted: 12/28/2011] [Indexed: 05/04/2023]
Abstract
BACKGROUND Association studies are of great interest to identify genes explaining trait variation since they deal with more than just a few alleles like classical QTL analyses. They are usually performed using collections representing a wide range of variability but which could present a genetic substructure. The aim of this paper is to demonstrate that association studies can be performed using synthetic varieties obtained after several panmictic generations. This demonstration is based on an example of association between the gibberellic acid insensitive gene (GAI) polymorphism and leaf length polymorphism in 'Herbie', a synthetic variety of perennial ryegrass. METHODS Leaf growth parameters, consisting of leaf length, maximum leaf elongation rate (LERmax) and leaf elongation duration (LED), were evaluated in spring and autumn on 216 plants of Herbie with three replicates. For each plant, a sequence of 370 bp in GAI was analysed for polymorphism. RESULTS The genetic effect was highly significant for all traits. Broad sense heritabilities were higher for leaf length and LERmax (about 0.7 in each period and 0.5 considering both periods) than for LED (about 0.4 in each period and 0.3 considering both periods). GAI was highly polymorphic with an average of 12 bp between two consecutive SNPs and 39 haplotypes, of which 9 were more frequent. Linkage disequilibrium declined rapidly with distance, with r² values lower than 0.2 beyond 150 bp. Sequence polymorphism of GAI explained 8-14% of leaf growth parameter variation. A single SNP explained 4% of the phenotypic variance of leaf length in both periods, which represents a difference of 33 mm on an average of 300 mm. CONCLUSIONS Synthetic varieties in which linkage disequilibrium declines rapidly with distance are suitable for association studies using the "candidate gene" approach. GAI polymorphism was found to be associated with leaf length polymorphism, which was more correlated to LERmax than to LED in Herbie. 
It is a good candidate to explain leaf length variation in other plant material.
Affiliation(s)
- Jérôme Auzanneau
- INRA, UR4, Unité de Recherche Pluridisciplinaire Prairies et Plantes Fourragères, Le Chêne, RD 150, 86600 Lusignan, France
| | - Christian Huyghe
- INRA, UR4, Unité de Recherche Pluridisciplinaire Prairies et Plantes Fourragères, Le Chêne, RD 150, 86600 Lusignan, France
| | - Abraham J Escobar-Gutiérrez
- INRA, UR4, Unité de Recherche Pluridisciplinaire Prairies et Plantes Fourragères, Le Chêne, RD 150, 86600 Lusignan, France
| | - Bernadette Julier
- INRA, UR4, Unité de Recherche Pluridisciplinaire Prairies et Plantes Fourragères, Le Chêne, RD 150, 86600 Lusignan, France
| | - François Gastal
- INRA, UR4, Unité de Recherche Pluridisciplinaire Prairies et Plantes Fourragères, Le Chêne, RD 150, 86600 Lusignan, France
| | - Philippe Barre
- INRA, UR4, Unité de Recherche Pluridisciplinaire Prairies et Plantes Fourragères, Le Chêne, RD 150, 86600 Lusignan, France
|
21
|
Chloroplast phylogeography of Helianthemum songaricum (Cistaceae) from northwestern China: implications for preservation of genetic diversity. Conserv Genet 2011. [DOI: 10.1007/s10592-011-0250-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
|
22
|
Bansal V, Libiger O, Torkamani A, Schork NJ. Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet 2010; 11:773-85. [PMID: 20940738 PMCID: PMC3743540 DOI: 10.1038/nrg2867] [Citation(s) in RCA: 381] [Impact Index Per Article: 27.2] [Indexed: 12/14/2022]
Abstract
The limitations of genome-wide association (GWA) studies that focus on the phenotypic influence of common genetic variants have motivated human geneticists to consider the contribution of rare variants to phenotypic expression. The increasing availability of high-throughput sequencing technologies has enabled studies of rare variants but these methods will not be sufficient for their success as appropriate analytical methods are also needed. We consider data analysis approaches to testing associations between a phenotype and collections of rare variants in a defined genomic region or set of regions. Ultimately, although a wide variety of analytical approaches exist, more work is needed to refine them and determine their properties and power in different contexts.
Affiliation(s)
- Vikas Bansal
- The Scripps Translational Science Institute, 3344 North Torrey Pines Court, Suite 300, La Jolla, California 92037, USA
|
23
|
Terry RG. Re-evaluation of morphological and chloroplast DNA variation in Juniperus osteosperma Hook and Juniperus occidentalis Torr. Little (Cupressaceae) and their putative hybrids. BIOCHEM SYST ECOL 2010. [DOI: 10.1016/j.bse.2010.03.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
24
|
Bardel C, Danjean V, Morange P, Génin E, Darlu P. On the use of phylogeny-based tests to detect association between quantitative traits and haplotypes. Genet Epidemiol 2010; 33:729-39. [PMID: 19399905 DOI: 10.1002/gepi.20425] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
With the increasing availability of genetic data, several SNPs in a candidate gene can be combined into haplotypes to test for association with a quantitative trait. When the number of SNPs increases, the number of haplotypes can become very large and there is a need to group them together. The use of the phylogenetic relationships between haplotypes provides a natural and efficient way of grouping. Moreover, it allows us to identify disease or quantitative trait-related loci. In this article, we describe ALTree-q, a phylogeny-based approach to test for association between quantitative traits and haplotypes and to identify putative quantitative trait nucleotides (QTN). This study focuses on the ALTree-q association test, which is based on one-way analyses of variance (ANOVA) performed at the different levels of the tree. The statistical properties (type-one error and power rates) were estimated through simulations under different genetic models and were compared to another phylogeny-based test, TreeScan (Templeton, 2005), and to a haplotypic omnibus test consisting of a one-way ANOVA between all haplotypes. For dominant and additive models ALTree-q is usually the most powerful test, whereas TreeScan performs better under a recessive model. However, power depends strongly on the recurrence rate of the QTN, on the QTN allele frequency, and on the linkage disequilibrium between the QTN and other markers. An application of the method to Thrombin Activatable Fibrinolysis Inhibitor Antigen levels in European and African samples confirms a possible association with polymorphisms of the CPB2 gene and identifies several QTNs.
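The one-way ANOVA that the abstract names as the building block of the test can be sketched as follows. This is only an illustration of the F statistic for a trait measured across haplotype groups; the function name `one_way_anova_f` and the toy values are assumptions, and the sketch does not reproduce how ALTree-q walks the levels of the tree.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups.

    Each group is a list of quantitative trait values, e.g. the values
    carried by one haplotype (or one clade of haplotypes).
    """
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical trait values for three haplotype groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value indicates that the group means differ more than the within-group scatter would suggest; a p-value would come from the F distribution with (k - 1, n - k) degrees of freedom.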
|
25
|
Phylogenetics applied to genotype/phenotype association and selection analyses with sequence data from angptl4 in humans. Int J Mol Sci 2010; 11:370-85. [PMID: 20162021 PMCID: PMC2821009 DOI: 10.3390/ijms11010370] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/13/2009] [Revised: 01/06/2010] [Accepted: 01/17/2010] [Indexed: 11/16/2022]
Abstract
Genotype/phenotype association analyses (Treescan) with plasma lipid levels and functional site prediction methods (TreeSAAP and PolyPhen) were performed using sequence data for ANGPTL4 from 3,551 patients in the Dallas Heart Study. Biological assays of rare variants in phenotypic tails and results from a Treescan analysis were used as “known” variants to assess the site prediction abilities of PolyPhen and TreeSAAP. The E40K variant in European Americans and the R278Q variant in African Americans were significantly associated with multiple lipid phenotypes. Combining TreeSAAP and PolyPhen performed well to predict “known” functional variants while reducing noise from false positives.
|
26
|
The diverse applications of cladistic analysis of molecular evolution, with special reference to nested clade analysis. Int J Mol Sci 2010; 11:124-39. [PMID: 20162005 PMCID: PMC2820993 DOI: 10.3390/ijms11010124] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 11/25/2009] [Revised: 01/06/2010] [Accepted: 01/06/2010] [Indexed: 11/17/2022]
Abstract
The genetic variation found in small regions of the genomes of many species can be arranged into haplotype trees that reflect the evolutionary genealogy of the DNA lineages found in that region and the accumulation of mutations on those lineages. This review demonstrates some of the many ways in which clades (branches) of haplotype trees have been applied in recent years, including the study of genotype/phenotype associations at candidate loci and in genome-wide association studies, the phylogeographic history of species, human evolution, the conservation of endangered species, and the identification of species.
|
27
|
Wang J, de Villena FPM, Moore KJ, Wang W, Zhang Q, McMillan L. Genome-wide compatible SNP intervals and their properties. THE 2010 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND COMPUTATIONAL BIOLOGY : ACM-BCB 2010 : NIAGARA FALLS, NEW YORK, U.S.A., AUGUST 2-4, 2010. ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (1ST : 2010 :... 2010; 2010:43-52. [PMID: 29152612 DOI: 10.1145/1854776.1854788] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 10/19/2022]
Abstract
Intraspecific genomes can be subdivided into blocks with limited diversity. Understanding the distribution and structure of these blocks will help to unravel many biological problems including the identification of genes associated with complex diseases, finding the ancestral origins of a given population, and localizing regions of historical recombination, gene conversion, and homoplasy. We present methods for partitioning a genome into blocks for which there are no apparent recombinations, thus providing parsimonious sets of compatible genome intervals based on the four-gamete test. Our contribution is a thorough analysis of the problem of dividing a genome into compatible intervals, in terms of its computational complexity, and by providing an achievable lower-bound on the minimal number of intervals required to cover an entire data set. In general, such minimal interval partitions are not unique. However, we identify properties that are common to every possible solution. We also define the notion of an interval set that achieves the interval lower-bound, yet maximizes interval overlap. We demonstrate algorithms for partitioning both haplotype data from inbred mice as well as outbred heterozygous genotype data using extensions of the standard four-gamete test. These methods allow our algorithms to be applied to a wide range of genomic data sets.
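As a rough illustration of the four-gamete test mentioned above, the sketch below checks pairs of biallelic SNPs and cuts a set of phased haplotypes into intervals with a simple left-to-right greedy scan. The greedy scan, the function names, and the toy haplotypes are assumptions made for illustration; the abstract does not specify the authors' partitioning algorithm, and this sketch does not handle unphased genotype data.

```python
def compatible(haplotypes, i, j):
    """Four-gamete test for SNPs i and j.

    Two biallelic SNPs are compatible (no apparent recombination or
    recurrent mutation) if at most three of the four possible allele
    pairs 00, 01, 10, 11 are observed across the haplotypes.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4


def greedy_intervals(haplotypes):
    """Partition SNP indices into consecutive compatible intervals.

    Scans left to right and closes the current interval as soon as the
    next SNP fails the four-gamete test against any SNP already in it.
    Returns a list of (first_index, last_index) tuples.
    """
    n_snps = len(haplotypes[0])
    intervals, start = [], 0
    for k in range(1, n_snps):
        if any(not compatible(haplotypes, i, k) for i in range(start, k)):
            intervals.append((start, k - 1))
            start = k
    intervals.append((start, n_snps - 1))
    return intervals


# Hypothetical phased haplotypes over three SNPs, coded 0/1.
haps = ["000", "010", "100", "111"]
blocks = greedy_intervals(haps)
```

Here SNPs 0 and 1 show all four gametes, so a break is placed between them, while SNPs 1 and 2 show only three and stay in one interval.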
Affiliation(s)
- Jeremy Wang
- Dept. of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
- Kyle J Moore
- Dept. of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
- Wei Wang
- Dept. of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
- Qi Zhang
- Dept. of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
- Leonard McMillan
- Dept. of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
|
28
|
|
29
|
Templeton AR. Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Mol Ecol 2009; 18:319-31. [PMID: 19192182 DOI: 10.1111/j.1365-294x.2008.04026.x] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Indexed: 11/30/2022]
Abstract
Nested clade phylogeographical analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographical hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographical model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that creates pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyse a large number of locations, but ABC cannot. Finally, the dimensionality of tested hypothesis is known in NCPA, but not for ABC. As a consequence, the 'probabilities' generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models.
Affiliation(s)
- Alan R Templeton
- Department of Biology, Washington University, St. Louis, MO 63130-4899, USA.
|
30
|
Climer S, Jäger G, Templeton AR, Zhang W. How frugal is Mother Nature with haplotypes? Bioinformatics 2009; 25:68-74. [PMID: 18987010 DOI: 10.1093/bioinformatics/btn572] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/18/2023]
Abstract
MOTIVATION: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first, and most critical step, is the ascertainment of a biologically sound model to be optimized. Many models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution. RESULTS: This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that there are relatively few unique haplotypes, but not always the least possible, for the datasets with known solutions. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and the number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, most of which are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
Affiliation(s)
- Sharlee Climer
- Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA
|
31
|
Knight J, Curtis D, Sham PC. CLUMPHAP: a simple tool for performing haplotype-based association analysis. Genet Epidemiol 2008; 32:539-45. [PMID: 18395815 DOI: 10.1002/gepi.20327] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
The completion of the HapMap Project and the development of high-throughput single nucleotide polymorphism genotyping technologies have greatly enhanced the prospects of identifying and characterizing the genetic variants that influence complex traits. In principle, association analysis of haplotypes rather than single nucleotide polymorphisms may better capture an underlying causal variant, but the multiple haplotypes can lead to reduced statistical power due to the testing of (and need to correct for) a large number of haplotypes. This paper presents a novel method based on clustering similar haplotypes to address this issue. The method, implemented in the CLUMPHAP program, is an extension of the CLUMP program designed for the analysis of multi-allelic markers (Sham and Curtis [1995] Ann. Hum. Genet. 59(Pt1):97-105). CLUMPHAP performs a hierarchical clustering of the haplotypes and then computes the χ² statistic between each haplotype cluster and disease; the statistical significance of the largest of the χ² statistics is obtained by permutation testing. A significant result suggests that the presence of a disease-causing variant in the haplotype cluster is over-represented in cases. Using simulation studies, we have compared CLUMPHAP and more widely used approaches in terms of their statistical power to identify an untyped susceptibility locus. Our results show that CLUMPHAP tends to have greater power than the omnibus haplotype test and is comparable in power to multiple regression locus-coding approaches.
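The "largest χ² with permutation significance" idea described above can be sketched in a few lines. This is a minimal illustration, not the CLUMPHAP implementation: the hierarchical clustering step is assumed to have been done already (each haplotype simply carries a cluster label), and the function names, the 2x2 table layout, and the toy data are all assumptions.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_cluster_chi2(clusters, is_case):
    """Largest chi-square over clusters, each tested as in-cluster vs. rest against case status."""
    best = 0.0
    for label in set(clusters):
        a = sum(1 for cl, y in zip(clusters, is_case) if cl == label and y)
        b = sum(1 for cl, y in zip(clusters, is_case) if cl == label and not y)
        c = sum(1 for cl, y in zip(clusters, is_case) if cl != label and y)
        d = sum(1 for cl, y in zip(clusters, is_case) if cl != label and not y)
        best = max(best, chi2_2x2(a, b, c, d))
    return best


def permutation_p(clusters, is_case, n_perm=1000, seed=0):
    """Permute case/control labels to get an empirical p-value for the maximum statistic."""
    rng = random.Random(seed)
    observed = max_cluster_chi2(clusters, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_cluster_chi2(clusters, labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: cluster label and case status for 20 haplotypes.
clusters = ["A"] * 10 + ["B"] * 10
is_case = [True] * 10 + [False] * 10
observed, p_value = permutation_p(clusters, is_case)
```

Because the permutation null is built from the maximum over all clusters, the resulting p-value already accounts for having tested every cluster, which is the multiple-testing problem the abstract raises.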
Affiliation(s)
- Jo Knight
- Social Genetic & Developmental Psychiatry MRC Centre, Institute of Psychiatry, Kings College London, De Crespigny Park, London, UK.
|
32
|
Branicki W, Szczerbińska A, Brudnik U, Wolańska-Nowak P, Kupiec T. The OCA2 gene as a marker for eye colour prediction. FORENSIC SCIENCE INTERNATIONAL GENETICS SUPPLEMENT SERIES 2008. [DOI: 10.1016/j.fsigss.2007.10.062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
33
|
Rhodes B, Morris DL, Subrahmanyan L, Aubin C, de Leon CFM, Kelly JF, Evans DA, Whittaker JC, Oksenberg JR, De Jager PL, Vyse TJ. Fine-mapping the genetic basis of CRP regulation in African Americans: a Bayesian approach. Hum Genet 2008; 123:633-42. [PMID: 18500540 DOI: 10.1007/s00439-008-0517-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 03/03/2008] [Accepted: 05/16/2008] [Indexed: 01/07/2023]
Abstract
Basal levels of C-reactive protein (CRP) have been associated with disease, particularly future cardiovascular events. Twin studies estimate 50% CRP heritability, so the identification of genetic variants influencing CRP expression is important. Existing studies in populations of European ancestry have identified numerous cis-acting variants but leave significant ambiguity over the identity of the key functional polymorphisms. We addressed this issue by typing a dense map of CRP single-nucleotide polymorphisms (SNPs), and quantifying serum CRP in 594 unrelated African Americans. We used Bayesian model choice analysis to select the combination of SNPs best explaining basal CRP and found strong support for triallelic rs3091244 alone, with the T allele acting in an additive manner (Bayes factor > 100 vs. null model), with additional support for a model incorporating both rs3091244 and rs12728740. Admixture analysis suggested SNP rs12728740 segregated with haplotypes predicted to be of recent European origin. Using a cladistic approach we confirmed the importance of rs3091244(T) by demonstrating a significant partition of haplotype effect based on the rs3091244(C/T) mutation (F = 8.91, P = 0.006). We argue that weaker linkage disequilibrium across the African American CRP locus compared with Europeans has allowed us to establish an unambiguous functional role for rs3091244(T), while also recognising the potential for additional functional mutations present in the European genome.
Affiliation(s)
- Benjamin Rhodes
- Section of Molecular Genetics and Rheumatology, Faculty of Medicine, Imperial College, Du Cane Road, London W12 0NN, UK
|
34
|
|
35
|
Woolley SM, Posada D, Crandall KA. A comparison of phylogenetic network methods using computer simulation. PLoS One 2008; 3:e1913. [PMID: 18398452 PMCID: PMC2275308 DOI: 10.1371/journal.pone.0001913] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Received: 12/03/2007] [Accepted: 02/24/2008] [Indexed: 11/18/2022]
Abstract
BACKGROUND: We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median joining and minimum spanning network) compared to standard tree approaches (neighbor-joining and maximum parsimony) in the presence and absence of recombination. PRINCIPAL FINDINGS: In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods and no method could accurately estimate branch lengths. CONCLUSIONS: Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and novel algorithms developed in the future.
Affiliation(s)
- Steven M Woolley
- Computational Biology Program, Washington University School of Medicine, St. Louis, Missouri, United States of America.
|
36
|
|
37
|
Bardel C, Croiseau P, Génin E. Dealing with missing phase and missing data in phylogeny-based analysis. BMC Proc 2007; 1 Suppl 1:S22. [PMID: 18466519 PMCID: PMC2367603 DOI: 10.1186/1753-6561-1-s1-s22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
We recently described a new method to identify disease susceptibility loci, based on the analysis of the evolutionary relationships between haplotypes of cases and controls. However, haplotypes are often unknown and the problem of phase inference is even more crucial when there are missing data. In this work, we suggest using a multiple imputation algorithm to deal with missing phase and missing data, prior to a phylogeny-based analysis. We used the simulated data of Genetic Analysis Workshop 15 (Problem 3, answer known) to assess the power of the phylogeny-based analysis to detect disease susceptibility loci after reconstruction of haplotypes by a multiple-imputation method. We compare, for various rates of missing data, the performance of the multiple imputation method with the performance achieved when considering only the most probable haplotypic configurations or the true phase. When only the phase is unknown, all methods perform approximately the same to identify disease susceptibility sites. In the presence of missing data however, the detection of disease susceptibility sites is significantly better when reconstructing haplotypes by multiple imputation than when considering only the best haplotype configurations.
Affiliation(s)
- Claire Bardel
- UMR 5145 - Génétique des Populations Humaines - CNRS MNH, Université Paris VII, 17 Place du Trocadero, Paris, 75016 France.
|
38
|
Branicki W, Brudnik U, Kupiec T, Wolańska-Nowak P, Szczerbińska A, Wojas-Pelc A. Association of polymorphic sites in the OCA2 gene with eye colour using the tree scanning method. Ann Hum Genet 2007; 72:184-92. [PMID: 18093281 DOI: 10.1111/j.1469-1809.2007.00407.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 12/01/2022]
Abstract
A number of genes are considered to affect normal variation in human pigmentation. Recent studies have indicated that OCA2 is the crucial gene involved in the high variation of iris colour present among populations of European descent. In this study, eleven polymorphisms of the OCA2 gene were examined in search of their association with different pigment traits. The evolutionary tree scanning method indicated that the strongest phenotypic eye colour variation is associated with the branch defined by the nonsynonymous change rs1800407, which causes the amino acid change Arg419Gln in exon 13. Single SNP analysis indicated that allele 419Gln is associated with green/hazel iris colour (p < 0.001). According to tree scanning analysis, the proportion of eye colour variation explained by this nucleotide position is merely 4%. Thus, additional variation present in the OCA2 gene and perhaps some other pigment related genes must be taken into account in order to explain the high phenotypic variation in iris colour.
Affiliation(s)
- W Branicki
- Institute of Forensic Research, Section of Forensic Genetics, Westerplatte 9, Krakow, Poland.
|
39
|
Abstract
Given the increasing size of modern genetic data sets and, in particular, the move towards genome-wide studies, there is merit in considering analyses that gain computational efficiency by being more heuristic in nature. With this in mind, we present results of cladistic analyses methods on the Genetic Analysis Workshop 15 Problem 3 simulated data (answers known). Our analysis attempts to capture similarities between individuals using a series of trees, and then looks for regions in which mutations on those trees can successfully explain a phenotype of interest. Existing varieties of such algorithms assume haplotypes are known, or have been inferred, an assumption that is often unrealistic for genome-wide data. We therefore present an extension of these methods that can successfully analyze genotype, rather than haplotype, data.
Affiliation(s)
- Hsuan Jung
- Keck School of Medicine, Preventive Medicine, University of Southern California, 1540 Alcazar Street, CHP-220, Los Angeles, California 90089-9011, USA.
|
40
|
Platt A. Association mapping through heuristic evolutionary history reconstruction-application to GAW15 Problem 3. BMC Proc 2007; 1 Suppl 1:S131. [PMID: 18466474 PMCID: PMC2367498 DOI: 10.1186/1753-6561-1-s1-s131] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
This paper presents a novel method of identifying phenotypically important regions of the genome. It involves a form of association mapping that works by summarizing properties of the ancestral recombination graph (ARG) of a sample of unrelated phenotyped and genotyped individuals. By breaking the sample into many small sub-samples and averaging the results, it becomes computationally tractable to measure the degree to which the evolutionary history of any locus is consistent with the distribution of the phenotypes in the sample. Analysis of simulated rheumatoid arthritis data demonstrates the efficiency and effectiveness of this method in identifying loci of large phenotypic effect.
Affiliation(s)
- Alexander Platt
- Department of Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, California 90089-2910, USA.
|
41
|
Fell JW, Scorzetti G, Statzell-Tallman A, Boundy-Mills K. Molecular diversity and intragenomic variability in the yeast genus Xanthophyllomyces: the origin of Phaffia rhodozyma? FEMS Yeast Res 2007; 7:1399-408. [PMID: 17825066 DOI: 10.1111/j.1567-1364.2007.00297.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 12/01/2022]
Abstract
The teleomorphic basidiomycetous yeast Xanthophyllomyces dendrorhous is important as a commercial source of astaxanthin, which is a component of feeds for mariculture. Phaffia rhodozyma is the anamorphic state of Xanthophyllomyces; however, there are conflicting reports in the literature concerning the presence of a sexual cycle in P. rhodozyma. The current study attempted to explain this enigma. Strains were obtained from the Phaff Yeast Culture Collection (University of California, Davis) and other sources in the northern hemisphere. Molecular sequences of three nuclear rDNA regions were examined: the internal transcribed spacers (ITS), intergenic spacer (IGS1) and the D1D2 region at the 5' end of the 26S gene. Different levels of genetic variability were observed in the three regions. The D1D2 differentiated major groups of strains, while an increased variability in the ITS suggested that the ITS region could be employed as an ecological marker. The greatest variability was in the IGS1 region, where strains can be defined by the presence and location of indels. Intragenomic sequence heterogeneity in the ITS and IGS1 regions led to the hypothesis that the type strain of P. rhodozyma (CBS 5905(T), UCD 67-210(T)) was derived as a mating-deficient basidiospore from the parent teleomorphic strain CBS 9090.
MESH Headings
- Base Sequence
- Basidiomycota/genetics
- California
- Cluster Analysis
- DNA, Fungal/chemistry
- DNA, Fungal/genetics
- DNA, Ribosomal/chemistry
- DNA, Ribosomal/genetics
- DNA, Ribosomal Spacer/chemistry
- DNA, Ribosomal Spacer/genetics
- Evolution, Molecular
- Molecular Sequence Data
- Phylogeny
- Polymorphism, Genetic
- RNA, Ribosomal/genetics
- Sequence Analysis, DNA
- Sequence Homology, Nucleic Acid
Affiliation(s)
- Jack W Fell
- Rosenstiel School of Marine and Atmospheric Science, University of Miami, Key Biscayne, FL, USA.
|
42
|
Bergen AW, Baccarelli A, McDaniel TK, Kuhn K, Pfeiffer R, Kakol J, Bender P, Jacobs K, Packer B, Chanock SJ, Yeager M. Cis sequence effects on gene expression. BMC Genomics 2007; 8:296. [PMID: 17727713 PMCID: PMC2077339 DOI: 10.1186/1471-2164-8-296] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/09/2007] [Accepted: 08/29/2007] [Indexed: 11/10/2022]
Abstract
Background: Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results: We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p < 0.05) association with gene expression. Using the literature as a "gold standard" to compare 14 genes with data from both this study and the literature, we observed 80% and 85% concordance for genes exhibiting and not exhibiting significant cis sequence effects in our study, respectively. Conclusion: Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.
Affiliation(s)
- Andrew W Bergen
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA
- Center for Health Sciences, Policy Division, SRI International, Menlo Park, CA USA
- Andrea Baccarelli
- School of Public Health, Harvard University, Boston, MA USA
- Molecular Epidemiology and Genetics, EPOCA Epidemiology Center, Maggiore Hospital, Mangiagalli and Regina Elena IRCCS Foundation & University of Milan, Milan, Italy
- Ruth Pfeiffer
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA
- Kevin Jacobs
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA
- Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD USA
- Bernice Packer
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA
- Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD USA
- Science Applications International Corporation-National Cancer Institute (NCI), NCI-FCRDC, Frederick, MD USA
- Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA
- Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD USA
- Meredith Yeager
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA
- Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD USA
- Science Applications International Corporation-National Cancer Institute (NCI), NCI-FCRDC, Frederick, MD USA
|
43
|
Skøt L, Humphreys J, Humphreys MO, Thorogood D, Gallagher J, Sanderson R, Armstead IP, Thomas ID. Association of candidate genes with flowering time and water-soluble carbohydrate content in Lolium perenne (L.). Genetics 2007; 177:535-47. [PMID: 17660575 PMCID: PMC2013705 DOI: 10.1534/genetics.107.071522] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/18/2022]
Abstract
We describe a candidate gene approach for associating SNPs with variation in flowering time and water-soluble carbohydrate (WSC) content and other quality traits in the temperate forage grass species Lolium perenne. Three analysis methods were used, which took the significant population structure into account. First, a linear mixed model was used enabling a structured association analysis to be incorporated with the nine populations identified in the structure analysis as random variables. Second, a within-population analysis of variance was performed. Third, a tree-scanning method was used, in which haplotype trees were associated with phenotypes on the basis of inferred haplotypes. Analysis of variance within populations identified several associations between WSC, nitrogen (N), and dry matter digestibility with allelic variants within an alkaline invertase candidate gene LpcAI. These associations were only detected in material harvested in one of the two years. By contrast, consistent associations between the L. perenne homolog (LpHD1) of the rice photoperiod control gene HD1 and flowering time were identified. One SNP, in the immediate upstream region of the LpHD1 coding sequence (C-4443-A), was significant in the linear mixed model. Within-population analysis of variance and tree-scanning analysis confirmed and extended this result to the 2118 polymorphisms in some of the populations. The merits of the tree-scanning method are compared to the single SNP analysis. The potential usefulness of the 4443 SNP in marker-assisted selection is currently being evaluated in test crosses of genotypes from this work with turf-grass varieties.
Affiliation(s)
- Leif Skøt
- Institute of Grassland and Environmental Research, Plant Genetics and Breeding Department, Aberystwyth, Ceredigion SY23 3EB, United Kingdom.
44
Nowotny P, Simcock X, Bertelsen S, Hinrichs AL, Kauwe JSK, Mayo K, Smemo S, Morris JC, Goate A. Association studies testing for risk for late-onset Alzheimer's disease with common variants in the beta-amyloid precursor protein (APP). Am J Med Genet B Neuropsychiatr Genet 2007; 144B:469-74. [PMID: 17427190 DOI: 10.1002/ajmg.b.30485] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Linkage studies have suggested a susceptibility locus for late-onset Alzheimer's disease (LOAD) on chromosome 21. A functional candidate gene in this region is the beta-amyloid precursor protein (APP) gene. Previously, coding mutations in APP have been associated with early-onset Alzheimer's disease (EOAD). Three copies of APP are associated with AD pathology in Down's syndrome and in EOAD, suggesting that overexpression of APP may be a risk factor for LOAD. Although APP is a strong functional and positional candidate, to date there has been no thorough investigation using a dense map of SNPs across the APP gene. In order to investigate the role of common variation in the APP gene in the risk of LOAD, we genotyped 44 SNPs across a 300-kb region spanning the entire gene in a large case-control series of 738 AD cases and 657 healthy controls. The SNPs showed no association in genotypic or allelic tests, even after stratification for presence or absence of the APOE ε4 allele. Haplotype analysis also failed to reveal significant association with any common haplotypes. These results suggest that common variation in the APP gene is not a significant risk factor for LOAD. However, we cannot rule out the possibility that multiple rare variants that increase APP expression or Abeta production might influence the risk for LOAD.
Affiliation(s)
- Petra Nowotny
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
45
Abstract
Nested clade phylogeographic analysis (NCPA) is a popular method for reconstructing the demographic history of spatially distributed populations from genetic data. Although some parts of the analysis are automated, there is no unique and widely followed algorithm for doing this in its entirety, beginning with the data, and ending with the inferences drawn from the data. This article describes a method that automates NCPA, thereby providing a framework for replicating analyses in an objective way. To do so, a number of decisions need to be made so that the automated implementation is representative of previous analyses. We review how the NCPA procedure has evolved since its inception and conclude that there is scope for some variability in the manual application of NCPA. We apply the automated software to three published datasets previously analyzed manually and replicate many details of the manual analyses, suggesting that the current algorithm is representative of how a typical user will perform NCPA. We simulate a large number of replicate datasets for geographically distributed, but entirely random-mating, populations. These are then analyzed using the automated NCPA algorithm. Results indicate that NCPA tends to give a high frequency of false positives. In our simulations we observe that 14% of the clades give a conclusive inference that a demographic event has occurred, and that 75% of the datasets have at least one clade that gives such an inference. This is mainly due to the generation of multiple statistics per clade, of which only one is required to be significant to apply the inference key. We survey the inferences that have been made in recent publications and show that the most commonly inferred processes (restricted gene flow with isolation by distance and contiguous range expansion) are those that are commonly inferred in our simulations. However, published datasets typically yield a richer set of inferences with NCPA than obtained in our random-mating simulations, and further testing of NCPA with models of structured populations is necessary to examine its accuracy.
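The abstract's explanation for the false positives (several statistics per clade, any one of which can trigger an inference) can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the article: it assumes k independent tests, each at significance level alpha, which is an idealization because NCPA statistics computed on the same clade are correlated. The function name and the example values of alpha and k are hypothetical.

```python
def prob_at_least_one_significant(alpha: float, k: int) -> float:
    """Chance that at least one of k independent null tests is significant.

    Assumes independence between tests (an idealization, not a property
    of NCPA), so the result is 1 - (1 - alpha)^k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # Illustrative values only; they are not parameters from the study.
    for k in (1, 2, 4, 8):
        print(k, round(prob_at_least_one_significant(0.05, k), 4))
```

Under these assumptions, the chance of at least one spurious significant result grows quickly with the number of statistics examined, which matches the direction of the effect the abstract describes without reproducing its reported percentages.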
Affiliation(s)
- Mahesh Panchal
- School of Biological Sciences, University of Reading, Whiteknights, Reading, UK.
46
Yu CE, Seltman H, Peskind ER, Galloway N, Zhou PX, Rosenthal E, Wijsman EM, Tsuang DW, Devlin B, Schellenberg GD. Comprehensive analysis of APOE and selected proximate markers for late-onset Alzheimer's disease: patterns of linkage disequilibrium and disease/marker association. Genomics 2007; 89:655-65. [PMID: 17434289 PMCID: PMC1978251 DOI: 10.1016/j.ygeno.2007.02.002] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.5] [Received: 10/11/2006] [Revised: 02/14/2007] [Accepted: 02/15/2007] [Indexed: 10/23/2022]
Abstract
The ε4 allele of APOE confers a two- to fourfold increased risk for late-onset Alzheimer's disease (LOAD), but LOAD pathology does not all fit neatly around APOE. It is conceivable that genetic variation proximate to APOE contributes to LOAD risk. Therefore, we investigated the degree of linkage disequilibrium (LD) for a comprehensive set of 50 SNPs in and surrounding APOE using a substantial Caucasian sample of 1100 chromosomes. SNPs in APOE were further molecularly haplotyped to determine their phases. One set of SNPs in TOMM40, roughly 15 kb upstream of APOE, showed intriguing LD with the ε4 allele and was strongly associated with the risk for developing LOAD. However, when all the SNPs were entered into a logit model, only the effect of APOE ε4 remained significant. These observations diminish the possibility that loci in the TOMM40 gene may have a major effect on the risk for LOAD in Caucasians.
Affiliation(s)
- Chang-En Yu
- Geriatric Research, Education, and Clinical Center, Veterans Affairs Puget Sound Health Care System, Seattle, WA 98108, USA.
47
Israel Journal of Ecology & Evolution: Guidelines for Contributors—2007. Isr J Ecol Evol 2007. [DOI: 10.1560/ijee_53_1_117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
48
Abstract
Although genetic association studies have been with us for many years, even for the simplest analyses there is little consensus on the most appropriate statistical procedures. Here I give an overview of statistical approaches to population association studies, including preliminary analyses (Hardy-Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association. My goal is to outline the key methods with a brief discussion of problems (population structure and multiple testing), avenues for solutions and some ongoing developments.
Affiliation(s)
- David J Balding
- Department of Epidemiology and Public Health, Imperial College, St Marys Campus, Norfolk Place, London W2 1PG, UK.
49
Minichiello MJ, Durbin R. Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet 2006; 79:910-22. [PMID: 17033967 PMCID: PMC1698562 DOI: 10.1086/508901] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Received: 03/31/2006] [Accepted: 09/01/2006] [Indexed: 12/26/2022]
Abstract
Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.
Affiliation(s)
- Mark J Minichiello
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, United Kingdom
50
Alexander HJ, Taylor JS, Wu SST, Breden F. PARALLEL EVOLUTION AND VICARIANCE IN THE GUPPY (POECILIA RETICULATA) OVER MULTIPLE SPATIAL AND TEMPORAL SCALES. Evolution 2006. [DOI: 10.1111/j.0014-3820.2006.tb01870.x] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Indexed: 11/29/2022]