151
Meng S, He J, Zhao T, Xing G, Li Y, Yang S, Lu J, Wang Y, Gai J. Detecting the QTL-allele system of seed isoflavone content in Chinese soybean landrace population for optimal cross design and gene system exploration. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2016; 129:1557-76. [PMID: 27189002 DOI: 10.1007/s00122-016-2724-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 07/20/2015] [Accepted: 04/28/2016] [Indexed: 05/20/2023]
Abstract
KEY MESSAGE Utilizing an innovative GWAS in CSLRP, 44 QTL with 199 alleles and a 72.2% contribution to SIFC variation were detected and organized into a QTL-allele matrix for cross design and gene annotation. The seed isoflavone content (SIFC) of soybeans is of great importance to health care. The Chinese soybean landrace population (CSLRP) as a genetic reservoir was studied for its whole-genome quantitative trait loci (QTL) system of the SIFC using an innovative restricted two-stage multi-locus genome-wide association study procedure (RTM-GWAS). A sample of 366 landraces was tested under four environments and sequenced using the RAD-seq (restriction-site-associated DNA sequencing) technique to obtain 116,769 single nucleotide polymorphisms (SNPs), which were then organized into 29,119 SNP linkage disequilibrium blocks (SNPLDBs) for GWAS. The detected 44 QTL with 199 alleles on 16 chromosomes (explaining 72.2% of the total phenotypic variation), together with the allele effects (92 positive and 107 negative) of the CSLRP, were organized into a QTL-allele matrix showing the SIFC population genetic structure. Additional differentiation among eco-regions due to the SIFC in addition to that of genome-wide markers was found. All accessions comprised both positive and negative alleles, implying a great potential for recombination within the population. The optimal crosses were predicted from the matrices, showing transgressive potentials in the CSLRP. From the detected QTL system, 55 candidate genes related to 11 biological processes were χ²-tested as an SIFC candidate gene system. The present study explored the genome-wide SIFC QTL/gene system with the innovative RTM-GWAS and found the potentials of the QTL-allele matrix in optimal cross design and population genetic and genomic studies, which may have provided a solution to match the breeding by design strategy at both QTL and gene levels in breeding programs.
Affiliation(s)
- Shan Meng
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Jianbo He
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Tuanjie Zhao
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- National Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Guangnan Xing
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- National Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Yan Li
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- National Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Shouping Yang
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- National Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jiangjie Lu
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Yufeng Wang
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China
- Junyi Gai
- Soybean Research Institute, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
- National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China.
- Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture, Nanjing, 210095, Jiangsu, China.
- National Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
- Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
152
Bayesian Inference of the Evolution of a Phenotype Distribution on a Phylogenetic Tree. Genetics 2016; 204:89-98. [PMID: 27412711 PMCID: PMC5012407 DOI: 10.1534/genetics.116.190496] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 04/18/2016] [Accepted: 07/07/2016] [Indexed: 12/21/2022] Open
Abstract
The distribution of a phenotype on a phylogenetic tree is often a quantity of interest. Many phenotypes have imperfect heritability, so that a measurement of the phenotype for an individual can be thought of as a single realization from the phenotype distribution of that individual. If all individuals in a phylogeny had the same phenotype distribution, measured phenotypes would be randomly distributed on the tree leaves. This is, however, often not the case, implying that the phenotype distribution evolves over time. Here we propose a new model based on this principle of evolving phenotype distribution on the branches of a phylogeny, which is different from ancestral state reconstruction where the phenotype itself is assumed to evolve. We develop an efficient Bayesian inference method to estimate the parameters of our model and to test the evidence for changes in the phenotype distribution. We use multiple simulated data sets to show that our algorithm has good sensitivity and specificity properties. Since our method identifies branches on the tree on which the phenotype distribution has changed, it is able to break down a tree into components for which this distribution is unique and constant. We present two applications of our method, one investigating the association between HIV genetic variation and human leukocyte antigen and the other studying host range distribution in a lineage of Salmonella enterica, and we discuss many other potential applications.
153
Simmons S, Sahinalp C, Berger B. Enabling Privacy-Preserving GWASs in Heterogeneous Human Populations. Cell Syst 2016; 3:54-61. [PMID: 27453444 PMCID: PMC4994706 DOI: 10.1016/j.cels.2016.04.013] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 02/05/2016] [Revised: 04/08/2016] [Accepted: 04/17/2016] [Indexed: 11/26/2022]
Abstract
The proliferation of large genomic databases offers the potential to perform increasingly larger-scale genome-wide association studies (GWASs). Due to privacy concerns, however, access to these data is limited, greatly reducing their usefulness for research. Here, we introduce a computational framework for performing GWASs that adapts principles of differential privacy (a cryptographic theory that facilitates secure analysis of sensitive data) to both protect private phenotype information (e.g., disease status) and correct for population stratification. This framework enables us to produce privacy-preserving GWAS results based on EIGENSTRAT and linear mixed model (LMM)-based statistics, both of which correct for population stratification. We test our differentially private statistics, PrivSTRAT and PrivLMM, on simulated and real GWAS datasets and find they are able to protect privacy while returning meaningful results. Our framework can be used to securely query private genomic datasets to discover which specific genomic alterations may be associated with a disease, thus increasing the availability of these valuable datasets.
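As a rough illustration of the differential privacy idea this abstract refers to, the sketch below applies the generic Laplace mechanism to a single released statistic. It is not PrivSTRAT or PrivLMM, whose details are not given here; the function names, the sensitivity of 1.0, the epsilon value, and the example statistic are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_release(value, sensitivity, epsilon, rng):
    """Generic Laplace mechanism: add noise with scale sensitivity / epsilon.

    `sensitivity` is the largest change in `value` that altering one
    individual's record can cause; it must be derived for the statistic
    actually being released (assumed known here).
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
true_stat = 12.5  # hypothetical association statistic for one SNP
noisy_stat = private_release(true_stat, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller epsilon gives stronger privacy but a larger noise scale, which is the trade-off between privacy and "meaningful results" that the abstract describes.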
Affiliation(s)
- Sean Simmons
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Cenk Sahinalp
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada; School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
154
Busby GB, Band G, Si Le Q, Jallow M, Bougama E, Mangano VD, Amenga-Etego LN, Enimil A, Apinjoh T, Ndila CM, Manjurano A, Nyirongo V, Doumba O, Rockett KA, Kwiatkowski DP, Spencer CC. Admixture into and within sub-Saharan Africa. eLife 2016; 5. [PMID: 27324836 PMCID: PMC4915815 DOI: 10.7554/elife.15266] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 02/15/2016] [Accepted: 05/17/2016] [Indexed: 12/27/2022] Open
Abstract
Similarity between two individuals in the combination of genetic markers along their chromosomes indicates shared ancestry and can be used to identify historical connections between different population groups due to admixture. We use a genome-wide, haplotype-based, analysis to characterise the structure of genetic diversity and gene-flow in a collection of 48 sub-Saharan African groups. We show that coastal populations experienced an influx of Eurasian haplotypes over the last 7000 years, and that Eastern and Southern Niger-Congo speaking groups share ancestry with Central West Africans as a result of recent population expansions. In fact, most sub-Saharan populations share ancestry with groups from outside of their current geographic region as a result of gene-flow within the last 4000 years. Our in-depth analysis provides insight into haplotype sharing across different ethno-linguistic groups and the recent movement of alleles into new environments, both of which are relevant to studies of genetic epidemiology. DOI:http://dx.doi.org/10.7554/eLife.15266.001 Our genomes contain a record of historical events. This is because when groups of people are separated for generations, the DNA sequence in the two groups’ genomes will change in different ways. Looking at the differences in the genomes of people from the same population can help researchers to understand and reconstruct the historical interactions that brought their ancestors together. The mixing of two populations that were previously separate is known as admixture. Africa as a continent has few written records of its history. This means that it is somewhat unknown which important movements of people in the past generated the populations found in modern-day Africa. Busby et al. have now attempted to use DNA to look into this and reconstruct the last 4000 years of genetic history in African populations. 
As has been shown in other regions of the world, the new analysis showed that all African populations are the result of historical admixture events. However, Busby et al. could characterize these events to an unprecedented level of detail. For example, multiple ethnic groups from The Gambia and Mali all show signs of sharing the same set of ancestors from West Africa, Europe and Asia who mixed around 2000 years ago. Evidence of a migration of people from Central West Africa, known as the Bantu expansion, could also be detected, and was shown to carry genes to the south and east. An important next step will be to look at the consequences of the observed gene-flow, and ask if it has contributed to spreading beneficial, or detrimental, mutations around Africa. DOI:http://dx.doi.org/10.7554/eLife.15266.002
Affiliation(s)
- George Bj Busby
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom
- Gavin Band
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom; Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Quang Si Le
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom
- Muminatou Jallow
- Medical Research Council Unit, Serrekunda, The Gambia; Royal Victoria Teaching Hospital, Banjul, The Gambia
- Edith Bougama
- Centre National de Recherche et de Formation sur le Paludisme, Ouagadougou, Burkina Faso
- Valentina D Mangano
- Dipartimento di Sanita Publica e Malattie Infettive, University of Rome La Sapienza, Rome, Italy
- Tobias Apinjoh
- Department of Biochemistry and Molecular Biology, University of Buea, Buea, Cameroon
- Alphaxard Manjurano
- Joint Malaria Programme, Kilimanjaro Christian Medical College, Moshi, Tanzania; Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Vysaul Nyirongo
- Malawi-Liverpool Wellcome Trust Clinical Research Programme, College of Medicine, University of Malawi, Blantyre, Malawi
- Ogobara Doumba
- Malaria Research and Training Centre, University of Bamako, Bamako, Mali
- Kirk A Rockett
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom; Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Dominic P Kwiatkowski
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom; Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Chris Ca Spencer
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom
155
Wei Y, Wu H. Measuring the spatial correlations of protein binding sites. Bioinformatics 2016; 32:1766-72. [PMID: 26861822 DOI: 10.1093/bioinformatics/btw058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/23/2015] [Accepted: 01/25/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Understanding the interactions of different DNA binding proteins is a crucial first step toward deciphering gene regulatory mechanisms. With advances in high-throughput sequencing technology such as ChIP-seq, the genome-wide binding sites of many proteins have been profiled under different biological contexts. It is of great interest to quantify the spatial correlations of the binding sites, such as their overlaps, to provide information on the interactions of proteins. Analyses of the overlapping patterns of binding sites have been widely performed, mostly based on ad hoc methods. Due to the heterogeneity and the tremendous size of the genome, such methods often lead to biased or even erroneous results. RESULTS In this work, we discover a Simpson's paradox phenomenon in assessing the genome-wide spatial correlation of protein binding sites. Leveraging information from publicly available data, we propose a testing procedure for evaluating the significance of overlap between a pair of proteins, which accounts for background artifacts and genome heterogeneity. Real data analyses demonstrate that the proposed method provides more biologically meaningful results. AVAILABILITY AND IMPLEMENTATION An R package is available at http://www.sta.cuhk.edu.hk/YWei/ChIPCor.html CONTACTS ywei@sta.cuhk.edu.hk or hao.wu@emory.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yingying Wei
- Department of Statistics, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
- Hao Wu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
156
Power considerations for λ inflation factor in meta-analyses of genome-wide association studies. Genet Res (Camb) 2016; 98:e9. [PMID: 27193946 DOI: 10.1017/s0016672316000069] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022] Open
Abstract
The genomic control (GC) approach is extensively used to effectively control false positive signals due to population stratification in genome-wide association studies (GWAS). However, GC affects the statistical power of GWAS. The loss of power depends on the magnitude of the inflation factor (λ) that is used for GC. We simulated meta-analyses of different GWAS. Minor allele frequency (MAF) ranged from 0·001 to 0·5 and λ was sampled from two scenarios: (i) random scenario (empirically-derived distribution of real λ values) and (ii) selected scenario from simulation parameter modification. Adjustment for λ was considered under single correction (within study corrected standard errors) and double correction (additional λ corrected summary estimate). MAF was a pivotal determinant of observed power. In random λ scenario, double correction induced a symmetric power reduction in comparison to single correction. For MAF 1·2 and MAF >5%. Our results provide a quick but detailed index for power considerations of future meta-analyses of GWAS that enables a more flexible design from early steps based on the number of studies accumulated in different groups and the λ values observed in the single studies.
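The genomic control adjustment described above can be sketched as follows. This is a minimal illustration of the standard definition (λ is the median observed 1-df chi-square statistic divided by the expected median, and standard errors are inflated by √λ), not the simulation code of the study; the function names and the example statistics are hypothetical, and leaving λ below 1 uncorrected is a common convention assumed here.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median statistic over the expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    """Inflate a standard error by sqrt(lambda); lambda < 1 is left uncorrected."""
    return se * math.sqrt(max(lam, 1.0))


# Hypothetical per-SNP 1-df chi-square statistics from a single study.
study_stats = [0.1, 0.3, 0.5, 0.7, 2.0]
lam = inflation_factor(study_stats)

# "Single correction": adjust the within-study standard error of one SNP.
corrected_se = gc_correct_se(0.10, lam)
```

Under this reading, "double correction" would apply the same adjustment a second time to the standard error of the meta-analysis summary estimate, using the λ computed from the meta-analysis statistics.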
157
Vahia MN, Ladiwala U, Mahathe P, Mathur D. Population Dynamics of Early Human Migration in Britain. PLoS One 2016; 11:e0154641. [PMID: 27148959 PMCID: PMC4858239 DOI: 10.1371/journal.pone.0154641] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/12/2016] [Accepted: 04/15/2016] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Early human migration is largely determined by geography and human needs. These are both deterministic parameters when small populations move into unoccupied areas where conflicts and large group dynamics are not important. The early period of human migration into the British Isles provides such a laboratory which, because of its relative geographical isolation, may allow some insights into the complex dynamics of early human migration and interaction. METHOD AND RESULTS We developed a simulation code based on human affinity to habitable land, as defined by availability of water sources, altitude, and flatness of land, in choosing the path of migration. Movement of people on the British island over the prehistoric period from their initial entry points was simulated on the basis of data from the megalithic period. Topographical and hydro-shed data from satellite databases were used to define habitability, based on distance from water bodies, flatness of the terrain, and altitude above sea level. We simulated population movement based on assumptions of affinity for more habitable places, with the rate of movement tempered by existing populations. We compared results of our computer simulations with genetic data and show that our simulation can predict fairly accurately the points of contact between different migratory paths. Such comparison also provides more detailed information about the path of peoples' movement over ~2000 years before the present era. CONCLUSIONS We demonstrate an accurate method to simulate prehistoric movements of people based upon current topographical satellite data. Our findings are validated by recently available genetic data. Our method may prove useful in determining early human population dynamics even when no genetic information is available.
Affiliation(s)
- Mayank N. Vahia
- Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India
- Uma Ladiwala
- UM-DAE Centre for Excellence in Basic Sciences, University of Mumbai, Kalina, Mumbai 400098, India
- Pavan Mahathe
- Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India
- Deepak Mathur
- Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India
158
Kelleher J, Etheridge AM, McVean G. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comput Biol 2016; 12:e1004842. [PMID: 27145223 PMCID: PMC4856371 DOI: 10.1371/journal.pcbi.1004842] [Citation(s) in RCA: 347] [Impact Index Per Article: 43.4] [Received: 12/10/2015] [Accepted: 03/02/2016] [Indexed: 01/23/2023] Open
Abstract
A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods. Our understanding of the distribution of genetic variation in natural populations has been driven by mathematical models of the underlying biological and demographic processes. A key strength of such coalescent models is that they enable efficient simulation of data we might see under a variety of evolutionary scenarios. However, current methods are not well suited to simulating genome-scale data sets on hundreds of thousands of samples, which is essential if we are to understand the data generated by population-scale sequencing projects. Similarly, processing the results of large simulations also presents researchers with a major challenge, as it can take many days just to read the data files. In this paper we solve these problems by introducing a new way to represent information about the ancestral process. This new representation leads to huge gains in simulation speed and storage efficiency so that large simulations complete in minutes and the output files can be processed in seconds.
Affiliation(s)
- Jerome Kelleher
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Gilean McVean
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
159
Mackinnon MJ, Ndila C, Uyoga S, Macharia A, Snow RW, Band G, Rautanen A, Rockett KA, Kwiatkowski DP, Williams TN. Environmental Correlation Analysis for Genes Associated with Protection against Malaria. Mol Biol Evol 2016; 33:1188-204. [PMID: 26744416 PMCID: PMC4839215 DOI: 10.1093/molbev/msw004] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 12/31/2022] Open
Abstract
Genome-wide searches for loci involved in human resistance to malaria are currently being conducted on a large scale in Africa using case-control studies. Here, we explore the utility of an alternative approach, "environmental correlation analysis" (ECA), which tests for clines in allele frequencies across a gradient of an environmental selection pressure, to identify genes that have historically protected against death from malaria. We collected genotype data from 12,425 newborns on 57 candidate malaria resistance loci and 9,756 single nucleotide polymorphisms (SNPs) selected at random from across the genome, and examined their allele frequencies for geographic correlations with long-term malaria prevalence data based on 84,042 individuals living under different historical selection pressures from malaria in coastal Kenya. None of the 57 candidate SNPs showed significant (P < 0.05) correlations in allele frequency with local malaria transmission intensity after adjusting for population structure and multiple testing. In contrast, two of the random SNPs that had highly significant correlations (P < 0.01) were in genes previously linked to malaria resistance, namely, CDH13, encoding cadherin 13, and HS3ST3B1, encoding heparan sulfate 3-O-sulfotransferase 3B1. Both proteins play a role in glycoprotein-mediated cell-cell adhesion, which has been widely implicated in cerebral malaria, the most life-threatening form of this disease. Other top genes, including CTNND2, which encodes δ-catenin, a molecular partner to cadherin, were significantly enriched in cadherin-mediated pathways affecting inflammation of the brain vascular endothelium. These results demonstrate the utility of ECA in the discovery of novel genes and pathways affecting infectious disease.
Affiliation(s)
- Carolyne Ndila
- Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya
- Sophie Uyoga
- Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya
- Alex Macharia
- Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya
- Robert W. Snow
- Department of Public Health Research, KEMRI-Wellcome Trust Research Programme, Nairobi, Kenya
- Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, United Kingdom
- Gavin Band
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Anna Rautanen
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Kirk A. Rockett
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- The Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Dominic P. Kwiatkowski
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- The Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Thomas N. Williams
- Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya
- Department of Medicine, Imperial College, London, United Kingdom
- INDEPTH Network, Kanda, Accra, Ghana
160
Zeng X, Warshauer DH, King JL, Churchill JD, Chakraborty R, Budowle B. Empirical testing of a 23-AIMs panel of SNPs for ancestry evaluations in four major US populations. Int J Legal Med 2016; 130:891-896. [DOI: 10.1007/s00414-016-1333-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/15/2015] [Accepted: 02/05/2016] [Indexed: 10/22/2022]
161
Kryvokhyzha D, Holm K, Chen J, Cornille A, Glémin S, Wright SI, Lagercrantz U, Lascoux M. The influence of population structure on gene expression and flowering time variation in the ubiquitous weed Capsella bursa-pastoris (Brassicaceae). Mol Ecol 2016; 25:1106-21. [DOI: 10.1111/mec.13537] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 05/11/2015] [Revised: 11/18/2015] [Accepted: 12/10/2015] [Indexed: 12/19/2022]
Affiliation(s)
- Dmytro Kryvokhyzha
- Department of Ecology and Genetics; Evolutionary Biology Center and Science for Life Laboratory; Uppsala University; 75236 Uppsala Sweden
- Karl Holm
- Department of Ecology and Genetics; Evolutionary Biology Center and Science for Life Laboratory; Uppsala University; 75236 Uppsala Sweden
- Jun Chen
- Department of Ecology and Genetics; Evolutionary Biology Center and Science for Life Laboratory; Uppsala University; 75236 Uppsala Sweden
- Amandine Cornille
- Department of Ecology and Genetics; Evolutionary Biology Center and Science for Life Laboratory; Uppsala University; 75236 Uppsala Sweden
- Sylvain Glémin
- Department of Ecology and Genetics; Evolutionary Biology Center and Science for Life Laboratory; Uppsala University; 75236 Uppsala Sweden
- Institut des Sciences de l'Evolution (ISEM - UMR 5554 Université de Montpellier-CNRS-IRD-EPHE); Place Eugene Bataillon 34075 Montpellier France
- Stephen I. Wright
- Department of Ecology and Evolution; University of Toronto; 25 Willcocks St. Toronto ON M5S 3B2 Canada
- Ulf Lagercrantz
- Department of Ecology and Genetics; Evolutionary Biology Center and Science for Life Laboratory; Uppsala University; 75236 Uppsala Sweden
- Martin Lascoux
- Department of Ecology and Genetics; Evolutionary Biology Center and Science for Life Laboratory; Uppsala University; 75236 Uppsala Sweden
162
Montazeri Z, Theodoratou E, Nyiraneza C, Timofeeva M, Chen W, Svinti V, Sivakumaran S, Gresham G, Cubitt L, Carvajal-Carmona L, Bertagnolli MM, Zauber AG, Tomlinson I, Farrington SM, Dunlop MG, Campbell H, Little J. Systematic meta-analyses and field synopsis of genetic association studies in colorectal adenomas. Int J Epidemiol 2016; 45:186-205. [PMID: 26451011 PMCID: PMC5860727 DOI: 10.1093/ije/dyv185] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Accepted: 08/20/2015] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Low penetrance genetic variants, primarily single nucleotide polymorphisms, have substantial influence on colorectal cancer (CRC) susceptibility. Most CRCs develop from colorectal adenomas (CRA). Here we report the first comprehensive field synopsis that catalogues all genetic association studies on CRA, with a parallel online database [http://www.chs.med.ed.ac.uk/CRAgene/]. METHODS We performed a systematic review, reviewing 9750 titles, and then extracted data from 130 publications reporting on 181 polymorphisms in 74 genes. We conducted meta-analyses to derive summary effect estimates for 37 polymorphisms in 26 genes. We applied the Venice criteria and Bayesian False Discovery Probability (BFDP) to assess the levels of the credibility of associations. RESULTS We considered the association with the rs6983267 variant at 8q24 as 'highly credible', reaching genome-wide statistical significance in at least one meta-analysis model. We identified 'less credible' associations (higher heterogeneity, lower statistical power, BFDP > 0.02) with a further four variants of four independent genes: MTHFR c.677C>T p.A222V (rs1801133), TP53 c.215C>G p.R72P (rs1042522), NQO1 c.559C>T p.P187S (rs1800566), and NAT1 alleles imputed as fast acetylator genotypes. For the remaining 32 variants of 22 genes for which positive associations with CRA risk have been previously reported, the meta-analyses revealed no credible evidence to support these as true associations. CONCLUSIONS The limited number of credible associations between low penetrance genetic variants and CRA reflects the lower volume of evidence and associated lack of statistical power to detect associations of the magnitude typically observed for genetic variants and chronic diseases. The CRA gene database provides context for CRA genetic association data and will help inform future research directions.
Affiliation(s)
- Zahra Montazeri
- School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Canada
| | - Evropi Theodoratou
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
| | - Christine Nyiraneza
- School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Canada
| | - Maria Timofeeva
- Colon Cancer Genetics Group and Academic Coloproctology, Institute of Genetics and Molecular Medicine, University of Edinburgh and MRC Human Genetics Unit Western General Hospital, Edinburgh, UK
| | - Wanjing Chen
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
| | - Victoria Svinti
- Colon Cancer Genetics Group and Academic Coloproctology, Institute of Genetics and Molecular Medicine, University of Edinburgh and MRC Human Genetics Unit Western General Hospital, Edinburgh, UK
| | - Shanya Sivakumaran
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
| | - Gillian Gresham
- School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Canada
| | - Laura Cubitt
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
| | - Luis Carvajal-Carmona
- Biochemistry and Molecular Medicine, Genome and Biomedical Sciences Facility, UC Davis School of Medicine, University of California Davis, Davis, CA, USA
| | | | - Ann G Zauber
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Ian Tomlinson
- Wellcome Trust Centre for Human Genetics, Oxford, UK
| | - Susan M Farrington
- Colon Cancer Genetics Group and Academic Coloproctology, Institute of Genetics and Molecular Medicine, University of Edinburgh and MRC Human Genetics Unit Western General Hospital, Edinburgh, UK
| | - Malcolm G Dunlop
- Colon Cancer Genetics Group and Academic Coloproctology, Institute of Genetics and Molecular Medicine, University of Edinburgh and MRC Human Genetics Unit Western General Hospital, Edinburgh, UK
| | - Harry Campbell
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
- Colon Cancer Genetics Group and Academic Coloproctology, Institute of Genetics and Molecular Medicine, University of Edinburgh and MRC Human Genetics Unit Western General Hospital, Edinburgh, UK
| | - Julian Little
- School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Canada
| |
|
163
|
Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet 2016; 12:e1005767. [PMID: 26828793 PMCID: PMC4734661 DOI: 10.1371/journal.pgen.1005767] [Citation(s) in RCA: 687] [Impact Index Per Article: 85.9] [Received: 04/19/2015] [Accepted: 12/03/2015] [Indexed: 12/05/2022] Open
Abstract
False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts: Fixed Effect Model (FEM) and a Random Effect Model (REM) and use them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid model over-fitting problem in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method as Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear to both number of individuals and number of markers. Now, a dataset with half million individuals and half million markers can be analyzed within three days.
Affiliation(s)
- Xiaolei Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
| | - Meng Huang
- Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, United States of America
| | - Bin Fan
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Edward S. Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
- United States Department of Agriculture (USDA)–Agricultural Research Service (ARS), Ithaca, New York, United States of America
| | - Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, United States of America
- Department of Animal Sciences, Northeast Agricultural University, Harbin, Heilongjiang, China
| |
|
164
|
Hejase HA, Liu KJ. Mapping the genomic architecture of adaptive traits with interspecific introgressive origin: a coalescent-based approach. BMC Genomics 2016; 17 Suppl 1:8. [PMID: 26819241 PMCID: PMC4895787 DOI: 10.1186/s12864-015-2298-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
Recent studies of eukaryotes including human and Neandertal, mice, and butterflies have highlighted the major role that interspecific introgression has played in adaptive trait evolution. A common question arises in each case: what is the genomic architecture of the introgressed traits? One common approach that can be used to address this question is association mapping, which looks for genotypic markers that have significant statistical association with a trait. It is well understood that sample relatedness can be a confounding factor in association mapping studies if not properly accounted for. Introgression and other evolutionary processes (e.g., incomplete lineage sorting) typically introduce variation among local genealogies, which can also differ from global sample structure measured across all genomic loci. In contrast, state-of-the-art association mapping methods assume fixed sample relatedness across the genome, which can lead to spurious inference. We therefore propose a new association mapping method called Coal-Map, which uses coalescent-based models to capture local genealogical variation alongside global sample structure. Using simulated and empirical data reflecting a range of evolutionary scenarios, we compare the performance of Coal-Map against EIGENSTRAT, a leading association mapping method in terms of its popularity, power, and type I error control. Our empirical data makes use of hundreds of mouse genomes for which adaptive interspecific introgression has recently been described. We found that Coal-Map's performance is comparable or better than EIGENSTRAT in terms of statistical power and false positive rate. Coal-Map's performance advantage was greatest on model conditions that most closely resembled empirically observed scenarios of adaptive introgression. 
These conditions had: (1) causal SNPs contained in one or a few introgressed genomic loci and (2) varying rates of gene flow - from high rates to very low rates where incomplete lineage sorting dominated as a primary cause of local genealogical variation.
Affiliation(s)
- Hussein A Hejase
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, 48824, MI, USA.
| | - Kevin J Liu
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, 48824, MI, USA.
| |
|
165
|
Sittig LJ, Carbonetto P, Engel KA, Krauss KS, Palmer AA. Integration of genome-wide association and extant brain expression QTL identifies candidate genes influencing prepulse inhibition in inbred F1 mice. Genes Brain Behav 2016; 15:260-70. [PMID: 26482417 DOI: 10.1111/gbb.12262] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/21/2015] [Revised: 10/13/2015] [Accepted: 10/15/2015] [Indexed: 12/12/2022]
Abstract
Genetic association mapping in structured populations of model organisms can offer a fruitful complement to human genetic studies by generating new biological hypotheses about complex traits. Here we investigated prepulse inhibition (PPI), a measure of sensorimotor gating that is disrupted in a number of psychiatric disorders. To identify genes that influence PPI, we constructed a panel of half-sibs by crossing 30 females from common inbred mouse strains with inbred C57BL/6J males to create male and female F1 offspring. We used publicly available single nucleotide polymorphism (SNP) genotype data from these inbred strains to perform a genome-wide association scan using a dense panel of over 150,000 SNPs in a combined sample of 604 mice representing 30 distinct F1 genotypes. We identified two independent PPI-associated loci on Chromosomes 2 and 7, each of which explained 12-14% of the variance in PPI. Searches of available databases did not identify any plausible causative coding polymorphisms within these loci. However, previously collected expression quantitative trait locus (eQTL) data from hippocampus and striatum indicated that the SNPs on Chromosomes 2 and 7 that showed the strongest association with PPI were also strongly associated with expression of several transcripts, some of which have been implicated in human psychiatric disorders. This integrative approach successfully identified a focused set of genes which can be prioritized for follow-up studies. More broadly, our results show that F1 crosses among common inbred strains can be used in combination with other informatics and expression datasets to identify candidate genes for complex behavioral traits.
Affiliation(s)
- L J Sittig
- Department of Human Genetics, University of Chicago, Chicago, IL
| | - P Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL
| | - K A Engel
- Department of Human Genetics, University of Chicago, Chicago, IL
| | - K S Krauss
- Department of Human Genetics, University of Chicago, Chicago, IL
| | - A A Palmer
- Department of Human Genetics, University of Chicago, Chicago, IL
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| |
|
166
|
Khan A, Tian L, Zhang C, Yuan K, Xu S. Genetic diversity and natural selection footprints of the glycine amidinotransferase gene in various human populations. Sci Rep 2016; 6:18755. [PMID: 26729229 PMCID: PMC4700420 DOI: 10.1038/srep18755] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/11/2015] [Accepted: 11/23/2015] [Indexed: 12/02/2022] Open
Abstract
The glycine amidinotransferase gene (GATM) plays a vital role in energy metabolism in muscle tissues and is associated with multiple clinically important phenotypes. However, the genetic diversity of the GATM gene remains poorly understood within and between human populations. Here we analyzed the 1,000 Genomes Project data through population genetics approaches and observed significant genetic diversity across the GATM gene among various continental human populations. We observed considerable variations in GATM allele frequencies and haplotype composition among different populations. Substantial genetic differences were observed between East Asian and European populations (FST = 0.56). In addition, the frequency of a distinct major GATM haplotype in these groups was congruent with population-wide diversity at this locus. Furthermore, we identified GATM as the top differentiated gene compared to the other statin drug response-associated genes. Composite multiple analyses identified signatures of positive selection at the GATM locus, which was estimated to have occurred around 850 generations ago in European populations. As GATM catalyzes the key step of creatine biosynthesis involved in energy metabolism, we speculate that the European prehistorical demographic transition from hunter-gatherer to farming cultures was the driving force of selection that fulfilled creatine-based metabolic requirement of the populations.
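The fixation index quoted in this abstract (FST = 0.56 between East Asian and European populations) measures how much of the total heterozygosity is explained by differences between populations. A minimal sketch of the classic two-population, single-biallelic-site form, FST = (HT - HS) / HT, is given below. It assumes equally weighted populations and is not necessarily the estimator the authors used; the function names and the example frequencies are illustrative only.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1, p2):
    """Wright's FST for two equally weighted populations at one biallelic site.

    p1, p2: frequency of the same allele in each population (hypothetical inputs).
    """
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-population
    p_bar = (p1 + p2) / 2.0                                # pooled allele frequency
    h_t = heterozygosity(p_bar)                            # total heterozygosity
    if h_t == 0.0:
        return 0.0  # site is monomorphic overall, so there is no differentiation
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Made-up frequencies: strongly diverged populations give a large FST.
    print(fst_two_pops(0.9, 0.1))
    # Identical frequencies give FST = 0.
    print(fst_two_pops(0.3, 0.3))
```

Genome-scale analyses usually average such quantities over many sites and correct for sample size, so the value for a real locus will generally differ from this single-site illustration.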
Affiliation(s)
- Asifullah Khan
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Department of Biochemistry, Abdul Wali Khan University Mardan (AWKUM), Mardan, Khyber Pakhthunkhwa, Pakistan
| | - Lei Tian
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese academy of Sciences, Shanghai 200031, China
| | - Chao Zhang
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese academy of Sciences, Shanghai 200031, China
| | - Kai Yuan
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese academy of Sciences, Shanghai 200031, China
| | - Shuhua Xu
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
- Collaborative Innovation Center of Genetics and Development, Shanghai 200438, China
| |
|
167
|
Zhang K, Wu Z, Tang D, Lv C, Luo K, Zhao Y, Liu X, Huang Y, Wang J. Development and Identification of SSR Markers Associated with Starch Properties and β-Carotene Content in the Storage Root of Sweet Potato (Ipomoea batatas L.). Front Plant Sci 2016; 7:223. [PMID: 26973669 PMCID: PMC4773602 DOI: 10.3389/fpls.2016.00223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/01/2015] [Accepted: 02/10/2016] [Indexed: 05/03/2023]
Abstract
Sweet potato (Ipomoea batatas L.) is a nutritious food crop and, based on the high starch content of its storage root, a potential bioethanol feedstock. Enhancing the nutritional value and starch quantity of storage roots are important goals of sweet potato breeding programs aimed at developing improved varieties for direct consumption, processing, and industrial uses. However, developing improved lines of sweet potato is challenging due to the genetic complexity of this plant and the lack of genome information. Short sequence repeat (SSR) markers are powerful molecular tools for tracking important loci in crops and for molecular-based breeding strategies; however, few SSR markers and marker-trait associations have hitherto been identified in sweet potato. In this study, we identified 1824 SSRs by using a de novo assembly of publicly available ESTs and mRNAs in sweet potato, and designed 1476 primer pairs based on SSR-containing sequences. We mapped 214 pairs of primers in a natural population comprised of 239 germplasms, and identified 1278 alleles with an average of 5.972 alleles per locus and a major allele frequency of 0.7702. Population structure analysis revealed two subpopulations in this panel of germplasms, and phenotypic characterization demonstrated that this panel is suitable for association mapping of starch-related traits. We identified 32, 16, and 17 SSR markers associated with starch content, β-carotene content, and starch composition in the storage root, respectively, using association analysis and further evaluation of a subset of sweet potato genotypes with various characteristics. The SSR markers identified here can be used to select varieties with desired traits and to investigate the genetic mechanism underlying starch and carotenoid formation in the starchy roots of sweet potato.
Affiliation(s)
- Kai Zhang
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
- *Correspondence: Kai Zhang
| | - Zhengdan Wu
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
| | - Daobin Tang
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
| | - Changwen Lv
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
| | - Kai Luo
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
| | - Yong Zhao
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
| | - Xun Liu
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
| | - Yuanxin Huang
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
| | - Jichun Wang
- College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Southwest University, Chongqing, China
- Sweet Potato Engineering and Technology Research Center, Chongqing, China
- Jichun Wang
| |
|
168
|
Lu Q, Zhang M, Niu X, Wang S, Xu Q, Feng Y, Wang C, Deng H, Yuan X, Yu H, Wang Y, Wei X. Genetic variation and association mapping for 12 agronomic traits in indica rice. BMC Genomics 2015; 16:1067. [PMID: 26673149 PMCID: PMC4681178 DOI: 10.1186/s12864-015-2245-2] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 06/08/2015] [Accepted: 11/25/2015] [Indexed: 01/13/2023] Open
Abstract
Background Increasing rice (Oryza sativa L.) yield is a crucial challenge for modern agriculture. The ideal plant architecture is considered to be critical to enhance rice yield. Elite plant morphological traits should include compact plant type, short stature, few unproductive tillers, thick and sturdy stems and erect leaves. To reveal the genetic variations of important morphological traits, 523 germplasm accessions were genotyped using the Illumina custom-designed array containing 5,291 single nucleotide polymorphisms (SNPs) and phenotyped in two independent environments. Genome-wide association studies were performed to uncover the genotypic and phenotypic variations using a mixed linear model. Results In total, 126 and 172 significant loci were identified and these loci explained an average of 34.45 % and 39.09 % of the phenotypic variance in two environments, respectively, and 16 of 298 (~5.37 %) loci were detected across the two environments. For the 16 loci, 423 candidate genes were predicted in a 200-kb region (±100 kb of the peak SNP). Expression-level analyses identified four candidate genes as the most promising regulators of tiller angle. Known (NAL1 and Rc) and new significant loci showed pleiotropy and gene linkage. In addition, a long genome region covering ~1.6 Mb on chromosome 11 was identified, which may be critical for rice leaf architecture because of a high association with flag leaf length and the ratio of flag leaf length and width. The pyramid effect of the elite alleles indicated that these significant loci could be beneficial for rice plant architecture improvements in the future. Finally, 37 elite varieties were chosen as breeding donors for further rice plant architectural modifications. 
Conclusions This study detected multiple novel loci and candidate genes related to rice morphological traits, and the work demonstrated that genome-wide association studies are powerful strategies for uncovering the genetic variations of complex traits and identifying candidate genes in rice, even though the linkage disequilibrium decayed slowly in self-pollinating species. Future research will focus on the biological validation of the candidate genes, and elite varieties will also be of interest in genome selection and breeding by design. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2245-2) contains supplementary material, which is available to authorized users.
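The locus counts in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 298 total is simply the sum of the per-environment counts (126 + 172), so a locus detected in both environments is counted once per environment. The `candidate_window` helper and the peak position passed to it are hypothetical and only illustrate the ±100 kb search region described in the abstract.

```python
# Counts quoted in the abstract.
loci_env1 = 126
loci_env2 = 172
shared_loci = 16

# Assumption: the reported total is the plain sum over both environments.
total_loci = loci_env1 + loci_env2
shared_pct = 100.0 * shared_loci / total_loci


def candidate_window(peak_bp, flank_bp=100_000):
    """Return (start, end) of the region searched around a peak SNP.

    peak_bp is a hypothetical base-pair position; the start is clamped at 0.
    """
    start = max(0, peak_bp - flank_bp)
    end = peak_bp + flank_bp
    return start, end


if __name__ == "__main__":
    print(total_loci, round(shared_pct, 2))
    # Hypothetical peak at 1.5 Mb gives a 200-kb window.
    print(candidate_window(1_500_000))
```

Under that assumption the reported figure of roughly 5.37 % corresponds to 16 / 298.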
Affiliation(s)
- Qing Lu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Mengchen Zhang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Xiaojun Niu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
- College of Agricultural Sciences, Shanxi Agricultural University, Taigu, 030801, China.
| | - Shan Wang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Qun Xu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Yue Feng
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Caihong Wang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Hongzhong Deng
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Xiaoping Yuan
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Hanyong Yu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Yiping Wang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Xinghua Wei
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China.
| |
|
169
|
Zeng X, Chakraborty R, King JL, LaRue B, Moura-Neto RS, Budowle B. Selection of highly informative SNP markers for population affiliation of major US populations. Int J Legal Med 2015; 130:341-52. [DOI: 10.1007/s00414-015-1297-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 09/01/2015] [Accepted: 11/23/2015] [Indexed: 01/17/2023]
|
170
|
Stringer S, Cerrone KC, van den Brink W, van den Berg JF, Denys D, Kahn RS, Derks EM. A guide on gene prioritization in studies of psychiatric disorders. Int J Methods Psychiatr Res 2015; 24:245-56. [PMID: 26230968 PMCID: PMC6878611 DOI: 10.1002/mpr.1482] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/10/2015] [Revised: 06/23/2015] [Accepted: 07/02/2015] [Indexed: 12/19/2022] Open
Abstract
There has been an increasing interest in the identification of genetic variants causing individual differences in human behavior. Psychiatrists have contributed to the genetics field by defining the most important behavioral characteristics and by studying the association between genetic variants and behavioral differences within phenotypically well-characterized samples in which detailed assessments have been collected (e.g. neuroimaging). These samples are typically limited in size and are therefore not suitable for a genome-wide association analysis. Instead, gene association studies conducted in such samples typically focus on a few genes of interest, allowing smaller sample sizes. However, the selection of high-priority genes is not always straightforward and psychiatrists will usually have a limited background in genetics. We aim to fill this gap by (i) providing a basic introduction to genetics; (ii) showing how the selection of genes of interest can be optimized by the use of two web tools: Polysearch and Gene Prospector; (iii) illustrating how statistical power analyses can be performed and discussing the importance of sufficiently powered studies. This guide can help psychiatrists with limited experience in genetics in designing genetic studies that allow identification of specific behavioral, cognitive, or neural correlates of genetic risk variants, while avoiding common pitfalls. Copyright © 2015 John Wiley & Sons, Ltd.
Affiliation(s)
- Sven Stringer
- Department of Psychiatry, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Kim C. Cerrone
- Department of Psychiatry, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
| | - Wim van den Brink
- Department of Psychiatry, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
| | - Julia F. van den Berg
- Parnassia Psychiatric Institute, The Hague, The Netherlands
- Department of Clinical Psychology, Leiden University, Leiden, The Netherlands
| | - Damiaan Denys
- Department of Psychiatry, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
| | - Rene S. Kahn
- Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Eske M. Derks
- Department of Psychiatry, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
| |
|
171
|
Abstract
The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as “Prakriti”. To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10^-5) were significantly different between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which represent its power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India’s traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine.
|
172
|
Kleinjan M, Engels RCME, DiFranza JR. Parental smoke exposure and the development of nicotine craving in adolescent novice smokers: the roles of DRD2, DRD4, and OPRM1 genotypes. BMC Pulm Med 2015; 15:115. [PMID: 26449981 PMCID: PMC4599744 DOI: 10.1186/s12890-015-0114-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/25/2014] [Accepted: 09/28/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Among adolescent novice smokers, craving is often the first, and is the most reported, symptom of nicotine dependence. Until now, little has been known about the development of craving symptoms in novice smokers. The aim of this study was to identify specific genetic (i.e., DRD2 Taq1A, DRD4 48 bp VNTR, and OPRM1 A118G polymorphisms) and environmental mechanisms that underlie the emergence of both cue-induced and cognitive craving among adolescent novice smokers. METHOD A five-wave longitudinal, genetically-informed survey study was conducted with intervals of four months. The sample included 376 early adolescent smokers (12-13 years of age at baseline). Self-report questionnaires were completed regarding smoking behavior, observed parental smoking behavior, and both cue-induced and cognitive craving. RESULTS Data were analyzed with a latent growth curve approach. For both cue-induced and cognitive craving, significant interaction effects were found for DRD2 Taq1A with parental smoke exposure. A1-allele carriers did not seem to be influenced by the environment with regard to craving development. Adolescents who are homozygous for the A2-allele and who are more exposed to parental smoking experience the highest levels of both types of craving over time. No significant interaction effects were found between parental smoke exposure and DRD4 48 bp VNTR or OPRM1 A118G. CONCLUSIONS Previous studies identified DRD2 Taq1A A1-allele carriers as vulnerable to developing nicotine dependence. However, this study showed that parental smoking increased the chances of developing dependence more rapidly for early adolescents who are considered to be less sensitive to the rewarding effects of nicotine according to their DRD2 Taq1A genotype. It is thus especially important that these young people not be exposed to smoking in their social environment.
Affiliation(s)
- Marloes Kleinjan
- Trimbos Institute (Netherlands Institute of Mental Health and Addiction), Da Costakade 45, 3521 VS, Utrecht, The Netherlands.
| | - Rutger C M E Engels
- Trimbos Institute (Netherlands Institute of Mental Health and Addiction), Da Costakade 45, 3521 VS, Utrecht, The Netherlands.
- Behavioural Science Institute, Radboud University Nijmegen, Montessorilaan 3, 6525 HR, Nijmegen, The Netherlands.
| | - Joseph R DiFranza
- University of Massachusetts Medical School, 55 N Lake Ave, Worcester, MA, 01655, USA.
| |
|
173
|
He F, Zheng Y, Huang HH, Cheng YH, Wang CY. Association between Tourette syndrome and the dopamine D3 receptor gene rs6280. Chin Med J (Engl) 2015; 128:654-8. [PMID: 25698199 PMCID: PMC4834778 DOI: 10.4103/0366-6999.151665] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022]
Abstract
BACKGROUND Tourette syndrome (TS) is a complex, heterogeneous genetic disorder. A number of molecular genetic studies have investigated several candidate genes, particularly those implicated in the dopamine system. The dopamine D3 receptor (DRD3) gene has been considered a candidate gene in TS. No association study of TS and the DRD3 gene in the Han Chinese population had previously been reported. We combined a case-control genetic association analysis and a nuclear pedigree transmission disequilibrium test (TDT) analysis to investigate the association between the DRD3 gene rs6280 single nucleotide polymorphism (SNP) and TS in a Han Chinese population. METHODS A total of 160 TS patients were diagnosed according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition. The DRD3 gene rs6280 SNP was genotyped with the TaqMan SNP genotyping assay in all subjects. We used a case-control genetic association analysis to compare genotype and allele frequencies between 160 TS patients and 90 healthy controls. At the same time, we used TDT analysis to test for transmission disequilibrium of DRD3 gene rs6280 among 101 nuclear pedigrees. RESULTS The genotype and allele frequencies of the DRD3 gene rs6280 SNP did not differ significantly between the control group (n = 90) and the TS group (n = 160) (χ2 = 3.647, P = 0.161; χ2 = 0.643, P = 0.423) using the chi-squared test. On the basis of the 101 nuclear pedigrees, TDT analysis showed no transmission disequilibrium of the DRD3 gene rs6280 SNP (χ2 = 0; P = 1). CONCLUSIONS Our findings provide no evidence for an association between DRD3 gene rs6280 and TS in the Han Chinese population.
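As a rough illustration of the TDT result reported above, the statistic for a biallelic marker can be sketched as a McNemar-type chi-squared test on transmissions from heterozygous parents. The function name and the counts below are hypothetical (the abstract does not report transmission counts); equal transmitted and untransmitted counts give χ2 = 0 and P = 1, the pattern reported for rs6280.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return (chi-squared, p-value) for a 1-d.f. transmission disequilibrium test.

    `transmitted` and `untransmitted` count how often heterozygous parents
    passed on, or did not pass on, the allele of interest to affected children.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a chi-squared distribution with 1 d.f.: erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
print(tdt_statistic(40, 40))  # (0.0, 1.0)
```

Only heterozygous parents are informative, which is why the test is robust to population stratification in a way the case-control comparison is not.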
Affiliation(s)
- Chuan-Yue Wang
- Department of Psychiatry, Beijing Anding Hospital, Capital Medical University; Center of Schizophrenia, Beijing Institute for Brain Disorders, Beijing 100088, China
| |
|
174
|
Logue MW, Amstadter AB, Baker DG, Duncan L, Koenen KC, Liberzon I, Miller MW, Morey RA, Nievergelt CM, Ressler KJ, Smith AK, Smoller JW, Stein MB, Sumner JA, Uddin M. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic Stress Disorder Enters the Age of Large-Scale Genomic Collaboration. Neuropsychopharmacology 2015; 40:2287-97. [PMID: 25904361 PMCID: PMC4538342 DOI: 10.1038/npp.2015.118] [Citation(s) in RCA: 105] [Impact Index Per Article: 11.7] [Received: 11/26/2014] [Revised: 03/10/2015] [Accepted: 03/25/2015] [Indexed: 11/09/2022]
Abstract
The development of posttraumatic stress disorder (PTSD) is influenced by genetic factors. Although there have been some replicated candidates, the identification of risk variants for PTSD has lagged behind genetic research of other psychiatric disorders such as schizophrenia, autism, and bipolar disorder. Psychiatric genetics has moved beyond examination of specific candidate genes in favor of the genome-wide association study (GWAS) strategy of very large numbers of samples, which allows for the discovery of previously unsuspected genes and molecular pathways. The successes of genetic studies of schizophrenia and bipolar disorder have been aided by the formation of a large-scale GWAS consortium: the Psychiatric Genomics Consortium (PGC). In contrast, only a handful of GWAS of PTSD have appeared in the literature to date. Here we describe the formation of a group dedicated to large-scale study of PTSD genetics: the PGC-PTSD. The PGC-PTSD faces challenges related to the contingency on trauma exposure and the large degree of ancestral genetic diversity within and across participating studies. Using the PGC analysis pipeline supplemented by analyses tailored to address these challenges, we anticipate that our first large-scale GWAS of PTSD will comprise over 10 000 cases and 30 000 trauma-exposed controls. Following in the footsteps of our PGC forerunners, this collaboration, of a scope that is unprecedented in the field of traumatic stress, will lead the search for replicable genetic associations and new insights into the biological underpinnings of PTSD.
Affiliation(s)
- Mark W Logue
- Research, VA Boston Healthcare System, Boston, MA, USA
- Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Ananda B Amstadter
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
| | - Dewleen G Baker
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA
- VA San Diego Healthcare System, VA Center of Excellence for Stress and Mental Health (CESAMH), La Jolla, CA, USA
| | - Laramie Duncan
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Karestan C Koenen
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Israel Liberzon
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Veterans Affairs Ann Arbor Health System, Ann Arbor, MI, USA
| | - Mark W Miller
- National Center for PTSD, VA Boston Healthcare System, Boston, MA, USA
- Department of Psychiatry, Boston University School of Medicine, Boston, MA, USA
| | - Rajendra A Morey
- Duke-UNC Brain Imaging and Analysis Center, Duke University Medical Center, Durham, NC, USA
- Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC, USA
- Mental Illness Research Education and Clinical Center for Post Deployment Mental Health, Durham VA Medical Center, Durham, NC, USA
| | - Caroline M Nievergelt
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA
- VA San Diego Healthcare System, VA Center of Excellence for Stress and Mental Health (CESAMH), La Jolla, CA, USA
| | - Kerry J Ressler
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA, USA
- Center for Behavioral Neuroscience, Yerkes National Primate Research Center, Atlanta, GA, USA
- Howard Hughes Medical Institute, Bethesda, MD, USA
| | - Alicia K Smith
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA, USA
| | - Jordan W Smoller
- Center of Human Genetics Research, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
| | - Murray B Stein
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA
| | - Jennifer A Sumner
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Monica Uddin
- Carl R Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| |
|
175
|
SHEsisPCA: a GPU-based software to correct for population stratification that efficiently accelerates the process for handling genome-wide datasets. J Genet Genomics 2015; 42:445-53. [PMID: 26336801 DOI: 10.1016/j.jgg.2015.06.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/24/2015] [Revised: 05/30/2015] [Accepted: 06/10/2015] [Indexed: 11/24/2022]
Abstract
Population stratification is a problem in genetic association studies because it is likely to highlight loci that underlie the population structure rather than disease-related loci. At present, principal component analysis (PCA) has been proven to be an effective way to correct for population stratification. However, the conventional PCA algorithm is time-consuming when dealing with large datasets. We developed a graphics processing unit (GPU)-based PCA software named SHEsisPCA (http://analysis.bio-x.cn/SHEsisMain.htm) that is highly parallel, with a peak speedup of more than 100-fold relative to its CPU version. A clustering algorithm based on X-means was also implemented as a way to detect population subgroups and to obtain matched cases and controls in order to reduce genomic inflation and increase power. A study of both simulated and real datasets showed that SHEsisPCA ran at an extremely high speed while the accuracy was hardly reduced. Therefore, SHEsisPCA can help correct for population stratification much more efficiently than the conventional CPU-based algorithms.
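The genomic inflation mentioned above is commonly summarised by the factor λ: the median of the observed 1-d.f. association chi-squared statistics divided by the median expected under the null (about 0.455). A minimal sketch, assuming a plain list of per-SNP statistics; the function name and the input values are hypothetical, and this is not SHEsisPCA code.

```python
from statistics import median

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats: list[float]) -> float:
    """Return lambda, the ratio of the observed to the expected median statistic."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical per-SNP statistics, for illustration only.
stats = [0.05, 0.30, 0.9098728, 2.10, 5.40]
print(round(genomic_inflation(stats), 3))  # 2.0
```

Values near 1 suggest little residual stratification; matching cases and controls, or adjusting for principal components, aims to bring λ back toward 1.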
|
176
|
Genetic Geostatistical Framework for Spatial Analysis of Fine-Scale Genetic Heterogeneity in Modern Populations: Results from the KORA Study. Int J Genomics 2015; 2015:693193. [PMID: 26258132 PMCID: PMC4519554 DOI: 10.1155/2015/693193] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/01/2015] [Revised: 04/06/2015] [Accepted: 04/07/2015] [Indexed: 12/04/2022]
Abstract
Aiming to investigate fine-scale patterns of genetic heterogeneity in modern humans from a geographic perspective, a genetic geostatistical approach framed within a geographic information system is presented. A sample collected for prospective studies in a small area of southern Germany was analyzed. No indication of genetic heterogeneity was detected in previous analyses. Socio-demographic and genotypic data of German citizens were analyzed (212 SNPs; n = 728). Genetic heterogeneity was evaluated with observed heterozygosity (HO). Best-fitting spatial autoregressive models were identified, using socio-demographic variables as covariates. Spatial analysis included surface interpolation and geostatistics of observed and predicted patterns. Prediction accuracy was quantified. Spatial autocorrelation was detected for both socio-demographic and genetic variables. Augsburg City and eastern suburban areas showed higher HO values. The selected model gave best predictions in suburban areas. Fine-scale patterns of genetic heterogeneity were observed. In accordance with the literature, more urbanized areas showed higher levels of admixture. This approach showed efficacy for detecting and analyzing subtle patterns of genetic heterogeneity within small areas. It is scalable in number of loci, even up to whole-genome analysis. This approach may be applicable to investigating the underlying genetic history that is, at least partially, embedded in geographic data.
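Observed heterozygosity (HO) is the proportion of called genotypes that carry two different alleles. A minimal sketch under that definition; the function name, the tuple encoding of genotypes and the example calls are assumptions for illustration, not the KORA analysis pipeline.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing call


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Return the fraction of non-missing genotypes that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    heterozygous = sum(1 for allele_a, allele_b in called if allele_a != allele_b)
    return heterozygous / len(called)


# Hypothetical genotype calls at five SNPs, one of them missing.
example = [("A", "G"), ("A", "A"), None, ("C", "T"), ("G", "G")]
print(observed_heterozygosity(example))  # 0.5
```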
|
177
|
Wright NE, Strong JA, Gilbart ER, Shollenbarger SG, Lisdahl KM. 5-HTTLPR Genotype Moderates the Effects of Past Ecstasy Use on Verbal Memory Performance in Adolescent and Emerging Adults: A Pilot Study. PLoS One 2015; 10:e0134708. [PMID: 26231032 PMCID: PMC4521717 DOI: 10.1371/journal.pone.0134708] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/09/2015] [Accepted: 07/13/2015] [Indexed: 11/18/2022]
Abstract
OBJECTIVE Ecstasy use is associated with memory deficits. Serotonin transporter gene (5-HTTLPR) polymorphisms have been linked with memory function in healthy samples. The present pilot study investigated the influence of 5-HTTLPR polymorphisms on memory performance in ecstasy users, marijuana-using controls, and non-drug-using controls, after a minimum of 7 days of abstinence. METHOD Data were collected from 116 young adults (18-25 years old), including 45 controls, 42 marijuana users, and 29 ecstasy users, and were balanced for 5-HTTLPR genotype. Participants were abstinent for seven days prior to completing memory testing. Three MANCOVAs and one ANCOVA were run to examine whether drug group, 5-HTTLPR genotype, and their interactions predicted verbal and visual memory after controlling for gender, past year alcohol use, other drug use, and nicotine cotinine levels. RESULTS MANCOVA and ANCOVA analyses revealed a significant interaction between drug group and genotype (p = .03) such that ecstasy users with the L/L genotype performed significantly worse on CVLT-2 total recall (p = .05), short (p = .008) and long delay free recall (p = .01), and recognition (p = .006), with the reverse pattern found in controls. Ecstasy did not significantly predict visual memory. 5-HTTLPR genotype significantly predicted memory for faces (p = .02); short allele carriers performed better than those with L/L genotype. CONCLUSIONS 5-HTTLPR genotype moderated the effects of ecstasy on verbal memory, with L/L carriers performing worse compared to controls. Future research should continue to examine individual differences in ecstasy's impact on neurocognitive performance as well as relationships with neuronal structure. Additional screening and prevention efforts focused on adolescents and emerging adults are necessary to prevent ecstasy consumption.
Affiliation(s)
- Natasha E. Wright
- Department of Psychology, University of Wisconsin-Milwaukee, Milwaukee, WI, United States of America
| | - Judith A. Strong
- Department of Anesthesiology, University of Cincinnati, Cincinnati, OH, United States of America
| | - Erika R. Gilbart
- Department of Psychology, University of Wisconsin-Milwaukee, Milwaukee, WI, United States of America
| | - Skyler G. Shollenbarger
- Department of Psychology, University of Wisconsin-Milwaukee, Milwaukee, WI, United States of America
| | - Krista M. Lisdahl
- Department of Psychology, University of Wisconsin-Milwaukee, Milwaukee, WI, United States of America
| |
|
178
|
Aloraifi F, McDevitt T, Martiniano R, McGreevy J, McLaughlin R, Egan CM, Cody N, Meany M, Kenny E, Green AJ, Bradley DG, Geraghty JG, Bracken AP. Detection of novel germline mutations for breast cancer in non-BRCA1/2 families. FEBS J 2015; 282:3424-37. [PMID: 26094658 DOI: 10.1111/febs.13352] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 05/08/2015] [Revised: 06/12/2015] [Accepted: 06/17/2015] [Indexed: 01/22/2023]
Abstract
The identification of the breast cancer susceptibility genes BRCA1 and BRCA2 enhanced clinicians' ability to select high-risk individuals for aggressive surveillance and prevention, and led to the development of targeted therapies. However, BRCA1/2 mutations account for only 25% of familial breast cancer cases. To systematically identify rare, probably pathogenic variants in familial cases of breast cancer without BRCA1/2 mutations, we developed a list of 312 genes, and performed targeted DNA enrichment coupled to multiplex next-generation sequencing on 104 'BRCAx' patients and 101 geographically matched controls in Ireland. As expected, this strategy allowed us to identify mutations in several well-known high-susceptibility and moderate-susceptibility genes, including ATM (~5%), RAD50 (~3%), CHEK2 (~2%), TP53 (~1%), PALB2 (~1%), and MRE11A (~1%). However, we also identified novel pathogenic variants in 30 other genes, which, when taken together, potentially explain the etiology of the missing heritability in up to 35% of BRCAx patients. These included novel potential pathogenic mutations in MAP3K1, CASP8, RAD51B, ZNF217, CDKN2B-AS1, and ERBB2, including a splice site mutation, which we predict would generate a constitutively active HER2 protein. Taken together, this work extends our understanding of the genetics of familial breast cancer, and supports the need to implement hereditary multigene panel testing to more appropriately orientate clinical management.
Affiliation(s)
- Fatima Aloraifi
- Smurfit Institute of Genetics, Trinity College Dublin, Ireland
| | - Trudi McDevitt
- National Centre for Medical Genetics, Our Lady's Hospital, Crumlin, Dublin 12, Ireland
| | - Rui Martiniano
- Smurfit Institute of Genetics, Trinity College Dublin, Ireland
| | - Jonah McGreevy
- Smurfit Institute of Genetics, Trinity College Dublin, Ireland
| | - Chris M Egan
- Smurfit Institute of Genetics, Trinity College Dublin, Ireland
| | - Nuala Cody
- National Centre for Medical Genetics, Our Lady's Hospital, Crumlin, Dublin 12, Ireland
| | - Marie Meany
- National Centre for Medical Genetics, Our Lady's Hospital, Crumlin, Dublin 12, Ireland
| | - Andrew J Green
- National Centre for Medical Genetics, Our Lady's Hospital, Crumlin, Dublin 12, Ireland
| |
|
179
|
de Andrade M, Ray D, Pereira AC, Soler JP. Global Individual Ancestry Using Principal Components for Family Data. Hum Hered 2015; 80:1-11. [PMID: 26159893 DOI: 10.1159/000381908] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/14/2023]
Abstract
Studies of complex human diseases and traits associated with candidate genes are potentially vulnerable to bias (confounding) due to population stratification and inbreeding, especially in admixed populations. In GWAS, the principal components (PCs) method provides a global ancestry value per subject, allowing corrections for population stratification. However, these coefficients are typically estimated assuming unrelated individuals, and if family structure is present and ignored, such substructures may induce artifactual PCs. Extensions of the PCs method have been proposed by Konishi and Rao [Biometrika 1992;79:631-641], taking into account only siblings' relatedness, and by Oualkacha et al. [Stat Appl Genet Mol Biol 2012, DOI: 10.2202/1544-6115.1711], taking into account large pedigrees and high-dimensional phenotype data. In this work, we extend these methods to estimate the global individual ancestry coefficients from PCs derived from different variance component matrix estimators using SNPs from two simulated data sets and two real data sets: the GENOA sibship data consisting of European and African-American subjects and the Baependi Heart Study consisting of 80 extended Brazilian families, both with genotyping data from the Affymetrix 6.0 chip. Our results show that the family structure plays an important role in the estimation of the global individual ancestry value for extended pedigrees but not for sibships.
Affiliation(s)
- Mariza de Andrade
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minn., USA
| |
|
180
|
Prokopenko D, Hecker J, Silverman E, Nöthen MM, Schmid M, Lange C, Loehlein Fier H. Using Network Methodology to Infer Population Substructure. PLoS One 2015; 10:e0130708. [PMID: 26098940 PMCID: PMC4476755 DOI: 10.1371/journal.pone.0130708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2015] [Accepted: 05/22/2015] [Indexed: 11/24/2022]
Abstract
One of the main caveats of association studies is their susceptibility to bias due to population stratification. Existing methods rely on model-based approaches such as STRUCTURE and ADMIXTURE or on principal component analysis as in EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms in order to identify homogeneous subgroups or communities, which can further be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches.
Affiliation(s)
- Dmitry Prokopenko
- Institute of Genomic Mathematics, University of Bonn, Bonn, Germany
- Institute of Human Genetics, University of Bonn, Bonn, Germany
| | - Julian Hecker
- Institute of Genomic Mathematics, University of Bonn, Bonn, Germany
- Institute of Human Genetics, University of Bonn, Bonn, Germany
| | - Edwin Silverman
- Channing Laboratory, Brigham and Women's Hospital, Boston, United States of America
| | - Matthias Schmid
- Institute of Medical Biometrics, Informatics and Epidemiology, University of Bonn, Bonn, Germany
| | - Christoph Lange
- Institute of Genomic Mathematics, University of Bonn, Bonn, Germany
- Department of Biostatistics, Harvard School of Public Health, Boston, United States of America
- Channing Laboratory, Brigham and Women's Hospital, Boston, United States of America
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
| | - Heide Loehlein Fier
- Institute of Genomic Mathematics, University of Bonn, Bonn, Germany
- Institute of Human Genetics, University of Bonn, Bonn, Germany
| |
|
181
|
Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure. Proc Natl Acad Sci U S A 2015; 112:E3441-50. [PMID: 26071445 DOI: 10.1073/pnas.1412301112] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
Abstract
Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of fit of a statistical model to a specific dataset. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the admixture model fit to four qualitatively different population genetic datasets: the population reference sample (POPRES) European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies.
|
182
|
Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation. Am J Hum Genet 2015; 96:926-37. [PMID: 26027497 DOI: 10.1016/j.ajhg.2015.04.018] [Citation(s) in RCA: 102] [Impact Index Per Article: 11.3] [Received: 11/15/2014] [Accepted: 04/29/2015] [Indexed: 11/20/2022]
Abstract
Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
|
183
|
Sha'ari HM, Haerian BS, Baum L, Saruwatari J, Tan HJ, Rafia MH, Raymond AA, Kwan P, Ishitsu T, Nakagawa K, Lim KS, Mohamed Z. ABCC2 rs2273697 and rs3740066 polymorphisms and resistance to antiepileptic drugs in Asia Pacific epilepsy cohorts. Pharmacogenomics 2015; 15:459-66. [PMID: 24624913 DOI: 10.2217/pgs.13.239] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/16/2023]
Abstract
AIM To examine the relevance of ABCC2 polymorphisms to drug responsiveness in epilepsy cohorts from the Asia Pacific region. MATERIALS & METHODS The rs2273697 and rs3740066 polymorphisms were genotyped in 2056 Malaysian (55%), Hong Kong (32%) and Japanese (13%) epilepsy patients. RESULTS Significant allele association of rs2273697 was observed in Chinese females with epilepsy, Malaysian Chinese patients with generalized seizure and Japanese patients with partial seizure for the AA versus GG genotype model and Malaysian Chinese patients with generalized seizure for the GA versus GG and autosomal dominant models. Significant association of the rs3740066 allele was observed in Malaysian females of Malay origin with cryptogenic epilepsy and Chinese patients with partial seizure and for genotypes in Malay patients with cryptogenic epilepsy for the CT versus CC and autosomal dominant genotype models. Significant results were observed for all haplotypes, but following Bonferroni correction, only the GT haplotype in Chinese patients remained significant. CONCLUSION This study suggests that the GT haplotype might be a risk factor for resistance to medication in Chinese patients.
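The Bonferroni step described above compares each p-value with α divided by the number of tests performed. A minimal sketch; the function name, the labels H1 to H4 and all p-values are hypothetical placeholders, since the abstract does not report the individual haplotype p-values.

```python
def bonferroni_significant(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Flag which tests stay significant after Bonferroni correction."""
    if not p_values:
        return {}
    # Each test is judged against alpha divided by the number of tests.
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical p-values for four haplotypes, for illustration only.
p_values = {"H1": 0.004, "H2": 0.020, "H3": 0.030, "H4": 0.045}
print(bonferroni_significant(p_values))
# {'H1': True, 'H2': False, 'H3': False, 'H4': False}
```

All four hypothetical tests pass the uncorrected 0.05 level, but only one passes the corrected threshold of 0.0125, mirroring how a single haplotype can remain significant after correction.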
Affiliation(s)
- Hidayati Mohd Sha'ari
- Pharmacogenomics Laboratory, Department of Pharmacology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| |
|
184
|
Impact of Population Stratification on Family-Based Association in an Admixed Population. Int J Genomics 2015; 2015:501617. [PMID: 26064873 PMCID: PMC4434195 DOI: 10.1155/2015/501617] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/10/2014] [Accepted: 04/07/2015] [Indexed: 12/11/2022]
Abstract
Population substructure is a well-known confounder in population-based case-control genetic studies, but its impact in family-based studies is unclear. We performed population substructure analysis using extended families from an admixed population to evaluate power and Type I error in an association study framework. Our analysis shows that power was improved by 1.5% after principal components adjustment. Type I error was also reduced by 2.2% after adjusting for family substratification. The presence of population substructure was underscored by discriminant analysis, in which over 92% of individuals were correctly assigned to their actual family using only 100 principal components. This study demonstrates the importance of adjusting for population substructure in family-based studies of admixed populations.
|
185
|
Leslie S, Winney B, Hellenthal G, Davison D, Boumertit A, Day T, Hutnik K, Royrvik EC, Cunliffe B, Lawson DJ, Falush D, Freeman C, Pirinen M, Myers S, Robinson M, Donnelly P, Bodmer W. The fine-scale genetic structure of the British population. Nature 2015; 519:309-314. [PMID: 25788095 PMCID: PMC4632200 DOI: 10.1038/nature14230] [Citation(s) in RCA: 232] [Impact Index Per Article: 25.8] [Received: 11/23/2013] [Accepted: 01/13/2015] [Indexed: 12/22/2022]
Abstract
Fine-scale genetic variation between human populations is interesting as a signature of historical demographic events and because of its potential for confounding disease studies. We use haplotype-based statistical methods to analyse genome-wide single nucleotide polymorphism (SNP) data from a carefully chosen geographically diverse sample of 2,039 individuals from the United Kingdom. This reveals a rich and detailed pattern of genetic differentiation with remarkable concordance between genetic clusters and geography. The regional genetic differentiation and differing patterns of shared ancestry with 6,209 individuals from across Europe carry clear signals of historical demographic events. We estimate the genetic contribution to southeastern England from Anglo-Saxon migrations to be under half, and identify the regions not carrying genetic material from these migrations. We suggest significant pre-Roman but post-Mesolithic movement into southeastern England from continental Europe, and show that in non-Saxon parts of the United Kingdom, there exist genetically differentiated subgroups rather than a general 'Celtic' population.
Affiliation(s)
- Stephen Leslie
  - Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria 3052, Australia
  - University of Melbourne, Department of Mathematics and Statistics, Parkville, Victoria 3010, Australia
- Bruce Winney
  - University of Oxford, Department of Oncology, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK
- Garrett Hellenthal
  - University College London Genetics Institute, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Dan Davison
  - Counsyl, Inc. 180 Kimball Way, South San Francisco, CA 94080, USA
- Abdelhamid Boumertit
  - University of Oxford, Department of Oncology, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK
- Tammy Day
  - University of Oxford, Department of Oncology, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK
- Katarzyna Hutnik
  - University of Oxford, Department of Oncology, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK
- Ellen C Royrvik
  - University of Oxford, Department of Oncology, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK
- Barry Cunliffe
  - University of Oxford, Institute of Archaeology, 36 Beaumont Street, Oxford, OX1 2PG, UK
- Daniel J Lawson
  - University of Bristol, Department of Mathematics, University Walk, Bristol, BS8 1TW, UK
- Daniel Falush
  - Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany
- Colin Freeman
  - The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Matti Pirinen
  - University of Helsinki, P.O. Box 20, Helsinki, FI-00014, Finland
- Simon Myers
  - University of Oxford, Department of Statistics, 1 South Parks Road, Oxford, OX1 3TG, UK
- Mark Robinson
  - University of Oxford, University Museum of Natural History, Parks Road, Oxford, OX1 3PW, UK
- Peter Donnelly
  - The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
  - University of Oxford, Department of Statistics, 1 South Parks Road, Oxford, OX1 3TG, UK
- Walter Bodmer
  - University of Oxford, Department of Oncology, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK
|
186
|
Strike LT, Couvy-Duchesne B, Hansell NK, Cuellar-Partida G, Medland SE, Wright MJ. Genetics and Brain Morphology. Neuropsychol Rev 2015; 25:63-96. [DOI: 10.1007/s11065-015-9281-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 10/31/2014] [Accepted: 02/08/2015] [Indexed: 12/17/2022]
|
187
|
A powerful nonparametric statistical framework for family-based association analyses. Genetics 2015; 200:69-78. [PMID: 25745024 DOI: 10.1534/genetics.115.175174] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/28/2015] [Accepted: 02/23/2015] [Indexed: 01/04/2023]
Abstract
Family-based study design is commonly used in genetic research. It has many ideal features, including being robust to population stratification (PS). With the advance of high-throughput technologies and ever-decreasing genotyping cost, it has become common for family studies to examine a large number of variants for their associations with disease phenotypes. The yield from the analysis of these family-based genetic data can be enhanced by adopting computationally efficient and powerful statistical methods. We propose a general framework of a family-based U-statistic, referred to as family-U, for family-based association studies. Unlike existing parametric methods, the proposed method makes no assumption about the underlying disease model and can be applied to various phenotypes (e.g., binary and quantitative phenotypes) and pedigree structures (e.g., nuclear families and extended pedigrees). By using only within-family information, it can offer robust protection against PS. In the absence of PS, it can also utilize additional information (i.e., between-family information) for power improvement. Through simulations, we demonstrated that family-U attained higher power than a commonly used method, the family-based association test, under various disease scenarios. We further illustrated the new method with an application to large-scale family data from the Framingham Heart Study. By utilizing additional information (i.e., between-family information), family-U confirmed a previous association of CHRNA5 with nicotine dependence.
|
188
|
Pirie A, Wood A, Lush M, Tyrer J, Pharoah PDP. The effect of rare variants on inflation of the test statistics in case-control analyses. BMC Bioinformatics 2015; 16:53. [PMID: 25888290 PMCID: PMC4339749 DOI: 10.1186/s12859-015-0496-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/05/2014] [Accepted: 02/12/2015] [Indexed: 02/02/2023]
Abstract
BACKGROUND The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test statistic. This ratio is inflated in the presence of cryptic population structure. However, inflation may also be caused by the properties of the association test itself, particularly in the analysis of rare variants. Using simulated data, we compared the properties of the three most commonly used association tests (the likelihood ratio test, the Wald test and the score test) when testing rare variants for association. RESULTS We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for variants with fewer than 20 heterozygotes across the sample, regardless of the total sample size. The test statistics for the Wald test were under-inflated at the median for variants below the same minor allele frequency. CONCLUSIONS In a genetic association study, if a substantial proportion of the genetic variants tested have rare minor allele frequencies, the properties of the association test may mask the presence or absence of bias due to population structure. The use of either the likelihood ratio test or the score test is likely to lead to inflation in the median test statistic in the absence of population structure. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic, which may mask the presence of population structure.
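The median-ratio check described in this abstract is commonly reported as the genomic inflation factor λ. A minimal Python sketch follows, assuming association statistics that follow a chi-square distribution with 1 degree of freedom under the null; the function name `inflation_factor` and the example statistics are illustrative and are not taken from the paper.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom, obtained as
# the square of the 75th percentile of a standard normal (about 0.455).
EXPECTED_MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to its expected null value."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2_1DF


# Made-up statistics for illustration only.
stats = [0.02, 0.10, 0.31, 0.46, 0.52, 0.90, 1.70, 3.20]
print(f"lambda = {inflation_factor(stats):.3f}")
```

A value near 1 suggests no inflation. As the abstract cautions, a departure from 1 can reflect the behaviour of the test on rare variants as well as population structure.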
Affiliation(s)
- Ailith Pirie
  - Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
- Angela Wood
  - Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
- Michael Lush
  - Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
- Jonathan Tyrer
  - Department of Oncology, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
- Paul D P Pharoah
  - Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
  - Department of Oncology, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
|
189
|
Herman AI, DeVito EE, Jensen KP, Sofuoglu M. Pharmacogenetics of nicotine addiction: role of dopamine. Pharmacogenomics 2015; 15:221-34. [PMID: 24444411 DOI: 10.2217/pgs.13.246] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Indexed: 12/21/2022]
Abstract
The neurotransmitter dopamine (DA) plays a central role in addictive disorders, including nicotine addiction. Specific DA-related gene variants have been studied to identify responsiveness to treatment for nicotine addiction. Genetic variants in DRD2, DRD4, ANKK1, DAT1, COMT and DBH genes show some promise in informing personalized prescribing of smoking cessation pharmacotherapies. However, many trials studying these variants had small samples, used retrospective design or were composed of mainly self-identified Caucasian individuals. Furthermore, many of these studies lacked a comprehensive measurement of nicotine metabolism rate, did not assess the roles of sex or the menstrual cycle, and did not investigate the role of rare variants and/or epigenetic factors. Future work should be conducted addressing these limitations to more effectively utilize DA genetic information to unlock the potential of smoking cessation pharmacogenetics.
Affiliation(s)
- Aryeh I Herman
  - Yale University, School of Medicine, Department of Psychiatry & VA Connecticut Healthcare System, VA Medical Center, 950 Campbell Avenue, West Haven, CT 06516, USA
|
190
|
Shin J, Lee C. A mixed model reduces spurious genetic associations produced by population stratification in genome-wide association studies. Genomics 2015; 105:191-6. [PMID: 25640449 DOI: 10.1016/j.ygeno.2015.01.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/04/2014] [Revised: 01/21/2015] [Accepted: 01/23/2015] [Indexed: 01/06/2023]
Abstract
Population stratification can produce spurious genetic associations in genome-wide association studies (GWASs). Mixed model methodology has been regarded as useful for correcting population stratification. This study explored statistical power and false discovery rate (FDR) using data simulated for dichotomous traits. Empirical FDRs and powers were estimated using fixed models with and without genomic control and using mixed models with and without reflecting loci linked to the candidate marker in genetic relationships. Population stratification with admixture degree ranging from 1% to 10% resulted in inflated FDRs from the fixed model analysis without genomic control and decreased power from the fixed model analysis with genomic control (P<0.05). Meanwhile, population stratification did not change FDR and power estimates from the mixed model analyses (P>0.05). We suggest that the mixed model methodology is useful for reducing spurious genetic associations produced by population stratification in GWAS, even with a high degree of admixture (10%).
Affiliation(s)
- Jimin Shin
  - Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Republic of Korea
- Chaeyoung Lee
  - Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Republic of Korea.
|
191
|
Bansal V, Libiger O. Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations. BMC Bioinformatics 2015; 16:4. [PMID: 25592880 PMCID: PMC4301802 DOI: 10.1186/s12859-014-0418-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 10/08/2014] [Accepted: 12/10/2014] [Indexed: 01/18/2023]
Abstract
Background Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads. Results We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling. Conclusions Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.
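The likelihood model in this abstract can be illustrated with a small Python sketch. It assumes hard genotype calls (0, 1 or 2 copies of an allele) rather than genotype likelihoods, and it maximizes the likelihood with a plain EM update instead of the BFGS optimizer the authors describe. The function `estimate_admixture` and the toy frequencies are hypothetical and are not part of iAdmix.

```python
def estimate_admixture(genotypes, ref_freqs, n_iter=200):
    """Estimate admixture proportions q_k for one individual by EM.

    genotypes: allele counts (0, 1 or 2) per SNP.
    ref_freqs: one list per reference population holding the allele
               frequency at each SNP, assumed strictly between 0 and 1.
    Model: the individual's allele frequency at SNP j is sum_k q_k * f_kj.
    """
    n_pops = len(ref_freqs)
    n_snps = len(genotypes)
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * n_pops
        for j, g in enumerate(genotypes):
            p = sum(q[k] * ref_freqs[k][j] for k in range(n_pops))
            for k in range(n_pops):
                f = ref_freqs[k][j]
                # Expected number of this SNP's two allele copies
                # that derive from population k.
                new_q[k] += g * q[k] * f / p + (2 - g) * q[k] * (1 - f) / (1 - p)
        q = [v / (2 * n_snps) for v in new_q]
    return q


# Toy reference panel: two populations with opposite frequency patterns.
ref = [
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],  # hypothetical population A
    [0.1, 0.1, 0.1, 0.9, 0.9, 0.9],  # hypothetical population B
]
q = estimate_admixture([2, 2, 2, 0, 0, 0], ref)
print([round(v, 3) for v in q])
```

Each EM step keeps the proportions non-negative and summing to one, so no constrained optimizer is needed in this simplified setting; a quasi-Newton method such as BFGS would typically converge in fewer iterations on real data.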
Affiliation(s)
- Vikas Bansal
  - Department of Pediatrics, University of California San Diego, 9500 Gilman Drive, La Jolla, 92093, CA, USA.
  - Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA.
- Ondrej Libiger
  - Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA.
  - Current address: MD Revolution, San Diego, CA, USA.
|
192
|
Bhaskar A, Wang YXR, Song YS. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res 2015; 25:268-79. [PMID: 25564017 PMCID: PMC4315300 DOI: 10.1101/gr.178756.114] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 01/23/2023]
Abstract
With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions.
Affiliation(s)
- Anand Bhaskar
  - Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA
- Y X Rachel Wang
  - Department of Statistics, University of California, Berkeley, California 94720, USA
- Yun S Song
  - Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA; Department of Statistics, University of California, Berkeley, California 94720, USA; Department of Integrative Biology, University of California, Berkeley, California 94720, USA
|
193
|
Isidro J, Jannink JL, Akdemir D, Poland J, Heslot N, Sorrells ME. Training set optimization under population structure in genomic selection. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2015; 128:145-58. [PMID: 25367380 PMCID: PMC4282691 DOI: 10.1007/s00122-014-2418-4] [Citation(s) in RCA: 164] [Impact Index Per Article: 18.2] [Received: 05/07/2014] [Accepted: 10/12/2014] [Indexed: 05/17/2023]
Abstract
Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of prediction error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, a sampling method that captures the most phenotypic variation in the TRS is desirable. The wheat dataset showed mild population structure, and the CDmean and stratified CDmean methods showed the highest accuracies for all traits except test weight and heading date. The rice dataset had strong population structure, and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between the TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion for optimizing the TRS seems to depend on the interaction of trait architecture and population structure.
|
194
|
Qian L, Qian W, Snowdon RJ. Sub-genomic selection patterns as a signature of breeding in the allopolyploid Brassica napus genome. BMC Genomics 2014; 15:1170. [PMID: 25539568 PMCID: PMC4367848 DOI: 10.1186/1471-2164-15-1170] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 08/15/2014] [Accepted: 12/18/2014] [Indexed: 02/07/2023]
Abstract
Background High-density single-nucleotide polymorphism (SNP) genotyping arrays are a powerful tool for genome-wide association studies and can give valuable insight into patterns of population structure and linkage disequilibrium (LD). In this study we used the Brassica 60kSNP Illumina consortium genotyping array to assess the influence of selection and breeding for important agronomic traits on LD and haplotype structure in a diverse panel of 203 Chinese semi-winter rapeseed (Brassica napus) breeding lines. Results Population structure and principal coordinate analysis, using a subset of the SNPs, revealed diversification into three subpopulations and one mixed population, reflecting targeted introgressions from external gene pools during breeding. Pairwise LD analysis within the A- and C-subgenomes of allopolyploid B. napus revealed that mean LD, at a threshold of r² = 0.1, decayed on average around ten times more rapidly in the A-subgenome (0.25-0.30 Mb) than in the C-subgenome (2.00-2.50 Mb). A total of 3,097 conserved haplotype blocks were detected over a total length of 182.49 Mb (15.17% of the genome). The mean size of haplotype blocks was considerably longer in the C-subgenome (102.85 Kb) than in the A-subgenome (33.51 Kb), and extremely large conserved haplotype blocks were found on a number of C-genome chromosomes. Comparative sequence analysis revealed conserved blocks containing homoeologous quantitative trait loci (QTL) for seed erucic acid and glucosinolate content, two key seed quality traits under strong agronomic selection. Interestingly, C-subgenome QTL were associated with considerably greater conservation of LD than their corresponding A-subgenome homoeologues. Conclusions The data we present in this paper provide evidence for strong selection of large chromosome regions associated with important rapeseed seed quality traits conferred by C-subgenome QTL. This implies that an increase in genetic diversity and recombination within the C-genome is particularly important for breeding. The resolution of genome-wide association studies is also expected to vary greatly across different genome regions.
Affiliation(s)
- Rod J Snowdon
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich-Buff-Ring 26-32, 35392 Giessen, Germany.
|
195
|
Hulsman Hanna LL, Garrick DJ, Gill CA, Herring AD, Sanders JO, Riley DG. Comparison of breeding value prediction for two traits in a Nellore-Angus crossbred population using different Bayesian modeling methodologies. Genet Mol Biol 2014; 37:631-7. [PMID: 25505837 PMCID: PMC4261962 DOI: 10.1590/s1415-47572014005000021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/27/2014] [Accepted: 07/23/2014] [Indexed: 12/29/2022]
Abstract
The objectives of this study were to 1) compare four models for breeding value prediction using genomic or pedigree information and 2) evaluate the impact of fixed effects that account for family structure. Comparisons were made in a Nellore-Angus population comprising F2, F3 and half-siblings to embryo transfer F2 calves with records for overall temperament at weaning (TEMP; n = 769) and Warner-Bratzler shear force (WBSF; n = 387). After quality control, there were 34,913 whole genome SNP markers remaining. Bayesian methods employed were BayesB (π̃ = 0.995 or 0.997 for WBSF or TEMP, respectively) and BayesC (π = 0 and π̃), where π̃ is the ideal proportion of markers not included. Direct genomic values (DGV) from single trait Bayesian analyses were compared to conventional pedigree-based animal model breeding values. Numerically, BayesC procedures (using π̃) had the highest accuracy of all models for WBSF and TEMP (ρ̂gĝ = 0.843 and 0.923, respectively), but BayesB had the least bias (regression of performance on prediction closest to 1, β̂y,x = 2.886 and 1.755, respectively). Accounting for family structure decreased accuracy and increased bias in prediction of DGV indicating a detrimental impact when used in these prediction methods that simultaneously fit many markers.
Affiliation(s)
- Dorian J Garrick
  - Department of Animal Science, Iowa State University, Ames, Iowa, USA
- Clare A Gill
  - Department of Animal Science, Texas A&M University, College Station, Texas, USA
- Andy D Herring
  - Department of Animal Science, Texas A&M University, College Station, Texas, USA
- James O Sanders
  - Department of Animal Science, Texas A&M University, College Station, Texas, USA
- David G Riley
  - Department of Animal Science, Texas A&M University, College Station, Texas, USA
|
196
|
Velázquez-Cruz R, Jiménez-Ortega RF, Parra-Torres AY, Castillejos-López M, Patiño N, Quiterio M, Villarreal-Molina T, Salmerón J. Analysis of association of MEF2C, SOST and JAG1 genes with bone mineral density in Mexican-Mestizo postmenopausal women. BMC Musculoskelet Disord 2014; 15:400. [PMID: 25430630 PMCID: PMC4258010 DOI: 10.1186/1471-2474-15-400] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 08/19/2014] [Accepted: 11/24/2014] [Indexed: 12/18/2022]
Abstract
Background Osteoporosis, a disease characterized by low bone mineral density (BMD), is an important health problem in Mexico. BMD is a highly heritable trait, with heritability estimates of 50-85%. Several candidate genes have been evaluated to identify those involved in BMD variation and the etiology of osteoporosis. This study investigated the possible association of single-nucleotide polymorphisms (SNPs) in the MEF2C, SOST and JAG1 genes with BMD variation in postmenopausal Mexican-Mestizo women. Methods Four hundred unrelated postmenopausal women were included in the study. Risk factors were recorded and BMD was measured in total hip, femoral neck and lumbar spine using dual-energy X-ray absorptiometry. In an initial stage, a total of twenty-five SNPs within or near the SOST gene and seven SNPs in the JAG1 gene were genotyped using a GoldenGate assay. In a second stage, three MEF2C gene SNPs were also genotyped and SOST and JAG1 gene variants were validated. Real time PCR and TaqMan probes were used for genotyping. Results Linear regression analyses adjusted for age, body mass index and ancestry estimates showed that five SNPs in the SOST gene were significantly associated with BMD in total hip and femoral neck but not lumbar spine. The lowest p value was 0.0012, well below the multiple-test significance threshold (p = 0.009), with a mean effect size of -0.027 SD per risk allele. We did not find significant associations between BMD and MEF2C/JAG1 gene variants [rs1366594 “A” allele: β = 0.001 (95% CI -0.016; 0.017), p = 0.938; rs2273061 “G” allele: β = 0.007 (95% CI -0.007; 0.023), p = 0.409]. Conclusions SOST polymorphisms may contribute to total hip and femoral neck BMD variation in Mexican postmenopausal women. Together, these and prior findings suggest that this gene may contribute to BMD variation across populations of diverse ancestry.
Affiliation(s)
- Rafael Velázquez-Cruz
  - Laboratorio de Genómica del Metabolismo Óseo, Instituto Nacional de Medicina Genómica, Mexico City, Mexico.
|
197
|
Wang KS, Tonarelli S, Luo X, Wang L, Su B, Zuo L, Mao C, Rubin L, Briones D, Xu C. Polymorphisms within ASTN2 gene are associated with age at onset of Alzheimer's disease. J Neural Transm (Vienna) 2014; 122:701-8. [PMID: 25410587 DOI: 10.1007/s00702-014-1306-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 05/23/2014] [Accepted: 08/27/2014] [Indexed: 12/25/2022]
Abstract
Alzheimer's disease (AD) is a multifactorial neurological condition associated with genetic profiles that are still not completely understood. We performed a family-based low-density genome-wide association analysis of age at onset (AAO) in AD (244 patients and their relatives) using the Illumina 6K single-nucleotide polymorphism (SNP) panel and the FBAT-logrank statistic. We observed 10 SNPs associated with AAO in AD with p < 2 × 10⁻³. The most significant hit within a known gene, the neuronal protein astrotactin 2 (ASTN2), was SNP rs1334071 (p = 8.74 × 10⁻⁴). ASTN2 has been implicated in several neuropsychiatric disorders, including cognitive disorders, autism and schizophrenia. We then conducted a replication study focusing on the ASTN2 gene in a Canadian sample of 791 AD patients and 782 controls using the logrank test. Five ASTN2 SNPs (the strongest association being rs16933774 with p = 0.0053) showed associations with AAO in this Canadian sample (p < 0.05). Furthermore, Kaplan-Meier survival analysis of SNP rs16933774 showed that the AAO of AD in individuals heterozygous for the AG genotype of rs16933774 (median AAO = 68.5 years) was approximately 4.5 years earlier than in individuals having the AA genotype (median AAO = 73 years). In conclusion, a significant association of ASTN2 genetic variants with AAO of AD in two independent samples demonstrates a role for ASTN2 in the pathogenesis of AD. Future functional studies of this gene may help to characterize the genetic architecture of the AAO of AD. Genetic factors in AAO may be a critical factor for early AD intervention and prevention efforts.
Affiliation(s)
- Ke-Sheng Wang
  - Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson, TN, USA
|
198
|
Suthandiram S, Gan GG, Mohd Zain S, Bee PC, Lian LH, Chang KM, Ong TC, Mohamed Z. Genetic polymorphisms in the one-carbon metabolism pathway genes and susceptibility to non-Hodgkin lymphoma. Tumour Biol 2014; 36:1819-34. [PMID: 25384508 DOI: 10.1007/s13277-014-2785-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/20/2014] [Accepted: 10/29/2014] [Indexed: 12/19/2022]
Abstract
Corroborating evidence related to the role of aberrations in one-carbon metabolism (OCM) genes has been inconsistent. We evaluated the association between 12 single nucleotide polymorphisms (SNPs) in 8 OCM genes (CBS, FPGS, FTHFD, MTRR, SHMT1, SLC19A1, TCN1, and TYMS) and non-Hodgkin lymphoma (NHL) risk in a multi-ethnic population that includes Malay, Chinese and Indian ethnic subgroups. Cases (N = 372) and controls (N = 722) were genotyped using the Sequenom MassARRAY platform. Our results for the pooled subjects showed a significantly enhanced NHL risk for CBS Ex9 + 33C > T (T versus C: OR 1.55, 95% CI 1.22-1.96, P = 0.0003), CBS Ex18-319G > A (A versus G: OR 1.15, 95% CI 1.14-1.83, P = 0.002), SHMT1 Ex12 + 236 T > C (T versus C: OR 1.44, 95% CI 1.15-1.81, P = 0.002), and TYMS Ex8 + 157C > T (T versus C: OR 1.29, 95% CI 1.06-1.57, P = 0.01). Haplotype analysis for CBS SNPs showed a significantly decreased risk of NHL in subjects with haplotype CG (OR 0.69, 95% CI 0.56-0.86, P < 0.001). The GG haplotype for the FTHFD SNPs showed a significantly increased risk of NHL (OR 1.40, 95% CI 1.12-1.76, P = 0.002). For the TYMS gene, haplotype CAT (OR 0.67, 95% CI 0.49-0.90, P = 0.007) was associated with decreased risk of NHL, while haplotype TAC (OR 1.29, 95% CI 1.05-1.58, P = 0.01) was found to confer increased risk of NHL. Our study suggests that variation in several OCM genes (CBS, FTHFD, SHMT1, TCN1, and TYMS) may influence susceptibility to NHL.
Affiliation(s)
- Sujatha Suthandiram
  - The Pharmacogenomics Laboratory, Department of Pharmacology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
|
199
|
Results of a "GWAS plus:" general cognitive ability is substantially heritable and massively polygenic. PLoS One 2014; 9:e112390. [PMID: 25383866 PMCID: PMC4226546 DOI: 10.1371/journal.pone.0112390] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 08/20/2013] [Accepted: 05/04/2014] [Indexed: 11/24/2022]
Abstract
We carried out a genome-wide association study (GWAS) for general cognitive ability (GCA) plus three other analyses of GWAS data that aggregate the effects of multiple single-nucleotide polymorphisms (SNPs) in various ways. Our multigenerational sample comprised 7,100 Caucasian participants, drawn from two longitudinal family studies, who had been assessed with an age-appropriate IQ test and had provided DNA samples passing quality screens. We conducted the GWAS across ∼2.5 million SNPs (both typed and imputed), using a generalized least-squares method appropriate for the different family structures present in our sample, and subsequently conducted gene-based association tests. We also conducted polygenic prediction analyses under five-fold cross-validation, using two different schemes of weighting SNPs. Using parametric bootstrapping, we assessed the performance of this prediction procedure under the null. Finally, we estimated the proportion of variance attributable to all genotyped SNPs as random effects with the software GCTA. The study is limited chiefly by its power to detect realistic single-SNP or single-gene effects, none of which reached genome-wide significance, though some genomic inflation was evident from the GWAS. Unit SNP weights performed about as well as least-squares regression weights under cross-validation, but the performance of both increased as more SNPs were included in calculating the polygenic score. Estimates from GCTA were 35% of phenotypic variance at the recommended biological-relatedness ceiling. Taken together, our results concur with other recent studies: they support a substantial heritability of GCA, arising from a very large number of causal SNPs, each of very small effect. We place our study in the context of the literature, both contemporary and historical, and provide accessible explication of our statistical methods.
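The comparison of unit SNP weights with least-squares regression weights under five-fold cross-validation can be sketched on toy data as follows. This is a minimal illustration under loud assumptions: the genotypes and phenotypes are simulated, individuals are treated as unrelated, and per-SNP weights come from ordinary marginal regression rather than the generalized least-squares method the abstract describes. All function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def marginal_slopes(genos, pheno):
    """Per-SNP least-squares slope of phenotype on allele count."""
    n, m = len(genos), len(genos[0])
    my = sum(pheno) / n
    slopes = []
    for j in range(m):
        col = [row[j] for row in genos]
        mx = sum(col) / n
        sxx = sum((g - mx) ** 2 for g in col)
        sxy = sum((g - mx) * (y - my) for g, y in zip(col, pheno))
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return slopes


def polygenic_score(genos, weights):
    """Weighted sum of allele counts for each individual."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genos]


def cross_validated_r(genos, pheno, k=5, unit_weights=False):
    """Mean held-out correlation between polygenic score and phenotype."""
    idx = list(range(len(genos)))
    folds = [idx[i::k] for i in range(k)]
    rs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Weights are estimated on the training folds only.
        w = marginal_slopes([genos[i] for i in train], [pheno[i] for i in train])
        if unit_weights:
            w = [(b > 0) - (b < 0) for b in w]  # keep only the sign: +1, 0, -1
        s = polygenic_score([genos[i] for i in test], w)
        rs.append(pearson(s, [pheno[i] for i in test]))
    return sum(rs) / k


# Simulated toy data (assumption): 500 unrelated individuals, 50 SNPs.
rng = random.Random(0)
n_ind, n_snp = 500, 50
effects = [rng.gauss(0, 0.1) for _ in range(n_snp)]
genos = [[rng.choice((0, 1, 2)) for _ in range(n_snp)] for _ in range(n_ind)]
pheno = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 1) for row in genos]

r_ls = cross_validated_r(genos, pheno)
r_unit = cross_validated_r(genos, pheno, unit_weights=True)
print(f"least-squares weights: r = {r_ls:.2f}; unit weights: r = {r_unit:.2f}")
```

Unit weights discard the estimated effect sizes and keep only their direction, so they trade some signal for robustness to noisy per-SNP estimates. That trade-off is one plausible reading of why the two schemes performed similarly in the study.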
|
200
|
Fodor A, Segura V, Denis M, Neuenschwander S, Fournier-Level A, Chatelet P, Homa FAA, Lacombe T, This P, Le Cunff L. Genome-wide prediction methods in highly diverse and heterozygous species: proof-of-concept through simulation in grapevine. PLoS One 2014; 9:e110436. [PMID: 25365338 PMCID: PMC4217727 DOI: 10.1371/journal.pone.0110436] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 04/23/2014] [Accepted: 09/19/2014] [Indexed: 11/20/2022]
Abstract
Genome-wide association studies (GWAS) and genomic selection (GS) methods, which use genome-wide marker data for phenotype prediction, are of considerable interest in plant breeding. However, to our knowledge, no studies have yet examined the predictive ability of these methods for structured traits when using training populations with high levels of genetic diversity. Grapevine is an example of such a highly heterozygous, perennial species. The present study compares the accuracy of models based on GWAS or GS alone, or in combination, for predicting simple or complex traits, linked or not with population structure. To explore the relevance of these methods in this context, we performed simulations using approximately 90,000 SNPs on a population of 3,000 individuals structured into three groups and corresponding to published grapevine diversity data. To estimate the parameters of the prediction models, we defined four training populations of 1,000 individuals, corresponding to these three groups and a core collection. Finally, to estimate the accuracy of the models, we also simulated four breeding populations of 200 individuals. Although prediction accuracy was low when breeding populations were too distant from the training populations, high accuracy levels were obtained using only the core collection as the training population. The highest prediction accuracy (up to 0.9) was obtained using the combined GWAS-GS model. We thus recommend using the combined prediction model and a core collection as the training population for grapevine breeding or for other economically important crops with the same characteristics.
Affiliation(s)
- Agota Fodor
- UMT Geno-Vigne, IFV-INRA-Montpellier Supagro, Montpellier, France; UMR AGAP, INRA, Montpellier, France
- Samuel Neuenschwander
- University of Lausanne, Department of Ecology and Evolution, Lausanne, Switzerland; University of Lausanne, Swiss Institute of Bioinformatics, Vital-IT, Lausanne, Switzerland
- Patrice This
- UMT Geno-Vigne, IFV-INRA-Montpellier Supagro, Montpellier, France; UMR AGAP, INRA, Montpellier, France
- Loic Le Cunff
- UMT Geno-Vigne, IFV-INRA-Montpellier Supagro, Montpellier, France; UMR AGAP, INRA, Montpellier, France
|