1
Lee D, Kim Y, Chung Y, Lee D, Seo D, Choi TJ, Lim D, Yoon D, Lee SH. Accuracy of genotype imputation based on reference population size and marker density in Hanwoo cattle. J Anim Sci Technol 2021; 63:1232-1246. [PMID: 34957440 PMCID: PMC8672260 DOI: 10.5187/jast.2021.e117] [Received: 09/30/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021]
Abstract
Following the recent completion of the cattle genome sequence, commercial single nucleotide polymorphism (SNP) chip panels have been developed for the animal genome industry. To increase the statistical power for detecting quantitative trait loci (QTL), a large number of animals must be genotyped. However, genotyping many animals with a high-density chip increases the genotyping cost. Therefore, statistical inference of genotypes by imputation (from a low-density to a high-density chip) will be useful in the animal industry. The purpose of this study was to investigate the effect of reference population size and marker density on imputation accuracy and to suggest an appropriate reference population size for imputation in Hanwoo cattle. A total of 3,821 Hanwoo cattle were divided into reference and validation populations. The reference sets consisted of 50k (38,916) marker data at different population sizes (500, 1,000, 1,500, 2,000, and 3,600). The validation data comprised four validation sets (889 animals in total) at different marker densities (5k [5,000], 10k [10,000], and 15k [15,000]). Imputation accuracy was calculated by directly comparing the true and imputed genotypes. When the lowest marker density (5k) was used in the validation set, imputation accuracy ranged from 0.793 to 0.929 depending on the reference population size. With the highest marker density (15k), imputation accuracy ranged from 0.904 to 0.967 depending on the reference population size. In conclusion, the reference population should contain more than 1,000 animals to obtain at least 88% imputation accuracy in Hanwoo cattle.
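The accuracy above is a direct comparison of true and imputed genotypes. A minimal sketch of that concordance rate, assuming genotypes coded as 0/1/2 allele counts and a hypothetical missing-value code `-9`; the study's handling of missing calls is not described here, and `imputation_accuracy` is an illustrative name, not part of any imputation package.

```python
from typing import Sequence

MISSING = -9  # hypothetical code for a missing genotype call (assumption)


def imputation_accuracy(true_geno: Sequence[int], imputed_geno: Sequence[int]) -> float:
    """Concordance rate: share of non-missing 0/1/2 genotypes where the imputed call equals the true call."""
    if len(true_geno) != len(imputed_geno):
        raise ValueError("true and imputed genotype vectors must have the same length")
    # Compare only positions where both calls are present.
    pairs = [(t, i) for t, i in zip(true_geno, imputed_geno) if t != MISSING and i != MISSING]
    if not pairs:
        raise ValueError("no non-missing genotype pairs to compare")
    matches = sum(1 for t, i in pairs if t == i)
    return matches / len(pairs)


# Five masked SNPs for one animal: four of the five imputed calls are correct.
print(imputation_accuracy([0, 1, 2, 2, 1], [0, 1, 2, 1, 1]))  # 0.8
```

Under this definition, an accuracy of 0.929 would mean that about 93% of imputed genotypes matched the true 50k genotypes.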
Affiliation(s)
- DooHo Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yeongkuk Kim
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yoonji Chung
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongjae Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongwon Seo
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Tae Jeong Choi
- National Institute of Animal Science, Cheonan 31000, Korea
- Dajeong Lim
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Wanju 55365, Korea
- Duhak Yoon
- Department of Animal Science & Biotechnology, Kyungpook National University, Sangju 37224, Korea
- Seung Hwan Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea

2
Comparison of Selection Signatures between Korean Native and Commercial Chickens Using 600K SNP Array Data. Genes (Basel) 2021; 12:genes12060824. [PMID: 34072132 PMCID: PMC8230197 DOI: 10.3390/genes12060824] [Received: 04/12/2021] [Revised: 05/18/2021] [Accepted: 05/24/2021]
Abstract
Korean native chickens (KNCs) comprise an indigenous chicken breed of South Korea that was restored through a government project in the 1990s. The KNC population has not been developed well and has mostly been used to maintain purebred populations in the government research institution. We investigated the genetic features of the KNC population in a selection signal study for the efficient improvement of this breed. We used 600K single nucleotide polymorphism data sampled from 191 KNCs (NG, 38; NL, 29; NR, 52; NW, 39; and NY, 33) and 54 commercial chickens (Hy-line Brown, 10; Lohmann Brown, 10; Arbor Acres, 10; Cobb, 12; and Ross, 12). Haplotype phasing was performed using EAGLE software as the initial step for the primary data analysis. Pre-processed data were analyzed to detect selection signals using the ‘rehh’ package in R software. A few common signatures of selection were identified in KNCs. Most quantitative trait locus regions identified as candidate regions were associated with traits related to reproductive organs, eggshell characteristics, immunity, and organ development. Block patterns with high linkage disequilibrium values were observed for LPP, IGF11, LMNB2, ERBB4, GABRB2, NTM, APOO, PLOA1, CNTN1, NTSR1, DEF3, CELF1, and MEF2D genes, among regions with confirmed selection signals. NL and NW lines contained a considerable number of selective sweep regions related to broilers and layers, respectively. We recommend focusing on improving the egg and meat traits of KNC NL and NW lines, respectively, while improving multiple traits for the other lines.
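The selection-signal analysis above computes extended haplotype homozygosity (EHH) with the R package rehh. As an illustration only (this is not rehh's API), the standard EHH definition can be sketched in Python: among haplotypes carrying a given allele at a core SNP, EHH at a marker is the probability that two randomly chosen carriers are identical over the whole interval from the core to that marker. The sketch assumes phased, complete biallelic haplotypes encoded as strings of `0`/`1`; `ehh` is a hypothetical helper name.

```python
from collections import Counter
from typing import Sequence


def ehh(haplotypes: Sequence[str], core: int, allele: str, marker: int) -> float:
    """EHH of `allele` at SNP index `core`, evaluated out to SNP index `marker`.

    haplotypes: phased haplotypes, one '0'/'1' string per chromosome, no missing data (assumption).
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = min(core, marker), max(core, marker)
    # Group carriers by their sequence over the interval between core and marker.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two distinct carriers share the same extended haplotype.
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


haps = ["0000", "0001", "0011", "1010", "1011"]
for m in range(4):
    # EHH decays 1.0, 1.0, 0.333, 0.0 as the interval extends away from the core.
    print(m, round(ehh(haps, core=0, allele="0", marker=m), 3))
```

Long stretches of high EHH around a common core allele are the pattern that haplotype-based selection scans look for.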

3
Eydivandi S, Roudbar MA, Karimi MO, Sahana G. Genomic scans for selective sweeps through haplotype homozygosity and allelic fixation in 14 indigenous sheep breeds from Middle East and South Asia. Sci Rep 2021; 11:2834. [PMID: 33531649 PMCID: PMC7854752 DOI: 10.1038/s41598-021-82625-2] [Received: 12/08/2020] [Accepted: 01/22/2021]
Abstract
The performance and productivity of livestock have improved consistently through natural and artificial selection over the centuries. Both types of selection are expected to leave patterns on the genome and to change allele frequencies, but natural selection has played the major role in indigenous populations. Detecting selective sweeps in livestock may help in understanding the processes involved in domestication and genome evolution and in discovering genomic regions associated with economically important traits. In this study, we investigated population genetic diversity and selection signals using SNP genotype data from 14 indigenous sheep breeds from the Middle East and South Asia: six breeds from Iran (Iranian Balochi, Afshari, Moghani, Qezel, Zel, and Lori-Bakhtiari), three from Afghanistan (Afghan Balochi, Arabi, and Gadik), three from India (Indian Garole, Changthangi, and Deccani), and two from Bangladesh (Bangladeshi Garole and Bangladesh East). The SNP genotype data were generated with the Illumina OvineSNP50 Genotyping BeadChip array. To assess genetic diversity and population structure, we used principal component analysis (PCA), admixture, phylogenetic analyses, and runs of homozygosity. We applied four complementary statistical tests to detect selective sweeps: FST (fixation index), xp-EHH (cross-population extended haplotype homozygosity), Rsb (extended haplotype homozygosity between populations), and FLK (the extension of the Lewontin and Krakauer test). Our results not only confirm previous studies but also provide a suite of novel candidate genes involved in different traits in sheep. On average, FST, xp-EHH, Rsb, and FLK detected 128, 207, 222, and 252 genomic regions, respectively, as candidates for selective sweeps.
Furthermore, the four tests detected nine overlapping candidate genes, notably TNIK, DOCK1, USH2A, and TYW1B, which are associated with disease resistance and climate adaptation. Knowledge of candidate genomic regions in sheep populations may facilitate the identification and potential exploitation of the underlying genes in sheep breeding.
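Of the four tests listed above, xp-EHH compares haplotype homozygosity at the same site between two populations. A minimal sketch, assuming site-level EHH values (computed over all haplotypes, not per allele) are already available for each population at the same ascending marker positions on one side of the core SNP: the integrated EHH (iHH) is approximated with the trapezoidal rule, and the unstandardized score is ln(iHH_A / iHH_B). Published implementations typically integrate both sides of the core, apply an EHH cut-off, and standardize scores genome-wide; those steps are omitted, and the function names are hypothetical.

```python
import math
from typing import Sequence


def integrated_ehh(positions: Sequence[float], ehh_values: Sequence[float]) -> float:
    """Area under an EHH decay curve (trapezoidal rule); positions must be strictly increasing."""
    if len(positions) != len(ehh_values) or len(positions) < 2:
        raise ValueError("need matching position/EHH lists with at least two points")
    area = 0.0
    for k in range(1, len(positions)):
        width = positions[k] - positions[k - 1]
        if width <= 0:
            raise ValueError("positions must be strictly increasing")
        area += width * (ehh_values[k] + ehh_values[k - 1]) / 2.0
    return area


def unstandardized_xpehh(ihh_pop_a: float, ihh_pop_b: float) -> float:
    """Log ratio of iHH; positive values suggest longer haplotypes (a possible sweep) in population A."""
    return math.log(ihh_pop_a / ihh_pop_b)


pos_kb = [0, 10, 20]  # hypothetical distances from the core SNP, in kb
ihh_a = integrated_ehh(pos_kb, [1.0, 0.5, 0.0])
ihh_b = integrated_ehh(pos_kb, [1.0, 0.2, 0.0])
print(ihh_a, ihh_b, round(unstandardized_xpehh(ihh_a, ihh_b), 3))
```

Rsb follows the same log-ratio idea but is built from site-specific integrated EHH; in both cases the genome-wide standardization is what turns raw ratios into comparable scores.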
Affiliation(s)
- Sirous Eydivandi
- Department of Animal Science, Behbahan Branch, Islamic Azad University, Behbahan, Iran.
- Center for Quantitative Genetics and Genomics, Faculty of Technical Sciences, Aarhus University, 8830, Tjele, Denmark.
- Mahmoud Amiri Roudbar
- Department of Animal Science, Safiabad-Dezful Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Dezful, Iran
- Mohammad Osman Karimi
- Department of Animal Science, Faculty of Agriculture, Herat University, Herat, Afghanistan
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Faculty of Technical Sciences, Aarhus University, 8830, Tjele, Denmark

4
Eydivandi S, Roudbar MA, Ardestani SS, Momen M, Sahana G. A selection signatures study among Middle Eastern and European sheep breeds. J Anim Breed Genet 2021; 138:574-588. [PMID: 33453096 DOI: 10.1111/jbg.12536] [Received: 06/11/2020] [Revised: 11/25/2020] [Accepted: 12/26/2020]
Abstract
Selection, both natural and artificial, leaves patterns on the genome during the domestication of animals and changes allele frequencies among populations. Detecting genomic regions influenced by selection in livestock may help in understanding the processes involved in genome evolution and in discovering genomic regions related to traits of economic and ecological interest. In this study, genetic diversity analyses were conducted on 34,206 quality-filtered SNP positions from 450 individuals in 15 sheep breeds, including six indigenous breeds from the Middle East (Iranian Balouchi, Afshari, Moghani, Qezel, Karakas, and Norduz) and nine breeds from Europe (East Friesian Sheep, Ile de France, Mourerous, Romane, Swiss Mirror, Spaelsau, Suffolk, Comisana, and Engadine Red Sheep). The SNP genotype data, generated with the Illumina OvineSNP50 Genotyping BeadChip array, were used in this analysis. We applied two complementary statistical analyses, FST (fixation index) and xp-EHH (cross-population extended haplotype homozygosity), to detect selection signatures in Middle Eastern and European sheep populations. FST and xp-EHH detected 629 and 256 genes, respectively, showing signatures of selection. Genomic regions identified using FST and xp-EHH contained the CIDEA, HHATL, MGST1, FADS1, RTL1, and DGKG genes, which have previously been reported to influence several economic traits. FST and xp-EHH shared 60 genes identified as signatures of selection, including four candidate genes (NT5E, ADA2, C8A, and C8B) that were enriched for two significant Gene Ontology (GO) terms associated with the adenosine metabolic process. Knowledge of candidate genomic regions under selective pressure in sheep breeds may facilitate identification of the underlying genes and improve our understanding of these genes' role in local adaptation.
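The abstract does not state which FST estimator was used. As one illustration, the sketch below implements Hudson's two-population FST, combined across SNPs as a ratio of averages, assuming sample allele frequencies `p1`, `p2` for the same SNPs and the number of sampled chromosomes `n1`, `n2` (twice the number of diploid animals) in each group. `hudson_fst` is a hypothetical name, and this is not necessarily the estimator used in the study.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST for two populations, combined over SNPs as a ratio of averages.

    p1, p2: sample allele frequencies per SNP; n1, n2: sampled chromosomes per population.
    """
    if len(p1) != len(p2) or not p1:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    if n1 < 2 or n2 < 2:
        raise ValueError("each population needs at least two sampled chromosomes")
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-sample sampling variance.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("all SNPs are fixed for the same allele in both populations")
    return num / den


print(round(hudson_fst([0.8, 0.5], [0.2, 0.5], n1=100, n2=100), 4))
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, keeps low-diversity SNPs from dominating the genome-wide value; window-based scans would apply the same function to the SNPs in each window.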
Affiliation(s)
- Sirous Eydivandi
- Department of Animal Science, Behbahan Branch, Islamic Azad University, Behbahan, Iran; Faculty of Technical Sciences, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Mahmoud Amiri Roudbar
- Department of Animal Science, Safiabad-Dezful Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education & Extension Organization (AREEO), Dezful, Iran
- Mehdi Momen
- Department of Surgical Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Goutam Sahana
- Faculty of Technical Sciences, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark

5
Khvorykh GV, Khrunin AV. imputeqc: an R package for assessing imputation quality of genotypes and optimizing imputation parameters. BMC Bioinformatics 2020; 21:304. [PMID: 32703240 PMCID: PMC7379353 DOI: 10.1186/s12859-020-03589-0] [Received: 06/02/2020] [Accepted: 06/08/2020]
Abstract
Background Genotype imputation increases the power of genome-wide association studies. However, imputation quality should be assessed in each particular case. Not all imputation software controls the error of its output; for example, the last release of the fastPHASE program (1.4.8) lacks such an option. This program also leaves the user uncertain about choosing the model parameters: fastPHASE is based on haplotype clusters, whose number must be set a priori, and this parameter influences the results of imputation and downstream analysis. Results We present a software toolkit, imputeqc, to assess imputation quality and/or to choose the model parameters for imputation. We demonstrate the efficacy of the toolkit by evaluating imputations made with both fastPHASE and BEAGLE on HapMap and 1000 Genomes data. The resulting genotype discordances correlated well between the two methods. Using imputeqc, we also showed how to choose the optimal number of haplotype clusters and expectation-maximization cycles for fastPHASE. The number of haplotype clusters found (25) was then applied in hapFLK testing, which revealed signatures of selection at the LCT region on chromosome 2. We also demonstrated how to reduce the computational time of hapFLK testing from 3 days to 20 h. Conclusions The toolkit is implemented as an R package, imputeqc, and command-line scripts. The code is freely available at https://github.com/inzilico/imputeqc under the MIT license.
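imputeqc itself is an R package; the Python sketch below only illustrates the general mask-and-reimpute idea behind estimating genotype discordance: hide a random subset of known genotypes, impute them, and count how many imputed calls disagree with the hidden truth. The imputer is a deliberately naive, hypothetical placeholder (the most common genotype at each SNP) standing in for fastPHASE or BEAGLE, and all function names are illustrative rather than imputeqc's interface.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

Matrix = List[List[Optional[int]]]  # rows = individuals, columns = SNPs; 0/1/2, or None if missing


def mask_genotypes(geno: Matrix, fraction: float, seed: int = 0) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Return a copy of `geno` with a random `fraction` of observed calls set to None, plus the masked cells."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(geno) for j, g in enumerate(row) if g is not None]
    k = max(1, round(fraction * len(observed)))
    cells = rng.sample(observed, k)
    masked = [list(row) for row in geno]
    for i, j in cells:
        masked[i][j] = None
    return masked, cells


def naive_impute(geno: Matrix) -> Matrix:
    """Hypothetical placeholder imputer: fill each missing call with the most common observed genotype at that SNP."""
    filled = [list(row) for row in geno]
    for j in range(len(geno[0])):
        column = [row[j] for row in geno if row[j] is not None]
        mode = Counter(column).most_common(1)[0][0] if column else 0
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled


def discordance(truth: Matrix, imputed: Matrix, cells: List[Tuple[int, int]]) -> float:
    """Share of masked cells whose imputed genotype differs from the true genotype."""
    wrong = sum(1 for i, j in cells if truth[i][j] != imputed[i][j])
    return wrong / len(cells)


truth = [[0, 1, 2, 0], [0, 1, 2, 1], [0, 1, 1, 0], [0, 2, 2, 0], [0, 1, 2, 0]]
masked, cells = mask_genotypes(truth, fraction=0.2, seed=42)
print(len(cells), discordance(truth, naive_impute(masked), cells))
```

Repeating the masking with different seeds, and with different imputer settings (such as the number of haplotype clusters), gives a distribution of discordance values that can be compared across settings.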
Affiliation(s)
- Gennady V Khvorykh
- Department of Molecular Bases of Human Genetics, Institute of Molecular Genetics of Russian Academy of Sciences, 2 Kurchatov sq., Moscow, 123182, Russia.
- Andrey V Khrunin
- Department of Molecular Bases of Human Genetics, Institute of Molecular Genetics of Russian Academy of Sciences, 2 Kurchatov sq., Moscow, 123182, Russia

6
Theodoridis S, Randin C, Szövényi P, Boucher FC, Patsiou TS, Conti E. How Do Cold-Adapted Plants Respond to Climatic Cycles? Interglacial Expansion Explains Current Distribution and Genomic Diversity in Primula farinosa L. Syst Biol 2018; 66:715-736. [PMID: 28334079 DOI: 10.1093/sysbio/syw114] [Received: 03/08/2016] [Accepted: 12/14/2016]
Abstract
Understanding the effects of past climatic fluctuations on the distribution and population-size dynamics of cold-adapted species is essential for predicting their responses to ongoing global climate change. Despite the heterogeneity of cold-adapted species, two main contrasting hypotheses have been proposed to explain their responses to Late Quaternary glacial cycles: the interglacial contraction hypothesis and the interglacial expansion hypothesis. Here, we use the cold-adapted plant Primula farinosa to test two demographic models under each of the two alternative hypotheses, plus a fifth, null model. We first approximate the time and extent of demographic contractions and expansions during the Late Quaternary by projecting species distribution models across the last 72 ka. We also generate genome-wide sequence data using a Reduced Representation Library approach to reconstruct the spatial structure, genetic diversity, and phylogenetic relationships of lineages within P. farinosa. Finally, by integrating the results of the climatic and genomic analyses in an Approximate Bayesian Computation framework, we propose the most likely model for the extent and direction of population-size changes in P. farinosa through the Late Quaternary. Our results support the interglacial expansion of P. farinosa, contrary to the prevailing paradigm that the current distribution of cold-adapted species, fragmented in high-altitude and high-latitude regions, reflects the consequences of postglacial contraction processes.
Affiliation(s)
- Spyros Theodoridis
- Department of Systematic and Evolutionary Botany, University of Zurich, CH-8008 Zurich, Switzerland; Zurich-Basel Plant Science Center, CH-8092 Zurich, Switzerland
- Christophe Randin
- Institute of Botany, University of Basel, CH-4056 Basel, Switzerland; Department of Ecology & Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
- Peter Szövényi
- Department of Systematic and Evolutionary Botany, University of Zurich, CH-8008 Zurich, Switzerland
- Florian C Boucher
- Department of Systematic and Evolutionary Botany, University of Zurich, CH-8008 Zurich, Switzerland; Department of Botany and Zoology, University of Stellenbosch, 7602 Matieland, South Africa
- Theofania S Patsiou
- Department of Systematic and Evolutionary Botany, University of Zurich, CH-8008 Zurich, Switzerland; Zurich-Basel Plant Science Center, CH-8092 Zurich, Switzerland; Institute of Botany, University of Basel, CH-4056 Basel, Switzerland
- Elena Conti
- Department of Systematic and Evolutionary Botany, University of Zurich, CH-8008 Zurich, Switzerland; Zurich-Basel Plant Science Center, CH-8092 Zurich, Switzerland

7
Louzoun Y, Alter I, Gragert L, Albrecht M, Maiers M. Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection. Immunogenetics 2017; 70:279-292. [DOI: 10.1007/s00251-017-1040-4] [Received: 08/11/2017] [Accepted: 10/23/2017]

8
Weir BS, Goudet J. A Unified Characterization of Population Structure and Relatedness. Genetics 2017; 206:2085-2103. [PMID: 28550018 PMCID: PMC5560808 DOI: 10.1534/genetics.116.198424] [Received: 11/17/2016] [Accepted: 05/17/2017]
Abstract
Many population genetic activities, ranging from evolutionary studies to association mapping and forensic identification, rely on appropriate estimates of population structure or relatedness. All applications require recognition that quantities with an underlying meaning of allelic dependence are not defined in an absolute sense, but are instead made "relative to" some set of alleles other than the target set. The 1984 Weir and Cockerham $\hat{\theta}$ estimate made explicit that the reference set of alleles was across populations, whereas standard kinship estimates do not make the reference explicit. Weir and Cockerham stated that their $\hat{\theta}$ estimates were for independent populations, and standard kinship estimates implicitly assume that pairs of individuals in a study sample, other than the target pair, are unrelated or are not inbred. However, populations lose independence when there is migration between them, and dependencies between pairs of individuals in a population exist for more than one target pair. We have therefore recast our treatments of population structure, relatedness, and inbreeding to make explicit that the parameters of interest involve the differences in degrees of allelic dependence between the target and the reference sets of alleles, and so can be negative. We take the reference set to be the population from which study individuals have been sampled. We provide simple moment estimates of these parameters, phrased in terms of allelic matching within and between individuals for relatedness and inbreeding, or within and between populations for population structure. A multi-level hierarchy of alleles within individuals, alleles between individuals within populations, and alleles between populations allows a unified treatment of relatedness and population structure.
We expect our new measures to have a wide range of applications, but we note that their estimates are sensitive to rare or private variants: some population-characterization applications suggest exploiting those sensitivities, whereas estimation of relatedness may best use all genetic markers without filtering on minor allele frequency.
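The allele-matching formulation above can be illustrated for a single locus. A minimal sketch following our reading of the population-structure estimate (not the authors' software): the within-population matching proportion is the chance that two distinct alleles sampled from the same population are identical, the between-population matching proportion is averaged over ordered pairs of different populations, and beta = (M_W - M_B) / (1 - M_B). Allele counts per population are assumed as input; a multi-locus estimate would combine numerators and denominators across loci. Function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List

AlleleCounts = Dict[str, int]  # allele label -> count in one population sample


def within_matching(counts: AlleleCounts) -> float:
    """Probability that two distinct alleles drawn without replacement from one sample are identical."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two alleles per population")
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def between_matching(a: AlleleCounts, b: AlleleCounts) -> float:
    """Probability that one allele drawn from each of two populations is identical."""
    na, nb = sum(a.values()), sum(b.values())
    return sum(a[u] * b.get(u, 0) for u in a) / (na * nb)


def beta_wt(pops: List[AlleleCounts]) -> float:
    """Single-locus population-structure estimate (M_W - M_B) / (1 - M_B); may be negative."""
    r = len(pops)
    if r < 2:
        raise ValueError("need at least two populations")
    m_w = sum(within_matching(p) for p in pops) / r
    m_b = sum(between_matching(a, b) for a, b in permutations(pops, 2)) / (r * (r - 1))
    if m_b == 1.0:
        raise ValueError("locus is monomorphic for the same allele in all populations")
    return (m_w - m_b) / (1.0 - m_b)


print(beta_wt([{"A": 10}, {"B": 10}]))  # 1.0: populations fixed for different alleles
print(round(beta_wt([{"A": 5, "B": 5}, {"A": 5, "B": 5}]), 4))  # -0.1111: identical samples
```

The second call shows why the parameters "can be negative": two identical samples match slightly less often within populations (sampling without replacement) than between them.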
Affiliation(s)
- Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, Washington 98195
- Jérôme Goudet
- Department of Ecology and Evolution
- Swiss Institute of Bioinformatics, University of Lausanne, 1015 Switzerland

9
Roshyara NR, Scholz M. Impact of genetic similarity on imputation accuracy. BMC Genet 2015; 16:90. [PMID: 26193934 PMCID: PMC4509609 DOI: 10.1186/s12863-015-0248-2] [Received: 02/12/2015] [Accepted: 07/07/2015]
Abstract
Background Genotype imputation is a common technique in genetic research. Genetic similarity between the target population and the reference dataset is crucial for high-quality results. Although several reference panels are available, it is often unclear which is optimal for a particular target dataset. Maximizing the genetic similarity between the study sample and candidate reference panels may be the most straightforward way to select the genetically best-matched reference. However, the impact of genetic similarity on imputation accuracy has not yet been studied in detail. Results We performed a simulation study in 20 ethnic groups obtained from POPRES. High-quality SNPs were masked and re-imputed with MaCH, MaCH-minimac, and IMPUTE2 using four different HapMap reference panels (CEU, CHB-JPT, MEX, and YRI). Imputation accuracy was assessed with several statistics. Genetic similarity between ethnic groups and reference populations was measured by the F-statistics (FST) originally proposed by Wright and the G-statistics (GST) introduced by Nei and others. To assess the predictive power of these measures for imputation accuracy, we analysed the relationships between them and the corresponding imputation accuracy scores. We found that population genetic distances between homogeneous reference and target populations were strongly linearly correlated with the resulting imputation accuracies, irrespective of the distance measure, imputation accuracy measure, missingness, and imputation software used. A possible exception was the African population. Conclusion Using GST- or FST-related measures to predict the optimal reference panel is highly recommended for imputation frameworks that rely on a specific reference. A cut-off of GST < 0.01 is recommended to achieve good imputation results for high-frequency variants and small data sets.
The linear relationship is less pronounced for low-frequency variants, for which we also observed a dependence of imputation accuracy on the number of polymorphic sites in the reference. We also show that the software-specific measures MaCH-Rsq and IMPUTE-info must be interpreted with caution if the genetic distance between the target and reference populations is high.
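The recommended GST < 0.01 cut-off can be checked with a short calculation. A minimal sketch of Nei's GST = (HT - HS) / HT for biallelic SNPs, assuming reference-allele frequencies for the target and reference populations at the same SNPs, equal population weights, and a ratio of sums across SNPs for the multi-locus value; the study's exact estimator and weighting may differ, `nei_gst` is a hypothetical name, and the frequencies below are hypothetical.

```python
from typing import Sequence


def nei_gst(freqs_by_pop: Sequence[Sequence[float]]) -> float:
    """Multi-locus Nei GST for biallelic SNPs.

    freqs_by_pop[k][l] is the reference-allele frequency at SNP l in population k.
    """
    n_pops = len(freqs_by_pop)
    if n_pops < 2:
        raise ValueError("need at least two populations")
    n_snps = len(freqs_by_pop[0])
    num = den = 0.0
    for l in range(n_snps):
        ps = [pop[l] for pop in freqs_by_pop]
        h_s = sum(2 * p * (1 - p) for p in ps) / n_pops  # mean within-population expected heterozygosity
        p_bar = sum(ps) / n_pops
        h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


target = [0.52, 0.31, 0.75]     # hypothetical target-sample frequencies
reference = [0.50, 0.35, 0.70]  # hypothetical reference-panel frequencies
g = nei_gst([target, reference])
print(round(g, 4), "below 0.01 cut-off" if g < 0.01 else "at or above 0.01 cut-off")
```

Computing this value for each candidate panel and choosing the smallest one is the panel-selection strategy the abstract recommends.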
Affiliation(s)
- Nab Raj Roshyara
- Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Haertelstrasse 16-18, 04107, Leipzig, Germany; LIFE Center (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Phenotypes and Environment), University of Leipzig, Philipp-Rosenthal Strasse 27, 04103, Leipzig, Germany
- Markus Scholz
- Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Haertelstrasse 16-18, 04107, Leipzig, Germany; LIFE Center (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Phenotypes and Environment), University of Leipzig, Philipp-Rosenthal Strasse 27, 04103, Leipzig, Germany

10
Gholami M, Reimer C, Erbe M, Preisinger R, Weigend A, Weigend S, Servin B, Simianer H. Genome Scan for Selection in Structured Layer Chicken Populations Exploiting Linkage Disequilibrium Information. PLoS One 2015; 10:e0130497. [PMID: 26151449 PMCID: PMC4494984 DOI: 10.1371/journal.pone.0130497] [Received: 07/30/2014] [Accepted: 05/20/2015]
Abstract
Increasing interest is being placed on detecting genes, or genomic regions, that have been targeted by selection, because identifying signatures of selection can lead to a better understanding of genotype-phenotype relationships. A common strategy for detecting selection signatures is to compare samples from distinct populations and to search for genomic regions with outstanding genetic differentiation. The aim of this study was to detect selection signatures in layer chicken populations using a recently proposed approach, hapFLK, which exploits linkage disequilibrium information while appropriately accounting for the hierarchical structure of populations. We performed the analysis on 70 individuals from three commercial layer breeds (White Leghorn, White Rock, and Rhode Island Red), genotyped for approximately 1 million SNPs. We found a total of 41 and 107 regions with outstanding differentiation or similarity using hapFLK and its single-SNP counterpart FLK, respectively. Annotation of the selection signature regions revealed various genes and QTL corresponding to production traits for which layer breeds were selected. A number of the detected genes were associated with growth and carcass traits, including IGF-1R, AGRP, and STAT5B. We also annotated an interesting gene associated with the dark brown feather color mutational phenotype in chickens (SOX10). We compared FST, FLK, and hapFLK and demonstrated that exploiting linkage disequilibrium information and accounting for hierarchical population structure decreased the false detection rate.
Affiliation(s)
- Mahmood Gholami
- Animal Breeding and Genetics Group, Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, Germany
- Christian Reimer
- Animal Breeding and Genetics Group, Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, Germany
- Malena Erbe
- Animal Breeding and Genetics Group, Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, Germany
- Annett Weigend
- Institute of Farm Animal Genetics (ING), Friedrich-Loeffler-Institut (FLI), Neustadt, Germany
- Steffen Weigend
- Institute of Farm Animal Genetics (ING), Friedrich-Loeffler-Institut (FLI), Neustadt, Germany
- Bertrand Servin
- Laboratoire Génétique, Physiologie et Systèmes d’Elevage, Institut National de la Recherche Agronomique, Castanet-Tolosan, France
- Henner Simianer
- Animal Breeding and Genetics Group, Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, Germany

12
Randhawa IAS, Khatkar MS, Thomson PC, Raadsma HW. Composite selection signals can localize the trait specific genomic regions in multi-breed populations of cattle and sheep. BMC Genet 2014; 15:34. [PMID: 24636660 PMCID: PMC4101850 DOI: 10.1186/1471-2156-15-34] [Received: 09/17/2013] [Accepted: 03/10/2014]
Abstract
Background Distinguishing traits evolving under neutral conditions from those evolving rapidly because of various selection pressures is a great challenge. We propose a new method, composite selection signals (CSS), which unifies multiple pieces of selection evidence from the rank distribution of its diverse constituent tests. Extreme CSS scores capture highly differentiated loci and underlying common variants carrying excess haplotype homozygosity in the samples of a target population. Results High-density genotype data were analyzed for evidence of an association with either polledness or double muscling in various cohorts of cattle and sheep. In cattle, extreme CSS scores were found in the candidate regions on autosomes BTA-1 and BTA-2, flanking the POLL locus and the MSTN gene, for polledness and double muscling, respectively. In sheep, the regions with extreme scores were localized on autosome OAR-2, harbouring the MSTN gene for double muscling, and on OAR-10, harbouring the RXFP2 gene for polledness. Compared with the constituent tests, there was partial agreement between the signals at the four candidate loci; however, they consistently identified additional genomic regions harbouring no known genes. Notably, our list of all additional significant CSS regions contains genes that have been successfully implicated in secondary phenotypic diversity among several subpopulations in our data. For example, the method identified a strong selection signature for stature in cattle, capturing selective sweeps harbouring the UQCC-GDF5 and PLAG1-CHCHD7 gene regions on BTA-13 and BTA-14, respectively. Both gene pairs have previously been associated with height in humans, while PLAG1-CHCHD7 has also been reported for stature in cattle.
In an additional analysis, CSS identified significant regions harbouring multiple genes for various traits under selection in European cattle, including polledness, adaptation, metabolism, growth rate, stature, immunity, reproduction traits, and some other candidate genes for dairy and beef production. Conclusions CSS successfully localized the candidate regions in validation datasets and identified previously known and novel regions for various traits experiencing selection pressure. Together, the results demonstrate the utility of CSS through its improved power, reduced false positives, and high resolution of selection signals compared with individual constituent tests.
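The idea of combining constituent tests through their rank distributions can be sketched as follows. This is a simplified reading, not the authors' code: each test's SNP scores are converted to fractional ranks r/(n + 1), mapped to standard normal z-scores, averaged across tests, and the mean z-score is turned into an upper-tail p-value assuming it follows N(0, 1/k) for k tests; the composite score is -log10 of that p-value. Tie handling and the distributional assumption are ours, and `composite_scores` is a hypothetical name.

```python
import math
from statistics import NormalDist
from typing import List, Sequence


def fractional_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores ascending (ties broken by input order) and scale ranks into (0, 1) as r / (n + 1)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r / (n + 1)
    return ranks


def composite_scores(tests: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-SNP scores from k tests (larger = stronger signal) into -log10(p) composite scores."""
    if not tests or any(len(t) != len(tests[0]) for t in tests):
        raise ValueError("need at least one test, all with the same number of SNPs")
    k, n = len(tests), len(tests[0])
    std_normal = NormalDist()
    z = [[std_normal.inv_cdf(r) for r in fractional_ranks(t)] for t in tests]
    css = []
    for snp in range(n):
        mean_z = sum(z_t[snp] for z_t in z) / k
        # Upper-tail p-value of the mean z under N(0, 1/k); erfc keeps precision for large z.
        p = 0.5 * math.erfc(mean_z * math.sqrt(k) / math.sqrt(2))
        css.append(-math.log10(p))
    return css


fst_like = [0.01, 0.20, 0.03, 0.50]   # hypothetical per-SNP scores from one test
xpehh_like = [0.1, 1.5, -0.2, 2.8]    # hypothetical per-SNP scores from another test
scores = composite_scores([fst_like, xpehh_like])
print([round(s, 2) for s in scores])
```

Working on ranks rather than raw values puts tests with very different scales (for example FST and xp-EHH) on a common footing before they are averaged.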
Affiliation(s)
- Imtiaz Ahmed Sajid Randhawa
- ReproGen - Animal Bioscience Group, Faculty of Veterinary Science, University of Sydney, 425 Werombi Road, Camden NSW 2570, Australia.

13
Abstract
The recent advent of high-throughput sequencing and genotyping technologies makes it possible to produce large amounts of detailed data on the genotype composition of populations easily and cost-effectively. Detecting locus-specific effects may help identify genes that have been, or are currently, targeted by natural selection. How best to identify these selected regions, loci, or single nucleotides remains a challenging issue. Here, we introduce a new model-based method, called SelEstim, to distinguish putatively selected polymorphisms from the background of neutral (or nearly neutral) ones and to estimate the intensity of selection at the former. The underlying population genetic model is a diffusion approximation for the distribution of allele frequency in a population subdivided into a number of demes that exchange migrants. We use a Markov chain Monte Carlo algorithm to sample from the joint posterior distribution of the model parameters in a hierarchical Bayesian framework. We present evidence from stochastic simulations demonstrating that SelEstim has good power to identify loci targeted by selection and to estimate the strength of selection acting on these loci within each deme. We also reanalyze a subset of SNP data from the Stanford HGDP-CEPH Human Genome Diversity Cell Line Panel to illustrate the performance of SelEstim on real data. In agreement with previous studies, our analyses point to a very strong signal of positive selection upstream of the LCT gene, which encodes the enzyme lactase-phlorizin hydrolase and is associated with adult-type hypolactasia. The geographical distribution of the strength of positive selection across the Old World matches the interpolated map of lactase persistence phenotype frequencies, with the strongest selection coefficients in Europe and in the Indus Valley.
14
Detecting signatures of selection through haplotype differentiation among hierarchically structured populations. Genetics 2013; 193:929-41. [PMID: 23307896 DOI: 10.1534/genetics.112.147231] [Citation(s) in RCA: 208] [Impact Index Per Article: 18.9] [Indexed: 11/18/2022]
Abstract
The detection of molecular signatures of selection is one of the major concerns of modern population genetics. A widely used strategy in this context is to compare samples from several populations and to look for genomic regions with outstanding genetic differentiation between these populations. Genetic differentiation is generally based on allele frequency differences between populations, which are measured by FST or related statistics. Here we introduce a new statistic, denoted hapFLK, which focuses instead on the differences of haplotype frequencies between populations. In contrast to most existing statistics, hapFLK accounts for the hierarchical structure of the sampled populations. Using computer simulations, we show that each of these two features (the use of haplotype information and of the hierarchical structure of populations) significantly improves the detection power of selected loci and that combining them in the hapFLK statistic provides even greater power. We also show that hapFLK is robust with respect to bottlenecks and migration and improves over existing approaches in many situations. Finally, we apply hapFLK to a set of six sheep breeds from Northern Europe and identify seven regions under selection, which include already reported regions but also several new ones. We propose a method to help identify the population(s) under selection in a detected region, which reveals that in many of these regions selection most likely occurred in more than one population. Furthermore, several of the detected regions correspond to incomplete sweeps, where the favorable haplotype is only at intermediate frequency in the population(s) under selection.
15
Schlebusch CM, Soodyall H. Extensive Population Structure in San, Khoe, and Mixed Ancestry Populations from Southern Africa Revealed by 44 Short 5-SNP Haplotypes. Hum Biol 2012; 84:695-724. [DOI: 10.3378/027.084.0603] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Accepted: 07/15/2013] [Indexed: 11/05/2022]
16
Abstract
Characterizing the genetic structure of populations is of importance to evolutionary biology, to human disease gene mapping and to forensic science. Sewall Wright introduced a set of "F-statistics" to describe population structure in 1951 and he emphasized that these quantities were ratios of variances. Responding to uncertainty over the best way to estimate F-statistics, Weir and Cockerham published a method-of-moments set of estimators in 1984 (Evolution 38:1358-1370). This paper continues to be widely cited, with over 7,000 citations to date. Some background to the publishing history of the Weir and Cockerham paper is given here, along with subsequent developments and a discussion of current uses of Wright's F-statistics.
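To illustrate Wright's point that F-statistics are ratios of variances, the sketch below computes a plug-in F_ST for one biallelic locus as the variance of subpopulation allele frequencies divided by p̄(1 - p̄), where p̄ is the mean frequency. It assumes the subpopulation frequencies are known exactly and weighted equally, so it ignores sampling error; it is not the Weir and Cockerham method-of-moments estimator, which additionally corrects for sample sizes and the number of sampled populations. The function name `wright_fst` is hypothetical.

```python
from statistics import fmean, pvariance


def wright_fst(subpop_freqs: list[float]) -> float:
    """Plug-in Wright F_ST for one biallelic locus (no sampling correction).

    subpop_freqs holds the frequency of one allele in each subpopulation;
    all subpopulations are weighted equally in this sketch.
    """
    if len(subpop_freqs) < 2:
        raise ValueError("need at least two subpopulations")
    p_bar = fmean(subpop_freqs)
    # Total (binomial) variance of the allele indicator in the pooled population.
    total_var = p_bar * (1.0 - p_bar)
    if total_var == 0.0:
        # Monomorphic locus: F_ST is undefined; return 0.0 by convention here.
        return 0.0
    # Variance of allele frequencies among subpopulations (population variance).
    among_var = pvariance(subpop_freqs, mu=p_bar)
    return among_var / total_var
```

For example, `wright_fst([0.0, 1.0])` returns 1.0 (two subpopulations fixed for different alleles), while `wright_fst([0.5, 0.5])` returns 0.0 (no differentiation).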
Affiliation(s)
- Bruce S Weir
- Department of Biostatistics, University of Washington, Box 357232, Seattle, WA 98195-7232
17
Salm MPA, Horswell SD, Hutchison CE, Speedy HE, Yang X, Liang L, Schadt EE, Cookson WO, Wierzbicki AS, Naoumova RP, Shoulders CC. The origin, global distribution, and functional impact of the human 8p23 inversion polymorphism. Genome Res 2012; 22:1144-53. [PMID: 22399572 PMCID: PMC3371712 DOI: 10.1101/gr.126037.111] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Indexed: 01/11/2023]
Abstract
Genomic inversions are an increasingly recognized source of genetic variation. However, a lack of reliable high-throughput genotyping assays for these structures has precluded a full understanding of an inversion's phylogenetic, phenotypic, and population genetic properties. We characterize these properties for one of the largest polymorphic inversions in man (the ∼4.5-Mb 8p23.1 inversion), a structure that encompasses numerous signals of natural selection and disease association. We developed and validated a flexible bioinformatics tool that utilizes SNP data to enable accurate, high-throughput genotyping of the 8p23.1 inversion. This tool was applied retrospectively to diverse genome-wide data sets, revealing significant population stratification that largely follows a clinal “serial founder effect” distribution model. Phylogenetic analyses establish the inversion's ancestral origin within the Homo lineage, indicating that the 8p23.1 inversion occurred independently in the Pan lineage. The human inversion breakpoint was localized to an inverted pair of human endogenous retrovirus elements within the large, flanking low-copy repeats; experimental validation of this breakpoint confirmed these elements as the likely intermediary substrates that sponsored inversion formation. In five data sets, mRNA levels of disease-associated genes were robustly associated with inversion genotype. Moreover, a haplotype associated with systemic lupus erythematosus was restricted to the derived inversion state. We conclude that the 8p23.1 inversion is an evolutionarily dynamic structure that can now be accommodated into the understanding of human genetic and phenotypic diversity.
Affiliation(s)
- Maximilian P A Salm
- Centre for Endocrinology, Barts & the London School of Medicine & Dentistry, Queen Mary University of London, London, United Kingdom.
18
Lenstra JA, Groeneveld LF, Eding H, Kantanen J, Williams JL, Taberlet P, Nicolazzi EL, Sölkner J, Simianer H, Ciani E, Garcia JF, Bruford MW, Ajmone-Marsan P, Weigend S. Molecular tools and analytical approaches for the characterization of farm animal genetic diversity. Anim Genet 2012; 43:483-502. [DOI: 10.1111/j.1365-2052.2011.02309.x] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Accepted: 09/15/2011] [Indexed: 12/30/2022]
Affiliation(s)
- J. A. Lenstra
- Faculty of Veterinary Medicine; Utrecht University; Utrecht; The Netherlands
- L. F. Groeneveld
- Institute of Farm Animal Genetics; Friedrich-Loeffler-Institut; Hoeltystr. 10; 31535; Neustadt; Germany
- H. Eding
- Animal Evaluations Unit; CRV; Arnhem; The Netherlands
- J. Kantanen
- Biotechnology and Food Research; MTT Agrifood Research Finland; FI-31600; Jokioinen; Finland
- J. L. Williams
- Parco Tecnologico Padano; via Einstein; 2600; Lodi; Italy
- P. Taberlet
- Laboratoire d'Ecologie Alpine; Université Joseph Fourier; BP 53; Grenoble; France
- E. L. Nicolazzi
- Istituto di Zootecnica and BioDNA Research Centre; Università Cattolica del Sacro Cuore; Piacenza; Italy
- J. Sölkner
- Department of Sustainable Agricultural Systems; Animal Breeding Group; BOKU - University of Natural Resources and Life Sciences; Vienna; Austria
- H. Simianer
- Department of Animal Sciences; Animal Breeding and Genetics Group; Georg-August-University Göttingen; 37075; Göttingen; Germany
- E. Ciani
- Department of General and Environmental Physiology; University of Bari “Aldo Moro”; Bari; Italy
- J. F. Garcia
- Universidade Estadual Paulista; Araçatuba; Brazil
- M. W. Bruford
- Organisms and Environment Division; School of Biosciences; Cardiff University; Cardiff; UK
- P. Ajmone-Marsan
- Istituto di Zootecnica and BioDNA Research Centre; Università Cattolica del Sacro Cuore; Piacenza; Italy
- S. Weigend
- Institute of Farm Animal Genetics; Friedrich-Loeffler-Institut; Hoeltystr. 10; 31535; Neustadt; Germany
19
Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genet 2012; 8:e1002453. [PMID: 22291602 PMCID: PMC3266881 DOI: 10.1371/journal.pgen.1002453] [Citation(s) in RCA: 703] [Impact Index Per Article: 58.6] [Received: 07/13/2011] [Accepted: 11/21/2011] [Indexed: 12/12/2022]
Abstract
The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
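A minimal sketch of the summarization step, assuming each recipient's chromosome has already been painted as a single best-guess sequence of donor labels (one label per SNP). A chunk is taken to be a maximal run of consecutive SNPs copied from the same donor, and the coancestry entry for (recipient, donor) counts those chunks. This is a simplification: ChromoPainter works with expected chunk counts under its copying model rather than one fixed painting, and both the input format and the function name `coancestry_counts` are hypothetical.

```python
from itertools import groupby


def coancestry_counts(paintings: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Count copied chunks per (recipient, donor) pair from fixed paintings.

    paintings maps each recipient ID to the donor ID it copies at each SNP,
    in chromosomal order (a hypothetical input format for this sketch).
    """
    # Every individual seen as a recipient or a donor gets a row and a column.
    individuals = sorted(set(paintings) | {d for path in paintings.values() for d in path})
    matrix = {r: {d: 0 for d in individuals} for r in individuals}
    for recipient, path in paintings.items():
        # groupby yields one group per maximal run of identical donor labels,
        # i.e. one group per chunk of DNA "donated" by that donor.
        for donor, _run in groupby(path):
            if donor == recipient:
                raise ValueError(f"{recipient} cannot copy from itself")
            matrix[recipient][donor] += 1
    return matrix
```

For example, if recipient `"A"` is painted as `["B", "B", "C", "C", "B"]`, it has three chunks (B, C, B), so `matrix["A"]["B"]` is 2 and `matrix["A"]["C"]` is 1. The diagonal stays zero because each individual is painted using only the other individuals as donors.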
Affiliation(s)
- Daniel John Lawson
- Department of Mathematics, University of Bristol, Bristol, United Kingdom
- Simon Myers
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Daniel Falush
- Environmental Research Institute, University College Cork, Cork, Ireland
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
20
San Lucas FA, Rosenberg NA, Scheet P. Haploscope: a tool for the graphical display of haplotype structure in populations. Genet Epidemiol 2011; 36:17-21. [PMID: 22147662 DOI: 10.1002/gepi.20640] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/27/2011] [Revised: 09/14/2011] [Accepted: 09/21/2011] [Indexed: 11/11/2022]
Abstract
Patterns of linkage disequilibrium are often depicted pictorially by using tools that rely on visualizations of raw data or pairwise correlations among individual markers. Such approaches can fail to highlight some of the more interesting and complex features of haplotype structure. To enable natural visual comparisons of haplotype structure across subgroups of a population (e.g. isolated subpopulations or cases and controls), we propose an alternative visualization that provides a novel graphical representation of haplotype frequencies. We introduce Haploscope, a tool for visualizing the haplotype cluster frequencies that are produced by statistical models for population haplotype variation. We demonstrate the utility of our technique by examining haplotypes around the LCT gene, an example of recent positive selection, in samples from the Human Genome Diversity Panel. Haploscope, which has flexible options for annotation and inspection of haplotypes, is available for download at http://scheet.org/software.
Affiliation(s)
- F Anthony San Lucas
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA.
21
Allendorf FW, Hohenlohe PA, Luikart G. Genomics and the future of conservation genetics. Nat Rev Genet 2010; 11:697-709. [DOI: 10.1038/nrg2844] [Citation(s) in RCA: 939] [Impact Index Per Article: 67.1] [Indexed: 12/13/2022]