1
Ye Z, Wei W, Pfrender ME, Lynch M. Evolutionary Insights from a Large-Scale Survey of Population-Genomic Variation. Mol Biol Evol 2023; 40:msad233. [PMID: 37863047 PMCID: PMC10630549 DOI: 10.1093/molbev/msad233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/11/2023] [Accepted: 10/03/2023] [Indexed: 10/22/2023]
Abstract
The field of genomics has ushered in new methods for studying molecular-genetic variation in natural populations. However, most population-genomic studies still rely on small sample sizes (typically, <100 individuals) from single time points, leaving considerable uncertainties with respect to the behavior of relatively young (and rare) alleles and, owing to the large sampling variance of measures of variation, to the specific gene targets of unusually strong selection. Genomic sequences of ∼1,700 haplotypes distributed over a 10-year period from a natural population of the microcrustacean Daphnia pulex reveal evolutionary-genomic features at a refined scale, including previously hidden information on the behavior of rare alleles predicted by recent theory. Background selection, resulting from the recurrent introduction of deleterious alleles, appears to strongly influence the dynamics of neutral alleles, inducing indirect negative selection on rare variants and positive selection on common variants. Temporally fluctuating selection increases the persistence of nonsynonymous alleles with intermediate frequencies, while reducing standing levels of variation at linked silent sites. Combined with the results from an equally large metapopulation survey of the study species, classes of genes that are under strong positive selection can now be confidently identified in this key model organism. Most notable among rapidly evolving Daphnia genes are those associated with ribosomes, mitochondrial functions, sensory systems, and lifespan determination.
Affiliation(s)
- Zhiqiang Ye
  - Hubei Key Laboratory of Genetic Regulation & Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan 430079, China
- Wen Wei
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Michael E Pfrender
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
2
Ye Z, Wei W, Pfrender M, Lynch M. Evolutionary Insights from a Large-scale Survey of Population-genomic Variation. bioRxiv 2023:2023.05.03.539276. [PMID: 37205430 PMCID: PMC10187179 DOI: 10.1101/2023.05.03.539276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Results from data on > 1000 haplotypes distributed over a nine-year period from a natural population of the microcrustacean Daphnia pulex reveal evolutionary-genomic features at a refined scale, including key population-genetic properties that are obscured in studies with smaller sample sizes. Background selection, resulting from the recurrent introduction of deleterious alleles, appears to strongly influence the dynamics of neutral alleles, inducing indirect negative selection on rare variants and positive selection on common variants. Fluctuating selection increases the persistence of nonsynonymous alleles with intermediate frequencies, while reducing standing levels of variation at linked silent sites. Combined with the results from an equally large metapopulation survey of the study species, regions of gene structure that are under strong purifying selection and classes of genes that are under strong positive selection in this key species can be confidently identified. Most notable among rapidly evolving Daphnia genes are those associated with ribosomes, mitochondrial functions, sensory systems, and lifespan determination.
Affiliation(s)
- Zhiqiang Ye
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Wen Wei
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Michael Pfrender
  - Department of Biological Sciences, Notre Dame University, Notre Dame, IN 46556
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
3
Chen Y, Niu S, Deng X, Song Q, He L, Bai D, He Y. Genome-wide association study of leaf-related traits in tea plant in Guizhou based on genotyping-by-sequencing. BMC Plant Biol 2023; 23:196. [PMID: 37046207 PMCID: PMC10091845 DOI: 10.1186/s12870-023-04192-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/25/2022] [Accepted: 03/24/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND Studying the genetic characteristics of tea plant (Camellia spp.) leaf traits is essential for improving yield and quality through breeding and selection. Guizhou Plateau, an important part of the original center of tea plants, has rich genetic resources. However, few studies have explored the associations between tea plant leaf traits and single nucleotide polymorphism (SNP) markers in Guizhou.
RESULTS In this study, we used the genotyping-by-sequencing (GBS) method to identify 100,829 SNP markers from 338 accessions of tea germplasm in Guizhou Plateau. We assessed population structure based on high-quality SNPs, constructed phylogenetic relationships, and performed genome-wide association studies (GWASs). Four inferred pure groups (G-I, G-II, G-III, and G-IV) and one inferred admixture group (G-V) were identified by a population structure analysis and verified by principal component analyses and phylogenetic analyses. Through GWAS, we identified six candidate genes associated with four leaf traits: mature leaf size, texture, color, and shape. Specifically, two candidate genes, located on chromosomes 1 and 9, were significantly associated with mature leaf size, while two genes, located on chromosomes 8 and 11, were significantly associated with mature leaf texture. Additionally, two candidate genes, located on chromosomes 1 and 2, were identified as being associated with mature leaf color and mature leaf shape, respectively. We verified the expression levels of two candidate genes using reverse transcription quantitative polymerase chain reaction (RT-qPCR) and designed a derived cleaved amplified polymorphism (dCAPS) marker that co-segregated with mature leaf size, which could be used for marker-assisted selection (MAS) breeding in Camellia sinensis.
CONCLUSIONS In the present study, using GWAS approaches with the population of 338 tea accessions in Guizhou, we revealed a list of SNP markers and candidate genes that were significantly associated with four leaf traits. This work provides a theoretical and practical basis for the genetic breeding of related traits in tea plant leaves.
Affiliation(s)
- Yanjun Chen
  - College of Tea Science / Tea Engineering Technology Research Center, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Suzhen Niu
  - College of Tea Science / Tea Engineering Technology Research Center, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
  - Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region, Ministry of Education, Institute of Agro-Bioengineering, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Xinyue Deng
  - School of Architecture, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Qinfei Song
  - College of Tea Science / Tea Engineering Technology Research Center, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Limin He
  - College of Tea Science / Tea Engineering Technology Research Center, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Dingchen Bai
  - College of Tea Science / Tea Engineering Technology Research Center, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Yingqin He
  - College of Tea Science / Tea Engineering Technology Research Center, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
4
Lynch M, Ye Z, Urban L, Maruki T, Wei W. The Linkage-Disequilibrium and Recombinational Landscape in Daphnia pulex. Genome Biol Evol 2022; 14:evac145. [PMID: 36170345 PMCID: PMC9642108 DOI: 10.1093/gbe/evac145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/20/2022] [Indexed: 11/24/2022]
Abstract
By revealing the influence of recombinational activity beyond what can be achieved with controlled crosses, measures of linkage disequilibrium (LD) in natural populations provide a powerful means of defining the recombinational landscape within which genes evolve. In one of the most comprehensive studies of this sort ever performed, involving whole-genome analyses on nearly 1,000 individuals of the cyclically parthenogenetic microcrustacean Daphnia pulex, the data suggest a relatively uniform pattern of recombination across the genome. Patterns of LD are quite consistent among populations; average rates of recombination are quite similar for all chromosomes; and although some chromosomal regions have elevated recombination rates, the degree of inflation is not large, and the overall spatial pattern of recombination is close to the random expectation. Contrary to expectations for models in which crossing-over is the primary mechanism of recombination, and consistent with data for other species, the distance-dependent pattern of LD indicates excessively high levels at both short and long distances and unexpectedly low levels of decay at long distances, suggesting significant roles for factors such as nonindependent mutation, population subdivision, and recombination mechanisms unassociated with crossing over. These observations raise issues regarding the classical LD equilibrium model widely applied in population genetics to infer recombination rates across various length scales on chromosomes.
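As a point of reference for the "classical LD equilibrium model" mentioned in this abstract, one common textbook formulation is Sved's (1971) approximation, E[r²] ≈ 1/(1 + 4·Ne·c), which balances drift against recombination. The abstract does not say which model the authors have in mind, so using Sved's formula here is an assumption, and the function name and parameter values below are purely illustrative.

```python
def expected_r2_sved(ne: float, c: float) -> float:
    """Approximate equilibrium E[r^2] between two loci (Sved 1971).

    ne: effective population size, assumed constant (hypothetical input)
    c:  recombination fraction between the two loci per generation
    """
    if ne <= 0 or c < 0:
        raise ValueError("ne must be positive and c non-negative")
    return 1.0 / (1.0 + 4.0 * ne * c)


# Illustrative values only (not taken from the study):
# the expected r^2 decays monotonically as the recombination fraction grows.
for c in (0.0, 1e-5, 1e-4, 1e-3):
    print(f"c={c:g}  E[r^2]={expected_r2_sved(10_000, c):.4f}")
```

Under this simple model LD should keep falling toward zero with distance, which is the kind of expectation the abstract reports the long-distance data departing from.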
Affiliation(s)
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Zhiqiang Ye
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Lina Urban
  - Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Takahiro Maruki
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Wen Wei
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
5
Gerard D. Scalable bias-corrected linkage disequilibrium estimation under genotype uncertainty. Heredity (Edinb) 2021; 127:357-362. [PMID: 34373594 PMCID: PMC8479074 DOI: 10.1038/s41437-021-00462-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/20/2021] [Revised: 07/19/2021] [Accepted: 07/19/2021] [Indexed: 02/07/2023]
Abstract
Linkage disequilibrium (LD) estimates are often calculated genome-wide for use in many tasks, such as SNP pruning and LD decay estimation. However, in the presence of genotype uncertainty, naive approaches to calculating LD have extreme attenuation biases, incorrectly suggesting that SNPs are less dependent than in reality. These biases are particularly strong in polyploid organisms, which often exhibit greater levels of genotype uncertainty than diploids. A principled approach using maximum likelihood estimation with genotype likelihoods can reduce this bias, but is prohibitively slow for genome-wide applications. Here, we present scalable moment-based adjustments to LD estimates based on the marginal posterior distributions of the genotypes. We demonstrate, on both simulated and real data, that these moment-based estimators are as accurate as maximum likelihood estimators, but are almost as fast as naive approaches based only on posterior mean genotypes. This opens up bias-corrected LD estimation to genome-wide applications. In addition, we provide standard errors for these moment-based estimators. All methods discussed in this manuscript are implemented in the ldsep package, available on the Comprehensive R Archive Network ( https://cran.r-project.org/package=ldsep ).
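The attenuation bias described in this abstract can be illustrated with a small simulation. The sketch below is not the ldsep method and does not implement the moment-based correction; it only shows, under assumed settings, that adding measurement noise to genotype dosages pulls the naive squared correlation toward zero. The simulation design (allele frequency, copy probability, Gaussian noise as a crude stand-in for genotype uncertainty) and all names are hypothetical.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length dosage lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate_attenuation(n=5000, p=0.3, copy_prob=0.9, noise_sd=0.5, seed=0):
    """Return (r2 from true dosages, r2 from noisy dosages) for two linked SNPs."""
    rng = random.Random(seed)
    g_a, g_b = [], []
    for _ in range(n):
        a = b = 0
        for _ in range(2):  # two haplotypes per diploid individual
            allele_a = rng.random() < p
            # SNP B copies SNP A's allele most of the time, creating LD
            allele_b = allele_a if rng.random() < copy_prob else (rng.random() < p)
            a += allele_a
            b += allele_b
        g_a.append(a)
        g_b.append(b)

    def noisy(g):
        # Gaussian noise clipped to the valid dosage range [0, 2]
        return min(2.0, max(0.0, g + rng.gauss(0.0, noise_sd)))

    noisy_a = [noisy(g) for g in g_a]
    noisy_b = [noisy(g) for g in g_b]
    return pearson_r2(g_a, g_b), pearson_r2(noisy_a, noisy_b)


r2_true, r2_noisy = simulate_attenuation()
print(f"r^2 from true dosages:  {r2_true:.3f}")
print(f"r^2 from noisy dosages: {r2_noisy:.3f}")
```

The noisy estimate comes out lower because independent measurement error inflates each variance without adding covariance, which is the direction of bias the abstract attributes to naive approaches.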
Affiliation(s)
- David Gerard
  - Department of Mathematics and Statistics, American University, Washington, DC, USA.
6
Lou RN, Jacobs A, Wilder A, Therkildsen NO. A beginner's guide to low-coverage whole genome sequencing for population genomics. Mol Ecol 2021; 30:5966-5993. [PMID: 34250668 DOI: 10.1111/mec.16077] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Received: 12/01/2020] [Revised: 06/30/2021] [Accepted: 07/01/2021] [Indexed: 11/26/2022]
Abstract
Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency and genetic diversity estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.
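To make "explicitly account for genotype uncertainty" concrete, here is a minimal sketch of one widely used idea: estimating an allele frequency directly from per-individual genotype likelihoods with an EM iteration under a Hardy-Weinberg prior, instead of calling genotypes first. It is an illustration under stated assumptions (one biallelic site, diploids, Hardy-Weinberg proportions), not the implementation of any particular package reviewed in the guide; the function name and the example likelihoods are hypothetical.

```python
def em_allele_frequency(gls, f0=0.2, max_iter=200, tol=1e-10):
    """Estimate the alternate-allele frequency at one biallelic site.

    gls: list of (L0, L1, L2) tuples, the likelihood of the reads given
         0, 1, or 2 copies of the alternate allele for each individual.
    """
    f = f0
    for _ in range(max_iter):
        # Hardy-Weinberg prior on genotypes given the current frequency
        prior = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        total_dosage = 0.0
        for gl in gls:
            w = [l * p for l, p in zip(gl, prior)]
            # posterior expected number of alternate alleles for this individual
            total_dosage += (w[1] + 2.0 * w[2]) / sum(w)
        f_new = total_dosage / (2.0 * len(gls))
        f_new = min(1 - 1e-9, max(1e-9, f_new))  # keep the prior strictly positive
        converged = abs(f_new - f) < tol
        f = f_new
        if converged:
            break
    return f


# Hypothetical low-depth likelihoods: most individuals are ambiguous
# between two genotypes, so hard calls would discard information.
example_gls = [
    (0.60, 0.40, 0.00),
    (0.50, 0.50, 0.01),
    (0.05, 0.70, 0.25),
    (0.90, 0.10, 0.00),
    (0.00, 0.45, 0.55),
]
print(f"estimated allele frequency: {em_allele_frequency(example_gls):.3f}")
```

When every likelihood vector puts all its weight on one genotype, the estimate reduces to the ordinary allele count divided by twice the number of individuals.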
Affiliation(s)
- Runyang Nicolas Lou
  - Department of Natural Resources and the Environment, Cornell University, Ithaca, NY, 14853, USA
- Arne Jacobs
  - Department of Natural Resources and the Environment, Cornell University, Ithaca, NY, 14853, USA
  - Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, G12 8QQ, UK
- Aryn Wilder
  - San Diego Zoo Wildlife Alliance, Escondido, CA, 92027, USA
- Nina O Therkildsen
  - Department of Natural Resources and the Environment, Cornell University, Ithaca, NY, 14853, USA
7
Gerard D. Pairwise linkage disequilibrium estimation for polyploids. Mol Ecol Resour 2021; 21:1230-1242. [PMID: 33559321 DOI: 10.1111/1755-0998.13349] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2020] [Revised: 01/18/2021] [Accepted: 02/01/2021] [Indexed: 12/31/2022]
Abstract
Many tasks in statistical genetics involve pairwise estimation of linkage disequilibrium (LD). The study of LD in diploids is mature. However, in polyploids, the field lacks a comprehensive characterization of LD. Polyploids also exhibit greater levels of genotype uncertainty than diploids, yet no methods currently exist to estimate LD in polyploids in the presence of such genotype uncertainty. Furthermore, most LD estimation methods do not quantify the level of uncertainty in their LD estimates. Our study contains three major contributions. (i) We characterize haplotypic and composite measures of LD in polyploids. These composite measures of LD turn out to be functions of common statistical measures of association. (ii) We derive procedures to estimate haplotypic and composite LD in polyploids in the presence of genotype uncertainty. We do this by estimating LD directly from genotype likelihoods, which may be obtained from many genotyping platforms. (iii) We derive standard errors of all LD estimators that we discuss. We validate our methods on both real and simulated data. Our methods are implemented in the R package ldsep, available on the Comprehensive R Archive Network https://cran.r-project.org/package=ldsep.
Affiliation(s)
- David Gerard
  - Department of Mathematics and Statistics, American University, Washington, DC, USA
8
Barbanti A, Torrado H, Macpherson E, Bargelloni L, Franch R, Carreras C, Pascual M. Helping decision making for reliable and cost-effective 2b-RAD sequencing and genotyping analyses in non-model species. Mol Ecol Resour 2020; 20. [PMID: 32061018 DOI: 10.1111/1755-0998.13144] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/30/2019] [Revised: 02/04/2020] [Accepted: 02/10/2020] [Indexed: 12/18/2022]
Abstract
High-throughput sequencing has revolutionized population and conservation genetics. RAD sequencing methods, such as 2b-RAD, can be used on species lacking a reference genome. However, transferring protocols across taxa can potentially lead to poor results. We tested two different IIB enzymes (AlfI and CspCI) on two species with different genome sizes (the loggerhead turtle Caretta caretta and the sharpsnout seabream Diplodus puntazzo) to build a set of guidelines to improve 2b-RAD protocols on non-model organisms while optimising costs. Good results were obtained even with degraded samples, showing the value of 2b-RAD in studies with poor DNA quality. However, library quality was found to be a critical parameter on the number of reads and loci obtained for genotyping. Resampling analyses with different number of reads per individual showed a trade-off between number of loci and number of reads per sample. The resulting accumulation curves can be used as a tool to calculate the number of sequences per individual needed to reach a mean depth ≥20 reads to acquire good genotyping results. Finally, we demonstrated that selective-base ligation does not affect genomic differentiation between individuals, indicating that this technique can be used in species with large genome sizes to adjust the number of loci to the study scope, to reduce sequencing costs and to maintain suitable sequencing depth for a reliable genotyping without compromising the results. Here, we provide a set of guidelines to improve 2b-RAD protocols on non-model organisms with different genome sizes, helping decision-making for a reliable and cost-effective genotyping.
Affiliation(s)
- Anna Barbanti
  - Department of Genetics, Microbiology and Statistics and IRBio, University of Barcelona, Barcelona, Spain
- Hector Torrado
  - Department of Genetics, Microbiology and Statistics and IRBio, University of Barcelona, Barcelona, Spain
  - Center for Advanced Studies of Blanes (CEAB-CSIC), Blanes, Girona, Spain
- Enrique Macpherson
  - Center for Advanced Studies of Blanes (CEAB-CSIC), Blanes, Girona, Spain
- Luca Bargelloni
  - Department of Comparative Biomedicine and Food Science, University of Padova, Legnaro, Italy
- Rafaella Franch
  - Department of Comparative Biomedicine and Food Science, University of Padova, Legnaro, Italy
- Carlos Carreras
  - Department of Genetics, Microbiology and Statistics and IRBio, University of Barcelona, Barcelona, Spain
- Marta Pascual
  - Department of Genetics, Microbiology and Statistics and IRBio, University of Barcelona, Barcelona, Spain
9
Qanbari S. On the Extent of Linkage Disequilibrium in the Genome of Farm Animals. Front Genet 2020; 10:1304. [PMID: 32010183 PMCID: PMC6978288 DOI: 10.3389/fgene.2019.01304] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 07/31/2019] [Accepted: 11/26/2019] [Indexed: 11/13/2022]
Abstract
Given the importance of linkage disequilibrium (LD) in gene mapping and evolutionary inferences, I characterize in this review the pattern of LD and discuss the influence of human intervention during domestication, breed establishment, and subsequent genetic improvement on shaping the genome of livestock species. To this end, I summarize data on the profile of LD based on array genotypes vs. sequencing data in cattle and chicken, two major livestock species, and compare to the human case. This comparison provides insights into the real dimension of the pairwise allelic correlation and haplo-block structuring. The dependency of LD on allelic frequency is pictured and a recently introduced metric for moderating it is outlined. In the context of the contact farm animals had with human, the impact of genetic forces including admixture, mutation, recombination rate, selection, and effective population size on LD is discussed. The review further highlights the interplay of LD with runs of homozygosity and concludes with the operational implications of the widely used association and selection mapping studies in relation to LD.
Affiliation(s)
- Saber Qanbari
  - Leibniz Institute for Farm Animal Biology (FBN), Institute of Genetics and Biometry, Dummerstorf, Germany
  - Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
10
Therkildsen NO, Wilder AP, Conover DO, Munch SB, Baumann H, Palumbi SR. Contrasting genomic shifts underlie parallel phenotypic evolution in response to fishing. Science 2019; 365:487-490. [DOI: 10.1126/science.aaw7271] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 01/18/2019] [Accepted: 06/07/2019] [Indexed: 12/16/2022]
Abstract
Humans cause widespread evolutionary change in nature, but we still know little about the genomic basis of rapid adaptation in the Anthropocene. We tracked genomic changes across all protein-coding genes in experimental fish populations that evolved pronounced shifts in growth rates due to size-selective harvest over only four generations. Comparisons of replicate lines show parallel allele frequency shifts that recapitulate responses to size-selection gradients in the wild across hundreds of unlinked variants concentrated in growth-related genes. However, a supercluster of genes also rose rapidly in frequency and dominated the evolutionary dynamic in one replicate line but not in others. Parallel phenotypic changes thus masked highly divergent genomic responses to selection, illustrating how contingent rapid adaptation can be in the face of strong human-induced selection.
11
Niu S, Song Q, Koiwa H, Qiao D, Zhao D, Chen Z, Liu X, Wen X. Genetic diversity, linkage disequilibrium, and population structure analysis of the tea plant (Camellia sinensis) from an origin center, Guizhou plateau, using genome-wide SNPs developed by genotyping-by-sequencing. BMC Plant Biol 2019; 19:328. [PMID: 31337341 PMCID: PMC6652003 DOI: 10.1186/s12870-019-1917-5] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 10/20/2018] [Accepted: 07/02/2019] [Indexed: 05/19/2023]
Abstract
BACKGROUND To efficiently protect and exploit germplasm resources for marker development and breeding purposes, we must accurately depict the features of the tea populations. This study focuses on the Camellia sinensis (C. sinensis) population and aims to (i) identify single nucleotide polymorphisms (SNPs) on the genome level, (ii) investigate the genetic diversity and population structure, and (iii) characterize the linkage disequilibrium (LD) pattern to facilitate subsequent genome-wide association mapping and marker-assisted selection.
RESULTS We collected 415 tea accessions from the Origin Center and analyzed the genetic diversity, population structure and LD pattern using the genotyping-by-sequencing (GBS) approach. A total of 79,016 high-quality SNPs were identified; the polymorphism information content (PIC) and genetic diversity (GD) based on these SNPs showed a higher level of genetic diversity in cultivated type than in wild type. The 415 accessions were clustered into three groups (wild type, cultivated type, and admixed wild type) by STRUCTURE software, and this grouping was confirmed using principal component analyses (PCA). However, unweighted pair group method with arithmetic mean (UPGMA) trees indicated the accessions should be grouped into more clusters. Further analyses using STRUCTURE identified four groups (the Pure Wild Type, Admixed Wild Type, ancient landraces, and modern landraces), and the results were confirmed by PCA and the UPGMA tree method. A higher level of genetic diversity was detected in ancient landraces and Admixed Wild Type than in the Pure Wild Type and modern landraces. The highest differentiation was between the Pure Wild Type and modern landraces. A relatively fast LD decay with a short range (kb) was observed, and the LD decays of the four inferred populations were different.
CONCLUSIONS This study is, to our knowledge, the first population genetic analysis of tea germplasm from the Origin Center, Guizhou Plateau, using GBS. The LD pattern, population structure and genetic differentiation of the tea population revealed by our study will benefit further genetic studies, germplasm protection, and breeding.
Affiliation(s)
- Suzhen Niu
  - The Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Institute of Agro-Bioengineering / College of Tea Science, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
  - Vegetable and Fruit Improvement Center, Department of Horticultural Sciences, Molecular and Environmental Plant Sciences Program, MS2133 Texas A&M University, College Station, TX 77843-2133, USA
  - Institute of Tea, Guizhou Academy of Agricultural Sciences, Guiyang, 550006, Guizhou Province, People’s Republic of China
- Qinfei Song
  - The Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Institute of Agro-Bioengineering / College of Tea Science, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Hisashi Koiwa
  - Vegetable and Fruit Improvement Center, Department of Horticultural Sciences, Molecular and Environmental Plant Sciences Program, MS2133 Texas A&M University, College Station, TX 77843-2133, USA
- Dahe Qiao
  - Institute of Tea, Guizhou Academy of Agricultural Sciences, Guiyang, 550006, Guizhou Province, People’s Republic of China
- Degang Zhao
  - The Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Institute of Agro-Bioengineering / College of Tea Science, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
  - Institute of Tea, Guizhou Academy of Agricultural Sciences, Guiyang, 550006, Guizhou Province, People’s Republic of China
- Zhengwu Chen
  - Institute of Tea, Guizhou Academy of Agricultural Sciences, Guiyang, 550006, Guizhou Province, People’s Republic of China
- Xia Liu
  - The Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Institute of Agro-Bioengineering / College of Tea Science, Guizhou University, Guiyang, 550025, Guizhou Province, People’s Republic of China
- Xiaopeng Wen
  - Institute of Agro-bioengineering/College of Life Science, Guizhou University, Huaxi Avenue, Guiyang, 550025, Guizhou Province, People’s Republic of China
  - Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Guizhou University, Xiahui Road, Huaxi, Guiyang, 550025, Guizhou Province, People’s Republic of China
12
Abstract
Daphnia normally reproduce by cyclical parthenogenesis, with offspring sex being determined by environmental cues. However, some females have lost the ability to produce males. Our results demonstrate that this loss of male-producing ability is controlled by a dominant allele at a single locus. We identified the locus by comparing whole-genome sequences of 67 nonmale-producing (NMP) and 100 male-producing (MP) clones from 5 Daphnia pulex populations, revealing 132 NMP-linked SNPs and 59 NMP-linked indels within a single 1.1-Mb nonrecombining region on chromosome I. These markers include 7 nonsynonymous mutations, all of which are located within one unannotated protein-coding gene (gene 8960). Within this single gene, all of the marker-linked NMP haplotypes from different populations form a monophyletic clade, suggesting a single origin of the NMP phenotype, with the NMP haplotype originating by introgression from a sister species, Daphnia pulicaria. Methyl farnesoate (MF) is the innate juvenile hormone in daphnids, which induces the production of males and whose inhibition results in female-only production. Gene 8960 is sensitive to treatment by MF in MP clones, but such responsiveness is greatly reduced in NMP clones. Thus, we hypothesize that gene 8960 is located downstream of the MF-signaling pathway in D. pulex, with the NMP phenotype being caused by expression change of gene 8960.
13
Fox EA, Wright AE, Fumagalli M, Vieira FG. ngsLD: evaluating linkage disequilibrium using genotype likelihoods. Bioinformatics 2019; 35:3855-3856. [DOI: 10.1093/bioinformatics/btz200] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/09/2018] [Revised: 12/27/2018] [Accepted: 03/20/2019] [Indexed: 12/21/2022]
Abstract
Motivation
Linkage disequilibrium (LD) measures the correlation between genetic loci and is highly informative for association mapping and population genetics. As many studies rely on called genotypes for estimating LD, their results can be affected by data uncertainty, especially when employing a low read depth sequencing strategy. Furthermore, there is a manifest lack of tools for the analysis of large-scale, low-depth and short-read sequencing data from non-model organisms with limited sample sizes.
Results
ngsLD addresses these issues by estimating LD directly from genotype likelihoods in a fast, reliable and user-friendly implementation. This method makes use of the full information available from sequencing data and provides accurate estimates of linkage disequilibrium patterns compared with approaches based on genotype calling. We conducted a case study to investigate how LD decays over physical distance in two avian species.
Availability and implementation
The methods presented in this work were implemented in C/C++ and are freely available for non-commercial use from https://github.com/fgvieira/ngsLD.
Supplementary information
Supplementary data are available at Bioinformatics online.
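As a point of reference for the LD statistics discussed in this abstract, the following is a minimal sketch of the textbook r² calculation from phased haplotypes. It is not the ngsLD method, which works from genotype likelihoods rather than known haplotypes; the function name `ld_r2` and the 0/1 allele coding are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Return r^2 between two biallelic loci from phased haplotypes.

    Each haplotype is a pair (a, b) of 0/1 allele codes at locus A and
    locus B. This is the classical haplotype-based estimator, shown only
    as an illustration (hypothetical helper, not part of ngsLD).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Frequencies of allele 1 at each locus and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom
```

With called genotypes at low read depth, the haplotype or genotype inputs to such a calculation are themselves uncertain, which is the problem the abstract describes.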
Affiliation(s)
- Emma A Fox
  - Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, UK
- Alison E Wright
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Matteo Fumagalli
  - Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, UK
- Filipe G Vieira
  - Center for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
|
14
|
Dou Y, Peng P, Cai C, Ye A, Kong L, Zhang R. HLA-B*58:01 and rs9263726 have a linkage, but not absolute linkage disequilibrium in Han Chinese population. Drug Metab Pharmacokinet 2018; 33:228-231. [PMID: 30193812 DOI: 10.1016/j.dmpk.2018.08.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2018] [Revised: 07/30/2018] [Accepted: 08/02/2018] [Indexed: 12/17/2022]
Abstract
HLA-B*58:01 has been demonstrated to be associated with allopurinol-induced severe cutaneous adverse reactions. Since HLA-B*58:01 is too complicated to be identified, it is necessary to select an appropriate surrogate biomarker. In Japan, the rs9263726 allele was considered a surrogate biomarker for HLA-B*58:01, but this was not the case in the Australian cohort. Because of these conflicting results, this study aims to determine whether the rs9263726 allele is a surrogate biomarker for HLA-B*58:01 in the Han Chinese population. A total of 353 samples (200 cases from the south and 153 cases from the north) were selected to detect HLA-B*58:01 and the rs9263726 allele. HLA-B*58:01 was identified by a sequencing-based method, and the rs9263726 allele was identified by Taqman SNP Genotyping Assays. The results showed that the two alleles are linked, but not in absolute linkage disequilibrium, in the Han Chinese population.
Affiliation(s)
- Yaling Dou
  - Department of Clinical Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Pan Peng
  - Wuhan YZY Medical Science and Technology Co Ltd., Wuhan, China
- Congli Cai
  - Wuhan YZY Medical Science and Technology Co Ltd., Wuhan, China
- Ali Ye
  - Department of Clinical Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Lingjun Kong
  - Department of Clinical Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Rui Zhang
  - Department of Clinical Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China
|
15
|
Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data. Genetics 2018; 209:389-400. [PMID: 29588288 PMCID: PMC5972415 DOI: 10.1534/genetics.118.300831] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 02/16/2018] [Accepted: 03/22/2018] [Indexed: 12/31/2022] Open
Abstract
High-throughput sequencing methods provide a cost-effective approach for genotyping and are commonly used in population genetics studies. A drawback of these methods, however, is that sequencing and genotyping errors can arise...
High-throughput sequencing methods that multiplex a large number of individuals have provided a cost-effective approach for discovering genome-wide genetic variation in large populations. These sequencing methods are increasingly being utilized in population genetic studies across a diverse range of species. Two side-effects of these methods, however, are (1) sequencing errors and (2) heterozygous genotypes called as homozygous due to only one allele at a particular locus being sequenced, which occurs when the sequencing depth is insufficient. Both of these errors have a profound effect on the estimation of linkage disequilibrium (LD) and, if not taken into account, lead to inaccurate estimates. We developed a new likelihood method, GUS-LD, to estimate pairwise linkage disequilibrium using low coverage sequencing data that accounts for undercalled heterozygous genotypes and sequencing errors. Our findings show that accurate estimates were obtained using GUS-LD, whereas underestimation of LD results if no adjustment is made for the errors.
|
16
|
Abstract
Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depths of coverage, we have developed two types of genotype callers. One approach is appropriate for low-coverage sequencing data, and incorporates population-level information on genotype frequencies and error rates pre-estimated by an ML method. Performance evaluation using computer simulations and human data shows that the proposed framework yields less biased estimates of allele frequencies and more accurate genotype calls than current widely used methods. Another type of genotype caller applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption on the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18×) sequencing data of 83 clones from a population of Daphnia pulex. The results show that the proposed method enables conservative and reasonably powerful detection of polymorphisms with arbitrary numbers of alleles. We have extended the proposed method to the analysis of genomic data for polyploid organisms, showing that calling accurate polyploid genotypes requires much higher coverage than diploid genotypes.
|
17
|
Abstract
Using data from 83 isolates from a single population, the population genomics of the microcrustacean Daphnia pulex are described and compared to current knowledge for the only other well-studied invertebrate, Drosophila melanogaster. These two species are quite similar with respect to effective population sizes and mutation rates, although some features of recombination appear to be different, with linkage disequilibrium being elevated at short ([Formula: see text] bp) distances in D. melanogaster and at long distances in D. pulex. The study population adheres closely to the expectations under Hardy-Weinberg equilibrium, and reflects a past population history of no more than a twofold range of variation in effective population size. Fourfold redundant silent sites and a restricted region of intronic sites appear to evolve in a nearly neutral fashion, providing a powerful tool for population genetic analyses. Amino acid replacement sites are predominantly under strong purifying selection, as are a large fraction of sites in UTRs and intergenic regions, but the majority of SNPs at such sites that rise to frequencies [Formula: see text] appear to evolve in a nearly neutral fashion. All forms of genomic sites (including replacement sites within codons, and intergenic and UTR regions) appear to be experiencing an [Formula: see text] higher level of selection scaled to the power of drift in D. melanogaster, but this may in part be a consequence of recent demographic changes. These results establish D. pulex as an excellent system for future work on the evolutionary genomics of natural populations.
|
18
|
Therkildsen NO, Palumbi SR. Practical low-coverage genomewide sequencing of hundreds of individually barcoded samples for population and evolutionary genomics in nonmodel species. Mol Ecol Resour 2016; 17:194-208. [DOI: 10.1111/1755-0998.12593] [Citation(s) in RCA: 74] [Impact Index Per Article: 8.2] [Received: 03/14/2016] [Revised: 06/27/2016] [Accepted: 07/02/2016] [Indexed: 01/04/2023]
Affiliation(s)
- Nina Overgaard Therkildsen
  - Hopkins Marine Station; Department of Biology; Stanford University; 120 Oceanview Blvd. Pacific Grove CA 93950 USA
- Stephen R. Palumbi
  - Hopkins Marine Station; Department of Biology; Stanford University; 120 Oceanview Blvd. Pacific Grove CA 93950 USA
|
19
|
Boitard S, Rodríguez W, Jay F, Mona S, Austerlitz F. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach. PLoS Genet 2016; 12:e1005877. [PMID: 26943927 PMCID: PMC4778914 DOI: 10.1371/journal.pgen.1005877] [Citation(s) in RCA: 102] [Impact Index Per Article: 11.3] [Received: 03/31/2015] [Accepted: 01/27/2016] [Indexed: 12/02/2022] Open
Abstract
Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
Molecular data sampled from extant individuals contain considerable information about their demographic history. In particular, one classical question in population genetics is to reconstruct past population size changes from such data. Relating these changes to various climatic, geological or anthropogenic events allows characterizing the main factors driving genetic diversity and can have major outcomes for conservation. Until recently, mostly very simple histories, including one or two population size changes, could be estimated from genetic data. This has changed with the sequencing of entire genomes in many species, and several methods now allow the inference of complex histories consisting of several tens of population size changes. However, analyzing entire genomes, while accounting for recombination, remains a statistical and numerical challenge. These methods, therefore, can only be applied to small samples with a few diploid genomes. We overcome this limitation by using an approximate estimation approach, where observed genomes are summarized using a small number of statistics related to allele frequencies and linkage disequilibrium. In contrast to previous approaches, we show that our method also allows us to reconstruct the most recent part (the last 100 generations) of the population size history. As an illustration, we apply it to large samples of whole-genome sequences in four cattle breeds.
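One of the two summary statistics named in this abstract, the folded allele frequency spectrum, can be sketched in a few lines. The example below is an illustration under stated assumptions, not the PopSizeABC implementation: the function name `folded_sfs` and the input layout (one row per diploid individual, one 0/1/2 allele count per SNP, no missing data) are hypothetical.

```python
from typing import List, Sequence


def folded_sfs(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Return the folded allele frequency spectrum of a diploid sample.

    genotypes[i][j] is the number of copies (0, 1 or 2) of an arbitrarily
    chosen allele carried by individual i at SNP j. Entry k of the result
    is the number of SNPs whose minor allele is seen k times, for k from
    0 up to the number of individuals. Hypothetical helper for illustration.
    """
    n_ind = len(genotypes)
    if n_ind == 0:
        raise ValueError("need at least one individual")
    n_chrom = 2 * n_ind
    n_snps = len(genotypes[0])
    sfs = [0] * (n_ind + 1)
    for j in range(n_snps):
        count = sum(ind[j] for ind in genotypes)
        # Folding: without knowing the ancestral allele, use the rarer one.
        minor = min(count, n_chrom - count)
        sfs[minor] += 1
    return sfs
```

Because folding only asks which allele is rarer, the statistic needs neither phase nor an outgroup, which is why the abstract describes it as computable from unphased and unpolarized SNP data.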
Affiliation(s)
- Simon Boitard
  - Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 - CNRS & MNHN & UPMC & EPHE, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
  - GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Willy Rodríguez
  - UMR CNRS 5219, Institut de Mathématiques de Toulouse, Université de Toulouse, Toulouse, France
- Flora Jay
  - UMR 7206 Eco-anthropologie et Ethnobiologie, Muséum National d’Histoire Naturelle, CNRS, Université Paris Diderot, Paris, France
  - LRI, Paris-Sud University, CNRS UMR 8623, Orsay, France
- Stefano Mona
  - Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 - CNRS & MNHN & UPMC & EPHE, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
- Frédéric Austerlitz
  - UMR 7206 Eco-anthropologie et Ethnobiologie, Muséum National d’Histoire Naturelle, CNRS, Université Paris Diderot, Paris, France
|
20
|
Abstract
Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities for carrying out population-genomic studies with various organisms. To take full advantage of these methods, it is essential to correctly estimate allele and genotype frequencies, and here we present a maximum-likelihood method that accomplishes these tasks. The proposed method fully accounts for uncertainties resulting from sequencing errors and biparental chromosome sampling and yields essentially unbiased estimates with minimal sampling variances at moderately high depths of coverage, regardless of the mating system and structure of the population. Moreover, we have developed statistical tests for examining the significance of polymorphisms and their genotypic deviations from Hardy-Weinberg equilibrium. We examine the performance of the proposed method by computer simulations and apply it to low-coverage human data generated by high-throughput sequencing. The results show that the proposed method improves our ability to carry out population-genomic analyses in important ways. The software package of the proposed method is freely available from https://github.com/Takahiro-Maruki/Package-GFE.
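To make the notion of a genotypic deviation from Hardy-Weinberg equilibrium concrete, the sketch below computes the classical chi-square statistic from called genotype counts. This is a swapped-in textbook test, not the likelihood-based test of the abstract, which works from sequence reads and accounts for sequencing error; the function name `hwe_chi_square` is hypothetical.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return the chi-square statistic for deviation from Hardy-Weinberg.

    Inputs are counts of the three genotypes at one biallelic site.
    Assumes genotypes are called without error, unlike the read-based
    method in the abstract above (hypothetical helper for illustration).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    # Allele frequency of A estimated by gene counting.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        raise ValueError("site must be polymorphic")
    # Expected genotype counts under random mating: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The statistic has one degree of freedom, so values above 3.841 indicate a deviation at the 5% level. At low coverage, heterozygotes are easily miscalled as homozygotes, so a test on called genotypes would tend to report spurious heterozygote deficits.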
|
21
|
Abstract
Although the analysis of linkage disequilibrium (LD) plays a central role in many areas of population genetics, the sampling variance of LD is known to be very large with high sensitivity to numbers of nucleotide sites and individuals sampled. Here we show that a genome-wide analysis of the distribution of heterozygous sites within a single diploid genome can yield highly informative patterns of LD as a function of physical distance. The proposed statistic, the correlation of zygosity, is closely related to the conventional population-level measure of LD, but is agnostic with respect to allele frequencies and hence likely less prone to outlier artifacts. Application of the method to several vertebrate species leads to the conclusion that >80% of recombination events are typically resolved by gene-conversion-like processes unaccompanied by crossovers, with the average lengths of conversion patches being on the order of one to several kilobases in length. Thus, contrary to common assumptions, the recombination rate between sites does not scale linearly with distance, often even up to distances of 100 kb. In addition, the amount of LD between sites separated by <200 bp is uniformly much greater than can be explained by the conventional neutral model, possibly because of the nonindependent origin of mutations within this spatial scale. These results raise questions about the application of conventional population-genetic interpretations to LD on short spatial scales and also about the use of spatial patterns of LD to infer demographic histories.
|