1
|
The Maintenance of Deleterious Variation in Wild Chinese Rhesus Macaques. Genome Biol Evol 2024:evae115. [PMID: 38795368 DOI: 10.1093/gbe/evae115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 04/25/2024] [Accepted: 05/22/2024] [Indexed: 05/27/2024]
Abstract
Understanding how deleterious variation is shaped and maintained in natural populations is important in conservation and evolutionary biology, as decreased fitness caused by these deleterious mutations can potentially lead to an increase in extinction risk. It is known that demographic processes can influence these patterns. For example, population bottlenecks and inbreeding increase the probability of inheriting identical-by-descent haplotypes from a recent common ancestor, creating long tracts of homozygous genotypes called runs of homozygosity (ROH), which have been associated with an accumulation of mildly deleterious homozygotes. Counterintuitively, positive selection can also maintain deleterious variants in a population through genetic hitchhiking. Here we analyze the whole genomes of 79 wild Chinese rhesus macaques across five subspecies and characterize patterns of deleterious variation with respect to ROH and signals of recent positive selection. We show that the fraction of homozygotes occurring in long ROH is significantly higher for deleterious homozygotes than tolerated ones, whereas this trend is not observed for short and medium ROH. This confirms that inbreeding, by generating these long tracts of homozygosity, is the main driver of the high burden of homozygous deleterious alleles in wild macaque populations. Furthermore, we show evidence that homozygous LOF variants are being purged. Next, we identify 7 deleterious variants at high frequency in regions putatively under selection near genes involved with olfaction and other processes. Our results shed light on how evolutionary processes can shape the distribution of deleterious variation in wild non-human primates.
|
2
|
Genotypic and phenotypic consequences of domestication in dogs. bioRxiv 2024:2024.05.01.592072. [PMID: 38746159 PMCID: PMC11092585 DOI: 10.1101/2024.05.01.592072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Runs of homozygosity (ROH) are genomic regions that arise when two copies of an identical ancestral haplotype are inherited from parents with a recent common ancestor. In this study, we performed a novel comprehensive analysis to infer genetic diversity among dogs and quantified the association between ROH and non-disease phenotypes. We found distinct patterns of genetic diversity across clades of breed dogs and elevated levels of long ROH, compared to non-domesticated dogs. These high levels of F_ROH (inbreeding coefficient) are a consequence of recent inbreeding among domesticated dogs during breed establishment. We identified statistically significant associations between F_ROH and height, weight, lifespan, muscled, white head, white chest, furnish, and length of fur. After correcting for population structure, we identified more than 45 genes across the three examined quantitative traits that exceeded the threshold for suggestive significance, indicating significant polygenic inheritance for the complex quantitative phenotypes in dogs.
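The inbreeding coefficient mentioned in this abstract is commonly computed as the fraction of the genome covered by ROH. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `f_roh`, the `(start, end)` half-open interval representation, the optional length cutoff, and the example coordinates are all illustrative assumptions, and the segments are assumed not to overlap.

```python
from typing import Iterable, Tuple


def f_roh(
    roh_segments: Iterable[Tuple[int, int]],
    genome_length_bp: int,
    min_length_bp: int = 0,
) -> float:
    """Fraction of the genome covered by ROH of at least min_length_bp.

    Assumes half-open (start, end) intervals in base pairs that do not
    overlap; this is an illustrative helper, not a published API.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length_bp
    )
    return covered / genome_length_bp


# Hypothetical ROH calls for one individual on a 100 Mb genome.
segments = [
    (1_000_000, 3_500_000),    # 2.5 Mb
    (10_000_000, 10_400_000),  # 0.4 Mb
    (20_000_000, 26_000_000),  # 6.0 Mb
]
genome_length = 100_000_000

print(f_roh(segments, genome_length))                           # all ROH
print(f_roh(segments, genome_length, min_length_bp=1_000_000))  # long ROH only
```

Restricting the sum to long segments, as in the second call, is one simple way to focus on ROH that more likely reflect recent inbreeding.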
|
3
|
The Maintenance of Deleterious Variation in Wild Chinese Rhesus Macaques. bioRxiv 2024:2023.10.04.560901. [PMID: 38712222 PMCID: PMC11071285 DOI: 10.1101/2023.10.04.560901] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/08/2024]
Abstract
Understanding how deleterious variation is shaped and maintained in natural populations is important in conservation and evolutionary biology, as decreased fitness caused by these deleterious mutations can potentially lead to an increase in extinction risk. It is known that demographic processes can influence these patterns. For example, population bottlenecks and inbreeding increase the probability of inheriting identical-by-descent haplotypes from a recent common ancestor, creating long tracts of homozygous genotypes called runs of homozygosity (ROH), which have been associated with an accumulation of mildly deleterious homozygotes. Counterintuitively, positive selection can also maintain deleterious variants in a population through genetic hitchhiking. Here we analyze the whole genomes of 79 wild Chinese rhesus macaques across five subspecies and characterize patterns of deleterious variation with respect to ROH and signals of recent positive selection. We show that the fraction of homozygotes occurring in long ROH is significantly higher for deleterious homozygotes than tolerated ones, whereas this trend is not observed for short and medium ROH. This confirms that inbreeding, by generating these long tracts of homozygosity, is the main driver of the high burden of homozygous deleterious alleles in wild macaque populations. Furthermore, we show evidence that homozygous LOF variants are being purged. Next, we identify 7 deleterious variants at high frequency in regions putatively under selection near genes involved with olfaction and other processes. Our results shed light on how evolutionary processes can shape the distribution of deleterious variation in wild non-human primates.
|
4
|
Genomic consequences of isolation and inbreeding in an island dingo population. bioRxiv 2024:2023.09.15.557950. [PMID: 37745583 PMCID: PMC10516007 DOI: 10.1101/2023.09.15.557950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Dingoes come from an ancient canid lineage that originated in East Asia around 8,000-11,000 years BP. As Australia's largest terrestrial predator, dingoes play an important ecological role. A small, protected population exists on a World Heritage-listed offshore island, K'gari (formerly Fraser Island). Concern regarding the persistence of dingoes on K'gari has risen due to their low genetic diversity and elevated inbreeding levels. However, whole-genome sequencing data are lacking from this population. Here, we include five new whole-genome sequences of K'gari dingoes. We analyze a total of 18 whole-genome sequences of dingoes sampled from mainland Australia and K'gari to assess the genomic consequences of their demographic histories. Long (>1 Mb) runs of homozygosity (ROH), indicators of inbreeding, are elevated in all sampled dingoes. However, K'gari dingoes showed significantly higher levels of very long ROH (>5 Mb), providing genomic evidence for small population size, isolation, inbreeding, and a strong founder effect. Our results suggest that, despite current levels of inbreeding, the K'gari population is purging strongly deleterious mutations, which, in the absence of further reductions in population size, may facilitate the persistence of small populations despite low genetic diversity and isolation. However, there may be little to no purging of mildly deleterious alleles, which may have important long-term consequences, and should be considered by conservation and management programs.
SIGNIFICANCE A long-standing question in conservation genetics is whether long-term isolation and elevated levels of inbreeding always lead to population extinction. Here we conduct the first-ever whole-genome analysis of a population of dingoes living in long-term isolation on an island off the coast of Australia (K'gari). We show that these animals are beset by very low genetic diversity, likely the result of extensive inbreeding, and an elevated number of deleterious homozygotes. However, our results suggest that these dingoes are likely purging highly deleterious alleles, which may have allowed them to persist long term despite their extremely small population (<200 individuals).
|
5
|
selscan 2.0: scanning for sweeps in unphased data. Bioinformatics 2024; 40:btae006. [PMID: 38180866 PMCID: PMC10789311 DOI: 10.1093/bioinformatics/btae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 12/26/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024]
Abstract
SUMMARY Several popular haplotype-based statistics for identifying recent or ongoing positive selection in genomes require knowledge of haplotype phase. Here, we provide an update to selscan which implements a re-definition of these statistics for use in unphased data. AVAILABILITY AND IMPLEMENTATION Source code and binaries are freely available at https://github.com/szpiech/selscan, implemented in C/C++, and supported on Linux, Windows, and macOS.
|
6
|
The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
|
7
|
Genomes of the extinct Bachman's warbler show high divergence and no evidence of admixture with other extant Vermivora warblers. Curr Biol 2023:S0960-9822(23)00690-5. [PMID: 37329885 DOI: 10.1016/j.cub.2023.05.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 04/25/2023] [Accepted: 05/25/2023] [Indexed: 06/19/2023]
Abstract
Bachman's warbler [1] (Vermivora bachmanii), last sighted in 1988, is one of the only North American passerines to recently go extinct [2,3,4]. Given extensive ongoing hybridization of its two extant congeners, the blue-winged warbler (V. cyanoptera) and golden-winged warbler (V. chrysoptera) [5,6,7,8], and shared patterns of plumage variation between Bachman's warbler and hybrids between those extant species, it has been suggested that Bachman's warbler might have also had a component of hybrid ancestry. Here, we use historic DNA (hDNA) and whole genomes of Bachman's warblers collected at the turn of the 20th century to address this. We combine these data with the two extant Vermivora species to examine patterns of population differentiation, inbreeding, and gene flow. In contrast to the admixture hypothesis, the genomic evidence is consistent with V. bachmanii having been a highly divergent, reproductively isolated species, with no evidence of introgression. We show that these three species have similar levels of runs of homozygosity (ROH), consistent with effects of a small long-term effective population size or population bottlenecks, with one V. bachmanii outlier showing numerous long ROH and an F_ROH greater than 5%. We also found, using population branch statistic estimates, previously undocumented evidence of lineage-specific evolution in V. chrysoptera near a pigmentation gene candidate, CORIN, which is a known modifier of ASIP, which is in turn involved in melanic throat and mask coloration in this family of birds. Together, these genomic results also highlight how natural history collections are such invaluable repositories of information about extant and extinct species.
|
8
|
A rarefaction approach for measuring population differences in rare and common variation. Genetics 2023; 224:iyad070. [PMID: 37075098 PMCID: PMC10213490 DOI: 10.1093/genetics/iyad070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 12/20/2022] [Accepted: 04/07/2023] [Indexed: 04/20/2023]
Abstract
In studying allele-frequency variation across populations, it is often convenient to classify an allelic type as "rare," with nonzero frequency less than or equal to a specified threshold, "common," with a frequency above the threshold, or entirely unobserved in a population. When sample sizes differ across populations, however, especially if the threshold separating "rare" and "common" corresponds to a small number of observed copies of an allelic type, discreteness effects can lead a sample from one population to possess substantially more rare allelic types than a sample from another population, even if the two populations have extremely similar underlying allele-frequency distributions across loci. We introduce a rarefaction-based sample-size correction for use in comparing rare and common variation across multiple populations whose sample sizes potentially differ. We use our approach to examine rare and common variation in worldwide human populations, finding that the sample-size correction introduces subtle differences relative to analyses that use the full available sample sizes. We introduce several ways in which the rarefaction approach can be applied: we explore the dependence of allele classifications on subsample sizes, we permit more than two classes of allelic types of nonzero frequency, and we analyze rare and common variation in sliding windows along the genome. The results can assist in clarifying similarities and differences in allele-frequency patterns across populations.
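The sample-size correction described in this abstract can be illustrated with a small hypergeometric calculation. The sketch below is one plausible reading of a rarefaction-style correction and not necessarily the authors' exact formulation: given an allelic type observed `k` times among `n` sampled copies, it computes the probability that a random subsample of `g` copies drawn without replacement would classify the type as unobserved, rare, or common. The function name `class_probabilities`, the parameter `rare_max` (the largest copy count still called "rare"), and the example numbers are illustrative assumptions.

```python
from math import comb
from typing import Dict


def class_probabilities(k: int, n: int, g: int, rare_max: int) -> Dict[str, float]:
    """Probability that an allelic type with k of n copies is unobserved,
    rare (1..rare_max copies), or common (> rare_max copies) in a random
    subsample of g copies drawn without replacement (hypergeometric)."""
    if not (0 <= k <= n) or not (0 < g <= n) or rare_max < 1:
        raise ValueError("require 0 <= k <= n, 0 < g <= n, rare_max >= 1")
    total = comb(n, g)
    probs = {"unobserved": 0.0, "rare": 0.0, "common": 0.0}
    # j = number of copies of the allelic type that land in the subsample.
    for j in range(max(0, g - (n - k)), min(k, g) + 1):
        p = comb(k, j) * comb(n - k, g - j) / total
        if j == 0:
            probs["unobserved"] += p
        elif j <= rare_max:
            probs["rare"] += p
        else:
            probs["common"] += p
    return probs


# Hypothetical locus: 3 copies among 100 sampled, rarefied to 40 copies,
# with "rare" meaning at most 2 observed copies.
print(class_probabilities(k=3, n=100, g=40, rare_max=2))
```

Evaluating every population at the same subsample size `g` puts samples of different sizes on a common footing before rare and common allelic types are counted.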
|
9
|
Evolution of genes involved in the unusual genitals of the bear macaque, Macaca arctoides. Ecol Evol 2022; 12:e8897. [PMID: 35646310 PMCID: PMC9130562 DOI: 10.1002/ece3.8897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 04/05/2022] [Indexed: 11/30/2022]
Abstract
Genital divergence is thought to contribute to reproductive barriers by establishing a “lock‐and‐key” mechanism for reproductive compatibility. One such example, Macaca arctoides, the bear macaque, has compensatory changes in both male and female genital morphology as compared to close relatives. M. arctoides also has a complex evolutionary history, having extensive introgression between the fascicularis and sinica macaque species groups. Here, phylogenetic relationships were analyzed via whole‐genome sequences from five species, including M. arctoides, and two species each from the putative parental species groups. This analysis revealed that ~3x more genomic regions supported placement in the sinica species group as compared to the fascicularis species group. Additionally, introgression analysis of the M. arctoides genome revealed it is a mosaic of recent polymorphisms shared with both species groups. To examine the evolution of their unique genital morphology further, the prevalence of candidate genes involved in genital morphology was compared against genome‐wide outliers in various population genetic metrics of diversity, divergence, introgression, and selection, while accounting for background variation in recombination rate. This analysis identified 67 outlier genes, including several genes that influence baculum morphology in mice, which were of interest since the bear macaque has the longest primate baculum. The mean of four of the seven population genetic metrics was statistically different in the candidate genes as compared to the rest of the genome, suggesting that genes involved in genital morphology have increased divergence and decreased diversity beyond expectations. These results highlight specific genes that may have played a role in shaping the unique genital morphology in the bear macaque.
|
10
|
A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet 2022; 18:e1010134. [PMID: 35404934 PMCID: PMC9022890 DOI: 10.1371/journal.pgen.1010134] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/18/2021] [Revised: 04/21/2022] [Accepted: 03/04/2022] [Indexed: 01/13/2023]
Abstract
The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the “width” of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software.
Identifying regions of the genome that contain adaptive variation is of fundamental interest in evolutionary biology, providing insight into an organism’s history and biology. When positive selection is recent or ongoing, we expect to find genomic patterns such as high frequency haplotypes and low genetic diversity in the vicinity of the adaptive locus. Here we develop a statistic to identify these regions based on distortions of the haplotype frequency spectrum from a background distribution. We evaluate the performance of this statistic under numerous realistic settings of interest to empiricists and demonstrate its superior performance relative to other haplotype-based selection statistics. We also apply this statistic to real population-genetic data. As a positive control, we explore two well-studied loci, LCT and MHC, in a European and an African human population that show strong evidence for selection. We also apply this statistic to the genomes of an urban brown rat population, where we uncover evidence for adaptation in olfactory perception genes. We release user-friendly software implementing this statistic.
|
11
|
Application of a novel haplotype-based scan for local adaptation to study high-altitude adaptation in rhesus macaques. Evol Lett 2021; 5:408-421. [PMID: 34367665 PMCID: PMC8327953 DOI: 10.1002/evl3.232] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/19/2020] [Revised: 02/24/2021] [Accepted: 05/04/2021] [Indexed: 12/17/2022]
Abstract
When natural populations split and migrate to different environments, they may experience different selection pressures that can lead to local adaptation. To capture the genomic patterns of a local selective sweep, we develop XP-nSL, a genomic scan for local adaptation that compares haplotype patterns between two populations. We show that XP-nSL has power to detect ongoing and recently completed hard and soft sweeps, and we then apply this statistic to search for evidence of adaptation to high altitude in rhesus macaques. We compare the whole genomes of 23 wild rhesus macaques captured at high altitude (mean altitude > 4000 m above sea level) to those of 22 wild rhesus macaques captured at low altitude (mean altitude < 500 m above sea level) and find evidence of local adaptation in the high-altitude population at or near 303 known genes and several unannotated regions. We find the strongest signal for adaptation at EGLN1, a classic target for convergent evolution in several species living in low oxygen environments. Furthermore, many of the 303 genes are involved in processes related to hypoxia, regulation of ROS, DNA damage repair, synaptic signaling, and metabolism. These results suggest that, beyond adapting via a beneficial mutation in one single gene, adaptation to high altitude in rhesus macaques is polygenic and spread across numerous important biological systems.
|
12
|
Abstract
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes) [1]. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
|
13
|
Ancestry-Dependent Enrichment of Deleterious Homozygotes in Runs of Homozygosity. Am J Hum Genet 2019; 105:747-762. [PMID: 31543216 PMCID: PMC6817522 DOI: 10.1016/j.ajhg.2019.08.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/06/2019] [Accepted: 08/27/2019] [Indexed: 12/20/2022]
Abstract
Runs of homozygosity (ROH) are important genomic features that manifest when an individual inherits two haplotypes that are identical by descent. Their length distributions are informative about population history, and their genomic locations are useful for mapping recessive loci contributing to both Mendelian and complex disease risk. We have previously shown that ROH, and especially long ROH that are likely the result of recent parental relatedness, are enriched for homozygous deleterious coding variation in a worldwide sample of outbred individuals. However, the distribution of ROH in admixed populations and their relationship to deleterious homozygous genotypes is understudied. Here we analyze whole-genome sequencing data from 1,441 unrelated individuals from self-identified African American, Puerto Rican, and Mexican American populations. These populations are three-way admixed between European, African, and Native American ancestries and provide an opportunity to study the distribution of deleterious alleles partitioned by local ancestry and ROH. We re-capitulate previous findings that long ROH are enriched for deleterious variation genome-wide. We then partition by local ancestry and show that deleterious homozygotes arise at a higher rate when ROH overlap African ancestry segments than when they overlap European or Native American ancestry segments of the genome. These results suggest that, while ROH on any haplotype background are associated with an inflation of deleterious homozygous variation, African haplotype backgrounds may play a particularly important role in the genetic architecture of complex diseases for admixed individuals, highlighting the need for further study of these populations.
|
14
|
Correction: Human demographic history has amplified the effects of background selection across the genome. PLoS Genet 2019; 15:e1007898. [PMID: 30601801 PMCID: PMC6314599 DOI: 10.1371/journal.pgen.1007898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
|
15
|
Human demographic history has amplified the effects of background selection across the genome. PLoS Genet 2018; 14:e1007387. [PMID: 29912945 PMCID: PMC6056204 DOI: 10.1371/journal.pgen.1007387] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 08/29/2017] [Revised: 07/23/2018] [Accepted: 04/30/2018] [Indexed: 01/22/2023]
Abstract
Natural populations often grow, shrink, and migrate over time. Such demographic processes can affect genome-wide levels of genetic diversity. Additionally, genetic variation in functional regions of the genome can be altered by natural selection, which drives adaptive mutations to higher frequencies or purges deleterious ones. Such selective processes affect not only the sites directly under selection but also nearby neutral variation through genetic linkage via processes referred to as genetic hitchhiking in the context of positive selection and background selection (BGS) in the context of purifying selection. While there is extensive literature examining the consequences of selection at linked sites at demographic equilibrium, less is known about how non-equilibrium demographic processes influence the effects of hitchhiking and BGS. Utilizing a global sample of human whole-genome sequences from the Thousand Genomes Project and extensive simulations, we investigate how non-equilibrium demographic processes magnify and dampen the consequences of selection at linked sites across the human genome. When binning the genome by inferred strength of BGS, we observe that, compared to Africans, non-African populations have experienced larger proportional decreases in neutral genetic diversity in strong BGS regions. We replicate these findings in admixed populations by showing that non-African ancestral components of the genome have also been affected more severely in these regions. We attribute these differences to the strong, sustained/recurrent population bottlenecks that non-Africans experienced as they migrated out of Africa and throughout the globe. Furthermore, we observe a strong correlation between FST and the inferred strength of BGS, suggesting a stronger rate of genetic drift. Forward simulations of human demographic history with a model of BGS support these observations. 
Our results show that non-equilibrium demography significantly alters the consequences of selection at linked sites and support the need for more work investigating the dynamic process of multiple evolutionary forces operating in concert.
|
16
|
Whole-Genome Sequencing of Pharmacogenetic Drug Response in Racially Diverse Children with Asthma. Am J Respir Crit Care Med 2018; 197:1552-1564. [PMID: 29509491 PMCID: PMC6006403 DOI: 10.1164/rccm.201712-2529oc] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 12/15/2017] [Accepted: 03/05/2018] [Indexed: 12/25/2022]
Abstract
RATIONALE Albuterol, a bronchodilator medication, is the first-line therapy for asthma worldwide. There are significant racial/ethnic differences in albuterol drug response. OBJECTIVES To identify genetic variants important for bronchodilator drug response (BDR) in racially diverse children. METHODS We performed the first whole-genome sequencing pharmacogenetics study from 1,441 children with asthma from the tails of the BDR distribution to identify genetic association with BDR. MEASUREMENTS AND MAIN RESULTS We identified population-specific and shared genetic variants associated with BDR, including genome-wide significant (P < 3.53 × 10^-7) and suggestive (P < 7.06 × 10^-6) loci near genes previously associated with lung capacity (DNAH5), immunity (NFKB1 and PLCB1), and β-adrenergic signaling (ADAMTS3 and COX18). Functional analyses of the BDR-associated SNP in NFKB1 revealed potential regulatory function in bronchial smooth muscle cells. The SNP is also an expression quantitative trait locus for a neighboring gene, SLC39A8. The lack of other asthma study populations with BDR and whole-genome sequencing data on minority children makes it impossible to perform replication of our rare variant associations. Minority underrepresentation also poses significant challenges to identify age-matched and population-matched cohorts of sufficient sample size for replication of our common variant findings. CONCLUSIONS The lack of minority data, despite a collaboration of eight universities and 13 individual laboratories, highlights the urgent need for a dedicated national effort to prioritize diversity in research. Our study expands the understanding of pharmacogenetic analyses in racially/ethnically diverse populations and advances the foundation for precision medicine in at-risk and understudied minority populations.
|
17
|
GARLIC: Genomic Autozygosity Regions Likelihood-based Inference and Classification. Bioinformatics 2018; 33:2059-2062. [PMID: 28205676 DOI: 10.1093/bioinformatics/btx102] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/09/2016] [Accepted: 02/15/2017] [Indexed: 12/14/2022]
Abstract
Summary: Runs of homozygosity (ROH) are important genomic features that manifest when identical-by-descent haplotypes are inherited from parents. Their length distributions and genomic locations are informative about population history and they are useful for mapping recessive loci contributing to both Mendelian and complex disease risk. Here, we present software implementing a model-based method (Pemberton et al., 2012) for inferring ROH in genome-wide SNP datasets that incorporates population-specific parameters and a genotyping error rate as well as provides a length-based classification module to identify biologically interesting classes of ROH. Using simulations, we evaluate the performance of this method. Availability and Implementation: GARLIC is written in C++. Source code and pre-compiled binaries (Windows, OSX and Linux) are hosted on GitHub (https://github.com/szpiech/garlic) under the GNU General Public License version 3. Contact: zachary.szpiech@ucsf.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
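To illustrate the kind of model-based score the summary above describes, the following Python sketch computes a per-window log-likelihood ratio (LOD) that compares an autozygous hypothesis with a non-autozygous one, using population allele frequencies and a genotyping error rate. It is a simplified sketch and not GARLIC's code: the function name `window_lod`, the default error rate, and the exact form of the genotype probabilities are assumptions made for illustration.

```python
import math


def window_lod(genotypes, freqs, error_rate=0.001):
    """Sum per-SNP log10 likelihood ratios (autozygous vs. not) over one window.

    genotypes: count of a chosen allele at each SNP (0, 1, or 2)
    freqs: population frequency p of that allele at each SNP (0 < p < 1)
    error_rate: assumed probability that a genotype reflects error or mutation
    """
    lod = 0.0
    for g, p in zip(genotypes, freqs):
        q = 1.0 - p
        # Non-autozygous hypothesis: Hardy-Weinberg genotype proportions.
        non_auto = {2: p * p, 1: 2 * p * q, 0: q * q}
        # Autozygous hypothesis: homozygous unless an error occurred.
        auto = {
            2: (1 - error_rate) * p + error_rate * p * p,
            1: error_rate * 2 * p * q,
            0: (1 - error_rate) * q + error_rate * q * q,
        }
        lod += math.log10(auto[g] / non_auto[g])
    return lod
```

Under these assumptions, a window of homozygous genotypes accumulates a positive score, while a single heterozygous call contributes a large negative term (log10 of the error rate), so windows can be ranked by how strongly they support autozygosity.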
|
18
|
Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data. Bioinformatics 2018; 33:1147-1153. [PMID: 28035032 PMCID: PMC5408850 DOI: 10.1093/bioinformatics/btw786] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/15/2016] [Accepted: 12/07/2016] [Indexed: 12/30/2022]
Abstract
Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a False Positive Rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina's single-sample caller CASAVA, Real Time Genomics' multisample variant caller, and the GATK UnifiedGenotyper, respectively. Since NGS sequencing data may be accompanied by genotype data for the same samples, either collected concurrently with sequencing or from a previous study, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g. a different set of criteria to determine quality for rare versus common variants) and thereby provides insight into sequencing characteristics that indicate call quality for variants of different frequencies. Availability and Implementation: Code is available on GitHub at: https://github.com/suyashss/variant_validation. Contacts: suyashs@stanford.edu or mtaub@jhsph.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
|
19
|
Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics 2017; 18:928. [PMID: 29191164 PMCID: PMC5709839 DOI: 10.1186/s12864-017-4312-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/23/2017] [Accepted: 11/16/2017] [Indexed: 12/14/2022]
Abstract
Background Genomic regions of autozygosity (ROA) arise when an individual is homozygous for haplotypes inherited identical-by-descent from ancestors shared by both parents. Over the past decade, they have gained importance for understanding evolutionary history and the genetic basis of complex diseases and traits. However, methods to infer ROA in dense genotype data have not evolved in step with advances in genome technology that now enable us to rapidly create large high-resolution genotype datasets, limiting our ability to investigate their constituent ROA patterns. Methods We report a weighted likelihood approach for inferring ROA in dense genotype data that accounts for autocorrelation among genotyped positions and the possibilities of unobserved mutation and recombination events, and variability in the confidence of individual genotype calls in whole genome sequence (WGS) data. Results Forward-time genetic simulations under two demographic scenarios that reflect situations where inbreeding and its effect on fitness are of interest suggest this approach is better powered than existing state-of-the-art methods to infer ROA at marker densities consistent with WGS and popular microarray genotyping platforms used in human and non-human studies. Moreover, we present evidence that suggests this approach is able to distinguish ROA arising via consanguinity from ROA arising via endogamy. Using subsets of The 1000 Genomes Project Phase 3 data we show that, relative to WGS, intermediate and long ROA are captured robustly with popular microarray platforms, while detection of short ROA is more variable and improves with marker density. Worldwide ROA patterns inferred from WGS data are found to accord well with those previously reported on the basis of microarray genotype data. 
Finally, we highlight the potential of this approach to detect genomic regions enriched for autozygosity signals in one group relative to another based upon comparisons of per-individual autozygosity likelihoods instead of inferred ROA frequencies. Conclusions This weighted likelihood ROA inference approach can assist population- and disease-geneticists working with a wide variety of data types and species to explore ROA patterns and to identify genomic regions with differential ROA signals among groups, thereby advancing our understanding of evolutionary history and the role of recessive variation in phenotypic variation and disease. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4312-3) contains supplementary material, which is available to authorized users.
|
20
|
Cancer-associated arginine-to-histidine mutations confer a gain in pH sensing to mutant proteins. Sci Signal 2017; 10:10/495/eaam9931. [PMID: 28874603 DOI: 10.1126/scisignal.aam9931] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Indexed: 01/03/2023]
Abstract
The intracellular pH (pHi) of most cancers is constitutively higher than that of normal cells and enhances proliferation and cell survival. We found that increased pHi enabled the tumorigenic behaviors caused by somatic arginine-to-histidine mutations, which are frequent in cancer and confer pH sensing not seen with wild-type proteins. Experimentally raising the pHi increased the activity of R776H mutant epidermal growth factor receptor (EGFR-R776H), thereby increasing proliferation and causing transformation in fibroblasts. An Arg-to-Gly mutation did not confer these effects. Molecular dynamics simulations of EGFR suggested that decreased protonation of His776 at high pH causes conformational changes in the αC helix that may stabilize the active form of the kinase. An Arg-to-His, but not Arg-to-Lys, mutation in the transcription factor p53 (p53-R273H) decreased its transcriptional activity and attenuated the DNA damage response in fibroblasts and breast cancer cells with high pHi. Lowering pHi attenuated the tumorigenic effects of both EGFR-R776H and p53-R273H. Our data suggest that some somatic mutations may confer a fitness advantage to the higher pHi of cancer cells.
|
21
|
Abstract
Cancer can be viewed as a set of different diseases with distinctions based on tissue origin, driver mutations, and genetic signatures. Accordingly, each of these distinctions has been used to classify cancer subtypes and to reveal common features. Here, we present a different analysis of cancer based on amino acid mutation signatures. Non-negative Matrix Factorization and principal component analysis of 29 cancers revealed six amino acid mutation signatures, including four signatures that were dominated by either arginine to histidine (Arg>His) or glutamate to lysine (Glu>Lys) mutations. Sample-level analyses reveal that while some cancers are heterogeneous, others are largely dominated by one type of mutation. Using a non-overlapping set of samples from the COSMIC somatic mutation database, we validate five of six mutation signatures, including signatures with prominent arginine to histidine (Arg>His) or glutamate to lysine (Glu>Lys) mutations. This suggests that our classification of cancers based on amino acid mutation patterns may provide avenues of inquiry pertaining to specific protein mutations that may generate novel insights into cancer biology.
|
22
|
Abstract
Haplotype-based scans to detect natural selection are useful to identify recent or ongoing positive selection in genomes. As both real and simulated genomic data sets grow larger, spanning thousands of samples and millions of markers, there is a need for a fast and efficient implementation of these scans for general use. Here, we present selscan, an efficient multithreaded application that implements Extended Haplotype Homozygosity (EHH), Integrated Haplotype Score (iHS), and Cross-population EHH (XPEHH). selscan accepts phased genotypes in multiple formats, including TPED, and performs extremely well on both simulated and real data and over an order of magnitude faster than existing available implementations. It calculates iHS on chromosome 22 (22,147 loci) across 204 CEU haplotypes in 353 s on one thread (33 s on 16 threads) and calculates XPEHH for the same data relative to 210 YRI haplotypes in 578 s on one thread (52 s on 16 threads). Source code and binaries (Windows, OSX, and Linux) are available at https://github.com/szpiech/selscan.
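As a small illustration of the statistic underlying these scans, the Python sketch below computes Extended Haplotype Homozygosity (EHH) for haplotypes that carry a given allele at a core site. It is a minimal sketch and not selscan's implementation: the function name `ehh`, the string encoding of haplotypes, and the index-based distance are assumptions made for illustration. EHH is taken here as the probability that two randomly chosen carrier haplotypes are identical over the interval from the core site to the end site.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end, allele):
    """EHH among haplotypes carrying `allele` at index `core`,
    measured over the sites from `core` to `end` (inclusive).

    haplotypes: list of equal-length strings such as "0101"
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # no pairs to compare
    lo, hi = min(core, end), max(core, end)
    # Count how many carriers share each distinct extended haplotype.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


haps = ["0101", "0101", "0111", "1100"]
print([ehh(haps, core=1, end=e, allele="1") for e in (1, 2, 3)])
```

In this toy input EHH starts at 1.0 at the core site and decays as the interval grows. Integrating that decay curve for each allele and comparing the two is the idea behind iHS, which this sketch does not implement.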
|
23
|
On the size distribution of private microsatellite alleles. Theor Popul Biol 2011; 80:100-13. [PMID: 21514313 DOI: 10.1016/j.tpb.2011.03.006] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 12/17/2010] [Revised: 03/29/2011] [Accepted: 03/30/2011] [Indexed: 10/18/2022]
Abstract
Private microsatellite alleles tend to be found in the tails rather than in the interior of the allele size distribution. To explain this phenomenon, we have investigated the size distribution of private alleles in a coalescent model of two populations, assuming the symmetric stepwise mutation model as the mode of microsatellite mutation. For the case in which four alleles are sampled, two from each population, we condition on the configuration in which three distinct allele sizes are present, one of which is common to both populations, one of which is private to one population, and the third of which is private to the other population. Conditional on this configuration, we calculate the probability that the two private alleles occupy the two tails of the size distribution. This probability, which increases as a function of mutation rate and divergence time between the two populations, is seen to be greater than the value that would be predicted if there was no relationship between privacy and location in the allele size distribution. In accordance with the prediction of the model, we find that in pairs of human populations, the frequency with which private microsatellite alleles occur in the tails of the allele size distribution increases as a function of genetic differentiation between populations.
|
24
|
Abstract
Genome-wide association (GWA) studies have identified a large number of SNPs associated with disease phenotypes. As most GWA studies have been performed in populations of European descent, this Review examines the issues involved in extending the consideration of GWA studies to diverse worldwide populations. Although challenges exist with issues such as imputation, admixture and replication, investigation of a greater diversity of populations could make substantial contributions to the goal of mapping the genetic determinants of complex diseases for the human population as a whole.
|
25
|
Abstract
Motivation: Analysis of the distribution of alleles across populations is a useful tool for examining population diversity and relationships. However, sample sizes often differ across populations, sometimes making it difficult to assess allelic distributions across groups. Results: We introduce a generalized rarefaction approach for counting alleles private to combinations of populations. Our method evaluates the number of alleles found in each of a set of populations but absent in all remaining populations, considering equal-sized subsamples from each population. Applying this method to a worldwide human microsatellite dataset, we observe a high number of alleles private to the combination of African and Oceanian populations. This result supports the possibility of a migration out of Africa into Oceania separate from the migrations responsible for the majority of the ancestry of the modern populations of Asia, and it highlights the utility of our approach to sample size correction in evaluating hypotheses about population history. Availability: We have implemented our method in the computer program ADZE, which is available for download at http://rosenberglab.bioinformatics.med.umich.edu/adze.html. Contact: szpiechz@umich.edu
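The rarefaction idea in the abstract above can be sketched in a few lines of Python. The sketch below estimates the expected number of alleles that appear in every population of a chosen combination and in none of the remaining populations, when g gene copies are subsampled without replacement from each population. It is an illustrative sketch and not ADZE's code: the function name `private_to_combination` and the nested-dictionary input format are assumptions, and the per-allele absence probability is the standard hypergeometric term C(N − N_i, g) / C(N, g).

```python
from math import comb


def private_to_combination(counts, combo, g):
    """Expected number of alleles found in every population in `combo`
    and absent from all other populations, for subsamples of size g.

    counts: dict mapping population name -> dict of allele -> copy count
    combo: set of population names forming the combination
    g: subsample size (must not exceed any population's sample size)
    """
    alleles = set().union(*(c.keys() for c in counts.values()))
    total = 0.0
    for allele in alleles:
        term = 1.0
        for pop, pop_counts in counts.items():
            n = sum(pop_counts.values())
            # Probability that a size-g subsample misses this allele.
            absent = comb(n - pop_counts.get(allele, 0), g) / comb(n, g)
            term *= (1.0 - absent) if pop in combo else absent
        total += term
    return total


counts = {"A": {"x": 2, "y": 2}, "B": {"x": 4}}
print(private_to_combination(counts, {"A"}, g=2))
```

Because every population is reduced to the same subsample size g, the estimates stay comparable across populations whose original sample sizes differ, which is the sample-size correction the abstract describes.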
|
26
|
Genotype, haplotype and copy-number variation in worldwide human populations. Nature 2008; 451:998-1003. [PMID: 18288195 DOI: 10.1038/nature06742] [Citation(s) in RCA: 611] [Impact Index Per Article: 38.2] [Received: 12/02/2007] [Accepted: 01/29/2008] [Indexed: 11/09/2022]
Abstract
Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected--including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas--the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.
|