1
Ciccarella M, Laurent R, Szpiech ZA, Patin E, Dessarps-Freichey F, Utgé J, Lémée L, Semo A, Rocha J, Verdu P. Nested admixture during and after the Trans-Atlantic Slave Trade on the island of São Tomé. bioRxiv 2024:2024.10.21.619344. [PMID: 39484499 PMCID: PMC11526973 DOI: 10.1101/2024.10.21.619344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Human admixture history is rarely a simple process in which distinct populations, previously isolated for a long time, come into contact once to form an admixed population. In this study, we aim to reconstruct the complex admixture histories of the population of São Tomé, an island in the Gulf of Guinea that was the site of the first slave-based plantation economy, and experienced successive waves of forced and deliberate migration from Africa. We examined 2.5 million SNPs newly genotyped in 96 São Toméans and found that geography alone cannot explain the observed patterns of genetic differentiation within the island. We defined five genetic groups in São Tomé based on the hypothesis that individuals sharing the most haplotypes are more likely to share similar genetic histories. Using Identical-by-Descent and different local ancestry inference methods, we inferred shared ancestries between 70 African and European populations and each São Toméan genetic group. We identified admixture events between admixed groups that were previously isolated on the island, showing how recently admixed populations can be themselves the sources of other admixture events. This study demonstrates how complex admixture and isolation histories during and after the Transatlantic Slave-Trade shaped extant individual genetic patterns at a local scale in Africa.
Affiliation(s)
- Marta Ciccarella
  - UMR7206 Eco-anthropologie, CNRS, MNHN, Université Paris Cité, France
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661 Vairão, Portugal
- Romain Laurent
  - UMR7206 Eco-anthropologie, CNRS, MNHN, Université Paris Cité, France
- Zachary A. Szpiech
  - Department of Biology, Penn State University, United States
  - Institute for Computational and Data Sciences, Penn State University, United States
- Etienne Patin
  - Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, Paris, France
- José Utgé
  - UMR7206 Eco-anthropologie, CNRS, MNHN, Université Paris Cité, France
- Laure Lémée
  - Plateforme Technologique Biomics, C2RT, Institut Pasteur, France
- Armando Semo
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661 Vairão, Portugal
- Jorge Rocha
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661 Vairão, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, 4099-002 Porto, Portugal
- Paul Verdu
  - UMR7206 Eco-anthropologie, CNRS, MNHN, Université Paris Cité, France
2
Korunes KL, Soares-Souza GB, Bobrek K, Tang H, Araújo II, Goldberg A, Beleza S. Sex-biased admixture and assortative mating shape genetic variation and influence demographic inference in admixed Cabo Verdeans. G3 (Bethesda) 2022; 12:jkac183. [PMID: 35861404 PMCID: PMC9526050 DOI: 10.1093/g3journal/jkac183] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/18/2022] [Accepted: 06/21/2022] [Indexed: 11/22/2022]
Abstract
Genetic data can provide insights into population history, but first, we must understand the patterns that complex histories leave in genomes. Here, we consider the admixed human population of Cabo Verde to understand the patterns of genetic variation left by social and demographic processes. First settled in the late 1400s, Cabo Verdeans are admixed descendants of Portuguese colonizers and enslaved West African people. We consider Cabo Verde's well-studied historical record alongside genome-wide SNP data from 563 individuals from 4 regions within the archipelago. We use genetic ancestry to test for patterns of nonrandom mating and sex-specific gene flow, and we examine the consequences of these processes for common demographic inference methods and genetic patterns. Notably, multiple population genetic tools that assume random mating underestimate the timing of admixture, but incorporating nonrandom mating produces estimates more consistent with historical records. We consider how admixture interrupts common summaries of genomic variation such as runs of homozygosity. While summaries of runs of homozygosity may be difficult to interpret in admixed populations, differentiating runs of homozygosity by length class shows that runs of homozygosity reflect historical differences between the islands in their contributions from the source populations and postadmixture population dynamics. Finally, we find higher African ancestry on the X chromosome than on the autosomes, consistent with an excess of European males and African females contributing to the gene pool. Considering these genomic insights into population history in the context of Cabo Verde's historical record, we can identify how assumptions in genetic models impact inference of population history more broadly.
Affiliation(s)
- Katherine Bobrek
  - Department of Anthropology, Emory University, Atlanta, GA 30322, USA
- Hua Tang
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Isabel Inês Araújo
  - Faculdade de Ciências e Tecnologia, Universidade de Cabo Verde (Uni-CV), Praia, Ilha de Santiago CP 379C, Cabo Verde
- Amy Goldberg
  - Evolutionary Anthropology, Duke University, Durham, NC 27705, USA
- Sandra Beleza
  - Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
3
Çelik G, Tuncalı T. ROHMM: A flexible hidden Markov model framework to detect runs of homozygosity from genotyping data. Hum Mutat 2021; 43:158-168. [PMID: 34923717 DOI: 10.1002/humu.24316] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2021] [Revised: 11/29/2021] [Accepted: 12/15/2021] [Indexed: 11/05/2022]
Abstract
Runs of long homozygous (ROH) stretches are considered to be the result of consanguinity and usually contain recessive deleterious disease-causing mutations. Several algorithms have been developed to detect ROHs. Here, we developed a simple alternative strategy by examining X chromosome non-pseudoautosomal region to detect the ROHs from next-generation sequencing data utilizing the genotype probabilities and the hidden Markov model algorithm as a tool, namely ROHMM. It is implemented purely in java and contains both a command line and a graphical user interface. We tested ROHMM on simulated data as well as real population data from the 1000G Project and a clinical sample. Our results have shown that ROHMM can perform robustly producing highly accurate homozygosity estimations under all conditions thereby meeting and even exceeding the performance of its natural competitors.
Affiliation(s)
- Gökalp Çelik
  - Health Sciences Institute, Department of Medical Genetics, Ankara Yildirim Beyazit University, Ankara, Turkey
- Timur Tuncalı
  - Department of Medical Genetics, Ankara University School of Medicine, Ankara, Turkey
4
Szpiech ZA, Mak ACY, White MJ, Hu D, Eng C, Burchard EG, Hernandez RD. Ancestry-Dependent Enrichment of Deleterious Homozygotes in Runs of Homozygosity. Am J Hum Genet 2019; 105:747-762. [PMID: 31543216 PMCID: PMC6817522 DOI: 10.1016/j.ajhg.2019.08.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/06/2019] [Accepted: 08/27/2019] [Indexed: 12/20/2022]
Abstract
Runs of homozygosity (ROH) are important genomic features that manifest when an individual inherits two haplotypes that are identical by descent. Their length distributions are informative about population history, and their genomic locations are useful for mapping recessive loci contributing to both Mendelian and complex disease risk. We have previously shown that ROH, and especially long ROH that are likely the result of recent parental relatedness, are enriched for homozygous deleterious coding variation in a worldwide sample of outbred individuals. However, the distribution of ROH in admixed populations and their relationship to deleterious homozygous genotypes is understudied. Here we analyze whole-genome sequencing data from 1,441 unrelated individuals from self-identified African American, Puerto Rican, and Mexican American populations. These populations are three-way admixed between European, African, and Native American ancestries and provide an opportunity to study the distribution of deleterious alleles partitioned by local ancestry and ROH. We re-capitulate previous findings that long ROH are enriched for deleterious variation genome-wide. We then partition by local ancestry and show that deleterious homozygotes arise at a higher rate when ROH overlap African ancestry segments than when they overlap European or Native American ancestry segments of the genome. These results suggest that, while ROH on any haplotype background are associated with an inflation of deleterious homozygous variation, African haplotype backgrounds may play a particularly important role in the genetic architecture of complex diseases for admixed individuals, highlighting the need for further study of these populations.
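As a reader's aid only, and not the method used in any of the papers listed here, the idea of an ROH can be illustrated with a deliberately naive scan for uninterrupted stretches of homozygous genotype calls. The function name `naive_roh`, the 0/1/2 genotype coding, and both thresholds are assumptions made for this sketch; published tools use likelihood or hidden Markov models and tolerate genotyping error, which this does not.

```python
def naive_roh(positions, genotypes, min_len_bp=1_000_000, min_snps=3):
    """Return (start_bp, end_bp) for each run of consecutive homozygous calls.

    positions : sorted base-pair coordinates of SNPs on one chromosome
    genotypes : alternate-allele counts per SNP (0 or 2 = homozygous,
                1 = heterozygous); any other value also breaks a run
    The thresholds are illustrative assumptions, not recommended defaults.
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have equal length")
    runs, start = [], None
    # A trailing sentinel heterozygote closes any run still open at the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i  # open a new run at this SNP
        elif start is not None:
            first, last = positions[start], positions[i - 1]
            if last - first >= min_len_bp and i - start >= min_snps:
                runs.append((first, last))
            start = None
    return runs


if __name__ == "__main__":
    pos = [100, 500_000, 1_200_000, 1_300_000, 2_000_000, 3_500_000]
    gts = [0, 2, 0, 1, 2, 2]
    print(naive_roh(pos, gts))
```

In the toy input, the second homozygous stretch is long enough in base pairs but contains only two SNPs, so it is discarded by the `min_snps` filter.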
Affiliation(s)
- Zachary A Szpiech
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA; Department of Biological Sciences, Auburn University, Auburn, AL 36842, USA.
- Angel C Y Mak
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA
- Marquitta J White
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA
- Donglei Hu
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA
- Celeste Eng
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA
- Esteban G Burchard
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA
- Ryan D Hernandez
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA; Department of Human Genetics, McGill University, Montreal, QC H3A 0G1, Canada; Genome Quebec Innovation Center, McGill University, Montreal, QC H3A 0G1, Canada.
5
Renaud G, Hanghøj K, Korneliussen TS, Willerslev E, Orlando L. Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples. Genetics 2019; 212:587-614. [PMID: 31088861 PMCID: PMC6614887 DOI: 10.1534/genetics.119.302057] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 02/26/2019] [Accepted: 05/01/2019] [Indexed: 11/18/2022]
Abstract
Both the total amount and the distribution of heterozygous sites within individual genomes are informative about the genetic diversity of the population they belong to. Detecting true heterozygous sites in ancient genomes is complicated by the generally limited coverage achieved and the presence of post-mortem damage inflating sequencing errors. Additionally, large runs of homozygosity found in the genomes of particularly inbred individuals and of domestic animals can skew estimates of genome-wide heterozygosity rates. Current computational tools aimed at estimating runs of homozygosity and genome-wide heterozygosity levels are generally sensitive to such limitations. Here, we introduce ROHan, a probabilistic method which substantially improves the estimate of heterozygosity rates both genome-wide and for genomic local windows. It combines a local Bayesian model and a Hidden Markov Model at the genome-wide level and can work both on modern and ancient samples. We show that our algorithm outperforms currently available methods for predicting heterozygosity rates for ancient samples. Specifically, ROHan can delineate large runs of homozygosity (at megabase scales) and produce a reliable confidence interval for the genome-wide rate of heterozygosity outside of such regions from modern genomes with a depth of coverage as low as 5-6× and down to 7-8× for ancient samples showing moderate DNA damage. We apply ROHan to a series of modern and ancient genomes previously published and revise available estimates of heterozygosity for humans, chimpanzees and horses.
Affiliation(s)
- Gabriel Renaud
  - Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
- Kristian Hanghøj
  - Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
  - Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse, Université Paul Sabatier, 31000, France
- Eske Willerslev
  - Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
  - Department of Zoology, University of Cambridge, CB2 3EJ, UK
  - The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
  - The Danish Institute for Advanced Study at The University of Southern Denmark, DK-5230 Odense M, Denmark
- Ludovic Orlando
  - Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
  - Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse, Université Paul Sabatier, 31000, France
6
Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity. Genetics 2018; 210:1429-1452. [PMID: 30315068 DOI: 10.1534/genetics.118.301502] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 03/03/2018] [Accepted: 10/08/2018] [Indexed: 11/18/2022]
Abstract
Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
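For orientation, the statistics named in this abstract can be sketched in a few lines. This is an illustrative sketch, not code from the cited paper. It assumes the usual definitions from Garud et al. (2015), H12 = (p1 + p2)^2 + sum of p_i^2 for i > 2 and H2/H1 with H1 = sum of p_i^2 and H2 = H1 - p1^2, applied unchanged to the frequencies of unphased multilocus genotypes within one analysis window. The function name `sweep_stats` and the string encoding of genotypes are hypothetical.

```python
from collections import Counter


def sweep_stats(window_items):
    """Return (X12, X123, X2/X1) for one genomic window.

    window_items : one hashable item per sampled individual (or haplotype).
                   Haplotype strings give H12, H123, H2/H1; unphased
                   multilocus genotype strings (e.g. "0120", one digit per
                   SNP) give G12, G123, G2/G1.
    """
    n = len(window_items)
    if n == 0:
        raise ValueError("window_items must not be empty")
    # Frequencies of distinct items, most frequent first.
    freqs = sorted((c / n for c in Counter(window_items).values()), reverse=True)
    freqs += [0.0] * (3 - len(freqs))  # pad so the top three always exist
    x1 = sum(f * f for f in freqs)     # expected homozygosity
    x2 = x1 - freqs[0] ** 2            # same, omitting the most frequent item
    x12 = (freqs[0] + freqs[1]) ** 2 + sum(f * f for f in freqs[2:])
    x123 = (freqs[0] + freqs[1] + freqs[2]) ** 2 + sum(f * f for f in freqs[3:])
    return x12, x123, x2 / x1


if __name__ == "__main__":
    # Five individuals, four SNPs each, coded as alternate-allele counts.
    genotypes = ["0120", "0120", "0120", "2200", "1111"]
    g12, g123, g2_over_g1 = sweep_stats(genotypes)
    print(g12, g123, g2_over_g1)
```

Pooling the top two (or three) frequencies before squaring is what lets the statistic stay high when several adaptive haplotypes are common, while the ratio is only meaningful in windows where G12 or G123 is already elevated, as the abstract notes.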
7
Relationship between Deleterious Variation, Genomic Autozygosity, and Disease Risk: Insights from The 1000 Genomes Project. Am J Hum Genet 2018; 102:658-675. [PMID: 29551419 DOI: 10.1016/j.ajhg.2018.02.013] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/08/2017] [Accepted: 02/19/2018] [Indexed: 12/11/2022]
Abstract
Genomic regions of autozygosity (ROAs) represent segments of individual genomes that are homozygous for haplotypes inherited identical-by-descent (IBD) from a common ancestor. ROAs are nonuniformly distributed across the genome, and increased ROA levels are a reported risk factor for numerous complex diseases. Previously, we hypothesized that long ROAs are enriched for deleterious homozygotes as a result of young haplotypes with recent deleterious mutations, relatively untouched by purifying selection, being paired IBD as a consequence of recent parental relatedness, a pattern supported by ROA and whole-exome sequence data on 27 individuals. Here, we significantly bolster support for our hypothesis and expand upon our original analyses using ROA and whole-genome sequence data on 2,436 individuals from The 1000 Genomes Project. Considering CADD deleteriousness scores, we reaffirm our previous observation that long ROAs are enriched for damaging homozygotes worldwide. We show that strongly damaging homozygotes experience greater enrichment than weaker damaging homozygotes, while overall enrichment varies appreciably among populations. Mendelian disease genes and those encoding FDA-approved drug targets have significantly increased rates of gain in damaging homozygotes with increasing ROA coverage relative to all other genes. In genes implicated in eight complex phenotypes for which ROA levels have been identified as a risk factor, rates of gain in damaging homozygotes vary across phenotypes and populations but frequently differ significantly from non-disease genes. These findings highlight the potential confounding effects of population background in the assessment of associations between ROA levels and complex disease risk, which might underlie reported inconsistencies in ROA-phenotype associations.