1
Lucas-Sánchez M, Abdeli A, Bekada A, Calafell F, Benhassine T, Comas D. The Impact of Recent Demography on Functional Genetic Variation in North African Human Groups. Mol Biol Evol 2024; 41:msad283. [PMID: 38152862 PMCID: PMC10783648 DOI: 10.1093/molbev/msad283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 11/22/2023] [Accepted: 12/19/2023] [Indexed: 12/29/2023] Open
Abstract
The strategic location of North Africa has made the region the core of a wide range of human demographic events, including migrations, bottlenecks, and admixture processes. This has led to a complex and heterogeneous genetic and cultural landscape, which remains poorly studied compared to other world regions. Whole-exome sequencing is particularly relevant for determining the effects of these demographic events on present-day North Africans' genomes, since it focuses on those parts of the genome that are more likely to have direct biomedical consequences. Whole-exome sequencing can also be used to assess the effect of recent demography on functional genetic variation and on the efficacy of natural selection, a long-standing debate. In the present work, we use newly generated whole-exome sequencing and genome-wide array genotypes to investigate the effect of demography on functional variation in seven North African populations, considering both cultural and demographic differences and with a special focus on Amazigh (plural Imazighen) groups. We detect genetic differences among populations related to their degree of isolation and the presence of bottlenecks in their recent history. We find differences in the functional part of the genome that suggest a relaxation of purifying selection in the more isolated groups, allowing for an increase of putatively damaging variation. Our results also show a shift in mutational load coinciding with major demographic events in the region and reveal differences within and between cultural and geographic groups.
Affiliation(s)
- Marcel Lucas-Sánchez
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Amine Abdeli
  - Faculté des Sciences Biologiques, Laboratoire de Biologie Cellulaire et Moléculaire, Université des Sciences et de la Technologie Houari Boumediene, Alger, Algeria
- Asmahan Bekada
  - Département de Biotechnologie, Faculté des Sciences de la Nature et de la Vie, Université Oran 1 (Ahmad Ben Bella), Oran, Algeria
- Francesc Calafell
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Traki Benhassine
  - Faculté des Sciences Biologiques, Laboratoire de Biologie Cellulaire et Moléculaire, Université des Sciences et de la Technologie Houari Boumediene, Alger, Algeria
- David Comas
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
2
Developing CIRdb as a catalog of natural genetic variation in the Canary Islanders. Sci Rep 2022; 12:16132. [PMID: 36168029 PMCID: PMC9514705 DOI: 10.1038/s41598-022-20442-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 09/13/2022] [Indexed: 11/29/2022] Open
Abstract
The current inhabitants of the Canary Islands have a unique genetic makeup in the European diversity landscape due to the existence of African footprints from recent admixture events, especially of North African components (> 20%). The underrepresentation of non-Europeans in genetic studies and the sizable North African ancestry, which is nearly absent from all existing catalogs of worldwide genetic diversity, justify the need to develop CIRdb, a population-specific reference catalog of natural genetic variation in the Canary Islanders. Based on array genotyping of the selected unrelated donors and comparisons against available datasets from European, sub-Saharan, and North African populations, we illustrate the intermediate genetic differentiation of Canary Islanders between Europeans and North Africans and the existence of within-population differences that are likely driven by genetic isolation. Here we describe the overall design and the methods that are being implemented to further develop CIRdb. This resource will help to strengthen the implementation of Precision Medicine in this population by helping to increase diversity in genetic studies. Among other benefits, this will translate into an improved ability to fine-map disease genes, simplify the identification of causal variants, and estimate the prevalence of unattended Mendelian diseases.
3
Herzig AF, Ciullo M, Leutenegger AL, Perdry H. Moment estimators of relatedness from low-depth whole-genome sequencing data. BMC Bioinformatics 2022; 23:254. [PMID: 35751014 PMCID: PMC9233360 DOI: 10.1186/s12859-022-04795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 06/09/2022] [Indexed: 11/29/2022] Open
Abstract
Background: Estimating relatedness is an important step for many genetic study designs. A variety of methods for estimating coefficients of pairwise relatedness from genotype data have been proposed. Both the kinship coefficient φ and the fraternity coefficient ψ for all pairs of individuals are of interest. However, when dealing with low-depth sequencing or imputation data, individual level genotypes cannot be confidently called. To ignore such uncertainty is known to result in biased estimates. Accordingly, methods have recently been developed to estimate kinship from uncertain genotypes. Results: We present new method-of-moment estimators of both the coefficients φ and ψ calculated directly from genotype likelihoods. We have simulated low-depth genetic data for a sample of individuals with extensive relatedness by using the complex pedigree of the known genetic isolates of Cilento in South Italy. Through this simulation, we explore the behaviour of our estimators, demonstrate their properties, and show advantages over alternative methods. A demonstration of our method is given for a sample of 150 French individuals with down-sampled sequencing data. Conclusions: We find that our method can provide accurate relatedness estimates whilst holding advantages over existing methods in terms of robustness, independence from external software, and required computation time. The method presented in this paper is referred to as LowKi (Low-depth Kinship) and has been made available in an R package (https://github.com/genostats/LowKi). Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04795-8.
Affiliation(s)
- M Ciullo
  - Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy; IRCCS Neuromed, Pozzilli, Isernia, Italy
- A-L Leutenegger
  - Inserm, Université Paris Cité, UMR 1141, NeuroDiderot, 75019, Paris, France
- H Perdry
  - CESP Inserm U1018, Université Paris-Saclay, UVSQ, Villejuif, France
4
Maceda I, Lao O. Analysis of the Batch Effect Due to Sequencing Center in Population Statistics Quantifying Rare Events in the 1000 Genomes Project. Genes (Basel) 2021; 13:genes13010044. [PMID: 35052384 PMCID: PMC8775088 DOI: 10.3390/genes13010044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 12/19/2021] [Accepted: 12/21/2021] [Indexed: 12/01/2022] Open
Abstract
The 1000 Genomes Project (1000G) is one of the most popular whole genome sequencing datasets used in different genomics fields and has boosted our knowledge in medical and population genomics, among other fields. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variant (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear when studying low frequency genetic variants in the population. In this study, we analyze the effect of the sequencing center on predicted loss of function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF and singletons independent of the population that is considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics using variants at low frequency.
Affiliation(s)
- Iago Maceda
  - Population Genomics, CNAG-CRG, Centre for Genomic Regulation, 08028 Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Oscar Lao
  - Population Genomics, CNAG-CRG, Centre for Genomic Regulation, 08028 Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
5
Whole-exome analysis in Tunisian Imazighen and Arabs shows the impact of demography in functional variation. Sci Rep 2021; 11:21125. [PMID: 34702931 PMCID: PMC8548440 DOI: 10.1038/s41598-021-00576-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2021] [Accepted: 10/14/2021] [Indexed: 11/08/2022] Open
Abstract
Human populations are genetically affected by their demographic history, which shapes the distribution of their functional genomic variation. However, the genetic impact of recent demography is debated. This issue has been studied in different populations, but never in North Africans, despite their relevant cultural and demographic diversity. In this study we address the question by analyzing new whole-exome sequences from two culturally different Tunisian populations, an isolated Amazigh population and a close non-isolated Arabic-speaking population, focusing on the distribution of functional variation. The two populations present clear differences in their variant frequency distribution, both in general and for putatively damaging variation. This suggests a relevant effect in the Amazigh population of genetic isolation, drift, and inbreeding, pointing to relaxed purifying selection. We also discover the enrichment in Imazighen of variation associated with specific diseases or phenotypic traits, but the scarcity of genetic and biomedical data in the region limits further interpretation. Our results show the genomic impact of recent demography and reveal a clear genetic differentiation probably related to culture. These findings highlight the importance of considering cultural and demographic heterogeneity within North Africa when defining population groups, and the need for more data to improve knowledge on the region's health and disease landscape.
6
Birolo G, Aneli S, Di Gaetano C, Cugliari G, Russo A, Allione A, Casalone E, Giorgio E, Paraboschi EM, Ardissino D, Duga S, Asselta R, Matullo G. Functional and clinical implications of genetic structure in 1686 Italian exomes. Hum Mutat 2021; 42:272-289. [PMID: 33326653 DOI: 10.1002/humu.24156] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/19/2020] [Revised: 11/13/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022]
Abstract
To reconstruct the phenotypical and clinical implications of the Italian genetic structure, we thoroughly analyzed a whole-exome sequencing data set comprising 1686 healthy Italian individuals. We found six previously unreported variants with remarkable frequency differences between Northern and Southern Italy in the HERC2, OR52R1, ADH1B, and THBS4 genes. We reported 36 clinically relevant variants (submitted as pathogenic, risk factors, or drug response in ClinVar) with significant frequency differences between Italy and Europe. We then explored putatively pathogenic variants in the Italian exome. On average, our Italian individuals carried 16.6 protein-truncating variants (PTVs), with 2.5% of the population having a PTV in one of the 59 American College of Medical Genetics (ACMG) actionable genes. Lastly, we looked for PTVs that are likely to cause Mendelian diseases. We found four heterozygous PTVs in haploinsufficient genes (KAT6A, PTCH1, and STXBP1) and three homozygous PTVs in genes causing recessive diseases (DPYD, FLG, and PYGM). Comparing frequencies from our data set to other public databases, such as gnomAD, we showed the importance of population-specific databases for a more accurate assessment of variant pathogenicity. For this reason, we made aggregated frequencies from our data set publicly available as a tool for both clinicians and researchers (http://nigdb.cineca.it; NIG-ExIT).
Affiliation(s)
- Giovanni Birolo
  - Department of Medical Sciences, University of Turin, Turin, Italy
- Serena Aneli
  - Department of Medical Sciences, University of Turin, Turin, Italy
- Alessia Russo
  - Department of Medical Sciences, University of Turin, Turin, Italy
- Elisa Giorgio
  - Department of Medical Sciences, University of Turin, Turin, Italy
- Elvezia M Paraboschi
  - Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy; Humanitas Clinical and Research Center-IRCCS, Rozzano, Milan, Italy
- Diego Ardissino
  - Division of Cardiology, Azienda Ospedaliero-Universitaria di Parma, Parma, Italy
- Stefano Duga
  - Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy; Humanitas Clinical and Research Center-IRCCS, Rozzano, Milan, Italy
- Rosanna Asselta
  - Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy; Humanitas Clinical and Research Center-IRCCS, Rozzano, Milan, Italy
- Giuseppe Matullo
  - Department of Medical Sciences, University of Turin, Turin, Italy
7
Staněk D, Sedláčková L, Seeman P, Šafka Brožková D, Laššuthová P. Whole-Exome Sequencing in Czech Patients with Neurogenetic Diseases. Genet Test Mol Biomarkers 2020; 24:264-273. [DOI: 10.1089/gtmb.2019.0232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Affiliation(s)
- David Staněk
  - DNA Laboratory, Department of Paediatric Neurology, Charles University and University Hospital Motol, Prague, Czech Republic
- Lucie Sedláčková
  - DNA Laboratory, Department of Paediatric Neurology, Charles University and University Hospital Motol, Prague, Czech Republic
- Pavel Seeman
  - DNA Laboratory, Department of Paediatric Neurology, Charles University and University Hospital Motol, Prague, Czech Republic
- Dana Šafka Brožková
  - DNA Laboratory, Department of Paediatric Neurology, Charles University and University Hospital Motol, Prague, Czech Republic
- Petra Laššuthová
  - DNA Laboratory, Department of Paediatric Neurology, Charles University and University Hospital Motol, Prague, Czech Republic