1
Huang Y, Carmi S, Ringbauer H. Estimating effective population size trajectories from time-series identity-by-descent segments. Genetics 2025:iyae212. [PMID: 39854269 DOI: 10.1093/genetics/iyae212] [Received: 11/07/2024] [Accepted: 12/12/2024] [Indexed: 01/26/2025]
Abstract
Long, identical haplotypes shared between pairs of individuals, known as identity-by-descent (IBD) segments, result from recently shared co-ancestry. Various methods have been developed to utilize IBD sharing for demographic inference in contemporary DNA data. Recent methodological advances have extended the screening for IBD segments to ancient DNA (aDNA) data, making demographic inference based on IBD also possible for aDNA. However, aDNA data typically have varying sampling times, but most demographic inference methods for modern data assume that sampling is contemporaneous. Here, we present Ttne (Time-Transect Ne), which models time-transect sampling to infer recent effective population size trajectories. Using simulations, we show that utilizing IBD sharing in time series increased resolution to infer recent fluctuations in effective population sizes compared with methods that only use contemporaneous samples. To account for IBD detection errors common in empirical analyses, we implemented an approach to estimate and model IBD detection errors. Finally, we applied Ttne to two aDNA time transects: individuals associated with the Copper Age Corded Ware Culture and Medieval England. In both cases, we found evidence of a growing population, a signal consistent with archaeological records.
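The dependence of IBD sharing on population size can be illustrated with a standard approximation for the simplest case. This is background, not the Ttne model itself. Assume a constant diploid population size N, two haplotypes sampled at the same time, a coalescence time g (in generations) that is approximately exponential with mean 2N, and, given g, IBD segment lengths (in Morgans) that are exponential with rate 2g. The expected number of segments longer than u Morgans per Morgan of genome is then:

```latex
\mathbb{E}\left[\#\{\text{segments} > u\} \text{ per Morgan}\right]
  = \int_0^\infty \frac{1}{2N}\, e^{-g/(2N)} \,(2g)\, e^{-2gu}\, dg
  = \frac{4N}{(1 + 4Nu)^2}
```

Setting u = 0 gives 4N, the mean number of segments per Morgan (2g averaged over g, with mean g = 2N). For a pair of diploid individuals, the count is multiplied by the four haplotype pairs. A time-varying N_e, or sampling times that differ between the two individuals, changes the coalescence-time density inside the integral. Per the abstract, accounting for such time-transect sampling is what Ttne adds.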
Affiliation(s)
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04317, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig 04109, Germany
- Shai Carmi
- Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04317, Germany

2
Stoneman HR, Price AM, Trout NS, Lamont R, Tifour S, Pozdeyev N, Crooks K, Lin M, Rafaels N, Gignoux CR, Marker KM, Hendricks AE. Characterizing substructure via mixture modeling in large-scale genetic summary statistics. Am J Hum Genet 2025:S0002-9297(24)00449-X. [PMID: 39824191 DOI: 10.1016/j.ajhg.2024.12.007] [Received: 06/12/2024] [Revised: 12/09/2024] [Accepted: 12/09/2024] [Indexed: 01/20/2025]
Abstract
Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into summary data, such as allele frequencies, masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted-for substructure limits summary data usability, especially for understudied or admixed populations. There is a need for methods to enable the harmonization of summary data where the underlying substructure is matched between datasets. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to enable the harmonization of genetic summary data by estimating and adjusting for substructure. In extensive simulations and application to public data, we show that Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and scans for potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse, publicly available summary data, resulting in improved and more equitable research.
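As a minimal sketch of the mixture idea, and not Summix2's implementation or API, assume just two reference groups with known allele frequencies f₁ and f₂ at each variant, and model an observed frequency as π·f₁ + (1 − π)·f₂. Fitting π by least squares then has a closed form, clipped to [0, 1]. The function name and the two-group restriction are illustrative assumptions.

```python
def estimate_two_group_proportion(observed, ref1, ref2):
    """Least-squares estimate of the proportion pi of reference group 1,
    assuming observed[j] ~ pi * ref1[j] + (1 - pi) * ref2[j].

    Minimises sum_j (observed[j] - ref2[j] - pi * (ref1[j] - ref2[j]))**2.
    The loss is a convex quadratic in pi, so clipping the unconstrained
    optimum to [0, 1] gives the constrained optimum.
    """
    if not (len(observed) == len(ref1) == len(ref2)):
        raise ValueError("all frequency vectors must have the same length")
    num = 0.0
    den = 0.0
    for obs, f1, f2 in zip(observed, ref1, ref2):
        d = f1 - f2
        num += (obs - f2) * d
        den += d * d
    if den == 0.0:
        raise ValueError("reference groups have identical allele frequencies")
    return min(1.0, max(0.0, num / den))


# Toy data: a sample that is 25% group 1 and 75% group 2.
ref1 = [0.1, 0.5, 0.9]
ref2 = [0.3, 0.2, 0.4]
observed = [0.25 * a + 0.75 * b for a, b in zip(ref1, ref2)]
print(round(estimate_two_group_proportion(observed, ref1, ref2), 6))  # 0.25
```

With more than two groups, the same loss becomes a quadratic program over the simplex (proportions non-negative and summing to one), which calls for a constrained solver rather than a closed form.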
Affiliation(s)
- Hayley R Stoneman
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Adelle M Price
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikole Scribner Trout
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Riley Lamont
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Souha Tifour
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikita Pozdeyev
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kristy Crooks
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Pathology, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Meng Lin
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Nicholas Rafaels
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christopher R Gignoux
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Katie M Marker
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Audrey E Hendricks
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA

3
Temple SD, Browning SR, Thompson EA. Fast simulation of identity-by-descent segments. bioRxiv 2025:2024.12.13.628449. [PMID: 39829821 PMCID: PMC11741331 DOI: 10.1101/2024.12.13.628449] [Indexed: 01/22/2025]
Abstract
The worst-case runtime complexity to simulate haplotype segments identical by descent (IBD) is quadratic in sample size. We propose two main techniques to reduce the compute time, both of which are motivated by coalescent and recombination processes. We provide mathematical results that explain why our algorithm should outperform a naive implementation with high probability. In our experiments, we observe average compute times to simulate detectable IBD segments around a locus that scale approximately linearly in sample size and take a couple of seconds for sample sizes that are less than ten thousand diploid individuals. In contrast, we find that existing methods to simulate IBD segments take minutes to hours for sample sizes exceeding a few thousand diploid individuals. When using IBD segments to study recent positive selection around a locus, our efficient simulation algorithm makes feasible statistical inferences, e.g., parametric bootstrapping in analyses of large biobanks, that would be otherwise intractable.
Affiliation(s)
- Seth D. Temple
- Department of Statistics, University of Washington, Seattle, WA, USA
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Michigan Institute of Data Science, University of Michigan, Ann Arbor, MI, USA

4
Palma-Martínez MJ, Posadas-García YS, López-Ángeles BE, Quiroz-López C, Lewis ACF, Bird KA, Lasisi T, Zaidi AA, Sohail M. The multi-scale complexity of human genetic variation beyond continental groups. bioRxiv 2024:2024.12.11.627824. [PMID: 39763978 PMCID: PMC11702577 DOI: 10.1101/2024.12.11.627824] [Indexed: 01/14/2025]
Abstract
Traditional clustering and visualization approaches in human genetics often operate under frameworks that assume inherent, discrete groupings [1,2]. These methods can inadvertently simplify multifaceted relationships, functioning to entrench the idea of typological groups [3]. We introduce a network-based pipeline and visualization tool grounded in relational thinking [4], which constructs networks from a variety of genetic similarity metrics. We identify communities at multiple resolutions, departing from typological models of analysis and interpretation that categorize individuals into a (predefined) number of sets. We applied our pipeline to a dataset merged from the 1000 Genomes and Human Genome Diversity Project [5], revealing the limitations of traditional groupings and capturing the complexities introduced by demographic events and evolutionary processes. This method embraces the context-specificity of genetic similarities that are salient depending on the question, markers of interest, and study individuals. Different numbers of communities are revealed depending on the resolution chosen and metric used, underscoring a fluid spectrum of genetic relationships and challenging the notion of universal categorization. We provide a web application (https://sohail-lab.shinyapps.io/GG-NC/) for interactive visualization and engagement with these intricate genetic landscapes.
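A toy version of the network idea shows how the number of groups depends on resolution: join two individuals when their similarity is at least a threshold, and read off connected components. This is a simplified stand-in that assumes a precomputed symmetric similarity matrix; it does not reproduce the pipeline's multi-resolution community detection, and the function name is hypothetical.

```python
def threshold_components(similarity, threshold):
    """Connected components of the graph whose edges join pairs (i, j)
    with similarity[i][j] >= threshold, found with union-find."""
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
for t in (0.85, 0.5, 0.15):
    print(t, threshold_components(sim, t))
# 0.85 [[0, 1], [2], [3]]
# 0.5 [[0, 1], [2, 3]]
# 0.15 [[0, 1, 2, 3]]
```

The same four individuals fall into three, two, or one group depending only on the chosen threshold, which mirrors, in a crude way, the resolution dependence described in the abstract.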
Affiliation(s)
- Anna C F Lewis
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Kevin A Bird
- Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
- Tina Lasisi
- Department of Anthropology
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI, United States
- Arslan A Zaidi
- Genetics, Cell, and Developmental Biology Department, University of Minnesota, Minneapolis, Minnesota, USA
- Institute of Health Informatics, University of Minnesota

5
Fine AG, Steinrücken M. A novel expectation-maximization approach to infer general diploid selection from time-series genetic data. bioRxiv 2024:2024.05.10.593575. [PMID: 38798346 PMCID: PMC11118272 DOI: 10.1101/2024.05.10.593575] [Indexed: 05/29/2024]
Abstract
Detecting and quantifying the strength of selection is a main objective in population genetics. Since selection acts over multiple generations, many approaches have been developed to detect and quantify selection using genetic data sampled at multiple points in time. Such time series genetic data is commonly analyzed using Hidden Markov Models, but in most cases, under the assumption of additive selection. However, many examples of genetic variation exhibiting non-additive mechanisms exist, making it critical to develop methods that can characterize selection in more general scenarios. Thus, we extend a previously introduced expectation-maximization algorithm for the inference of additive selection coefficients to the case of general diploid selection, in which the heterozygote and homozygote fitness are parameterized independently. We furthermore introduce a framework to identify bespoke modes of diploid selection from given data, as well as a procedure for aggregating data across linked loci to increase power and robustness. Using extensive simulation studies, we find that our method accurately and efficiently estimates selection coefficients for different modes of diploid selection across a wide range of scenarios; however, power to classify the mode of selection is low unless selection is very strong. We apply our method to ancient DNA samples from Great Britain in the last 4,450 years, and detect evidence for selection in six genomic regions, including the well-characterized LCT locus. Our work is the first genome-wide scan characterizing signals of general diploid selection.
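To make "general diploid selection" concrete, give the heterozygote and the homozygote for the selected allele their own selection coefficients, so that additive selection is the special case s_hom = 2·s_het. The sketch below applies only the deterministic, random-mating selection step to the allele frequency. The method in the abstract additionally models genetic drift and sampling in a hidden Markov model and fits the coefficients by expectation-maximization, none of which is shown here. Parameter and function names are hypothetical.

```python
def diploid_selection_step(p, s_het, s_hom):
    """One generation of deterministic selection on allele A at frequency p,
    assuming random mating and relative fitnesses
    1 (aa), 1 + s_het (Aa), 1 + s_hom (AA)."""
    q = 1.0 - p
    w_AA = 1.0 + s_hom
    w_Aa = 1.0 + s_het
    mean_fitness = p * p * w_AA + 2.0 * p * q * w_Aa + q * q
    # Frequency of A among gametes from selected parents:
    # AA contributes p^2 * w_AA, Aa contributes half of 2pq * w_Aa.
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def trajectory(p0, s_het, s_hom, generations):
    """Allele-frequency path over `generations` steps, including p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(diploid_selection_step(freqs[-1], s_het, s_hom))
    return freqs


# Additive case (s_hom = 2 * s_het): the allele rises steadily from 10%.
path = trajectory(0.1, 0.01, 0.02, 100)
print(round(path[-1], 3))
```

With s_het = 0.1 and s_hom = 0 (heterozygote advantage), a frequency of 0.5 is an equilibrium of this map, whereas purely recessive selection (s_het = 0, s_hom > 0) raises the frequency only slowly while the allele is rare.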
Affiliation(s)
- Adam G Fine
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Graduate Program in Biophysical Sciences, University of Chicago, Chicago, Illinois, USA
- Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA

6
Temple SD, Waples RK, Browning SR. Modeling recent positive selection using identity-by-descent segments. Am J Hum Genet 2024; 111:2510-2529. [PMID: 39362217 PMCID: PMC11568764 DOI: 10.1016/j.ajhg.2024.08.023] [Received: 02/20/2024] [Revised: 08/29/2024] [Accepted: 08/30/2024] [Indexed: 10/05/2024]
Abstract
Recent positive selection can result in an excess of long identity-by-descent (IBD) haplotype segments overlapping a locus. The statistical methods that we propose here address three major objectives in studying selective sweeps: scanning for regions of interest, identifying possible sweeping alleles, and estimating a selection coefficient s. First, we implement a selection scan to locate regions with excess IBD rates. Second, we estimate the allele frequency and location of an unknown sweeping allele by aggregating over variants that are more abundant in an inferred outgroup with excess IBD rate versus the rest of the sample. Third, we propose an estimator for the selection coefficient and quantify uncertainty using the parametric bootstrap. Comparing against state-of-the-art methods in extensive simulations, we show that our methods are more precise at estimating s when s≥0.015. We also show that our 95% confidence intervals contain s in nearly 95% of our simulations. We apply these methods to study positive selection in European ancestry samples from the Trans-Omics for Precision Medicine project. We analyze eight loci where IBD rates are more than four standard deviations above the genome-wide median, including LCT where the maximum IBD rate is 35 standard deviations above the genome-wide median. Overall, we present robust and accurate approaches to study recent adaptive evolution without knowing the identity of the causal allele or using time series data.
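The first step, flagging windows with excess IBD, can be sketched as a simple threshold on per-window IBD rates. This assumes the rates are already computed and uses the plain sample standard deviation across windows together with the four-standard-deviation cutoff mentioned in the abstract; the authors' exact rate definition and scale estimate may differ, and the function name is hypothetical.

```python
import statistics


def flag_excess_ibd(rates, k=4.0):
    """Indices of windows whose IBD rate exceeds the genome-wide median by
    more than k standard deviations (plain sample SD across all windows)."""
    median = statistics.median(rates)
    sd = statistics.stdev(rates)
    cutoff = median + k * sd
    return [i for i, r in enumerate(rates) if r > cutoff]


# 200 background windows with small variation, plus one strong peak.
rates = [1.0 + 0.01 * (i % 5) for i in range(200)] + [5.0]
print(flag_excess_ibd(rates))  # [200]
```

Because large peaks inflate the ordinary standard deviation, a robust scale estimate (for example, one based on the median absolute deviation) is a common alternative in scans like this.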
Affiliation(s)
- Seth D Temple
- Department of Statistics, University of Washington, Seattle, WA, USA
- Ryan K Waples
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA

7
Hong MM, Froelicher D, Magner R, Popic V, Berger B, Cho H. Secure discovery of genetic relatives across large-scale and distributed genomic data sets. Genome Res 2024; 34:1312-1323. [PMID: 39111815 PMCID: PMC11529841 DOI: 10.1101/gr.279057.124] [Received: 02/16/2024] [Accepted: 07/31/2024] [Indexed: 10/02/2024]
Abstract
Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging owing to the burden of estimating kinship between all the pairs of individuals across data sets. We introduce SF-Relate, a practical and secure federated algorithm for identifying genetic relatives across data silos. SF-Relate vastly reduces the number of individual pairs to compare while maintaining accurate detection through a novel locality-sensitive hashing (LSH) approach. We assign individuals who are likely to be related together into buckets and then test relationships only between individuals in matching buckets across parties. To this end, we construct an effective hash function that captures identity-by-descent (IBD) segments in genetic sequences, which, along with a new bucketing strategy, enable accurate and practical private relative detection. To guarantee privacy, we introduce an efficient algorithm based on multiparty homomorphic encryption (MHE) to allow data holders to cooperatively compute the relatedness coefficients between individuals and to further classify their degrees of relatedness, all without sharing any private data. We demonstrate the accuracy and practical runtimes of SF-Relate on the UK Biobank and All of Us data sets. On a data set of 200,000 individuals split between two parties, SF-Relate detects 97% of third-degree or closer relatives within 15 h of runtime. Our work enables secure identification of relatives across large-scale genomic data sets.
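The bucketing step can be illustrated with a toy locality-sensitive scheme: cut each haplotype into fixed windows and use the alleles at a few fixed offsets within a window as the bucket key, so that individuals sharing a long IBD segment tend to collide in at least one window. This is a hypothetical simplification of the idea described above, not SF-Relate's hash function, and it omits the encrypted kinship computation entirely; only the candidate pairs it returns would go on to be tested.

```python
from collections import defaultdict


def bucket_keys(haplotype, window, positions):
    """Yield one key per complete window: (window index, alleles at the given
    offsets). Offsets must be smaller than `window` (hypothetical scheme)."""
    for start in range(0, len(haplotype) - window + 1, window):
        yield (start // window, tuple(haplotype[start + o] for o in positions))


def candidate_pairs(party_a, party_b, window, positions):
    """Cross-party (id_a, id_b) pairs that share at least one bucket.
    Each party maps an individual ID to one haplotype (a sequence of 0/1)."""
    buckets = defaultdict(set)
    for ind_a, hap in party_a.items():
        for key in bucket_keys(hap, window, positions):
            buckets[key].add(ind_a)
    pairs = set()
    for ind_b, hap in party_b.items():
        for key in bucket_keys(hap, window, positions):
            for ind_a in buckets.get(key, ()):
                pairs.add((ind_a, ind_b))
    return pairs


party_a = {"a1": [0, 1, 1, 0, 1, 1, 0, 0], "a2": [1, 0, 0, 1, 0, 0, 1, 1]}
party_b = {"b1": [0, 1, 1, 0, 0, 1, 0, 1], "b2": [1, 1, 1, 1, 1, 1, 1, 1]}
print(candidate_pairs(party_a, party_b, window=4, positions=(0, 1, 2)))
# {('a1', 'b1')}
```

Only one of the four possible cross-party pairs survives here, which is the point of bucketing: the expensive pairwise comparison is restricted to colliding pairs.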
Affiliation(s)
- Matthew M Hong
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- David Froelicher
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Ricky Magner
- Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Victoria Popic
- Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Hyunghoon Cho
- Department of Biomedical Informatics and Data Science, Yale University, New Haven, Connecticut 06510, USA

8
McCann RS, Courneya JP, Donnelly MJ, Laufer MK, Mzilahowa T, Stewart K, Agossa F, Tezzo FW, Miles A, Takala-Harrison S, O’Connor TD. Variation in spatial population structure in the Anopheles gambiae species complex. bioRxiv 2024:2024.05.26.595955. [PMID: 38853983 PMCID: PMC11160690 DOI: 10.1101/2024.05.26.595955] [Indexed: 06/11/2024]
Abstract
Anopheles gambiae, Anopheles coluzzii, and Anopheles arabiensis are three of the most widespread vectors of malaria parasites, with geographical ranges stretching across wide swaths of Africa. Understanding the population structure of these closely related species, including the extent to which populations are connected by gene flow, is essential for understanding how vector control implemented in one location might indirectly affect vector populations in other locations. Here, we assessed the population structure of each species based on a combined data set of publicly available and newly processed whole-genome sequences. The data set included single nucleotide polymorphisms from whole genomes of 2,410 individual mosquitoes sampled from 128 locations across 19 African countries. We found that A. gambiae sampled from several countries in West and Central Africa showed low genetic differentiation from each other according to principal components analysis (PCA) and ADMIXTURE modeling. Using Estimated Effective Migration Surfaces (EEMS), we showed that this low genetic differentiation indicates high effective migration rates for A. gambiae across this region. Outside of this region, we found eight groups of sampling locations from Central, East, and Southern Africa for which A. gambiae showed higher genetic differentiation, and lower effective migration rates, between each other and the West/Central Africa group. These results indicate that the barriers to and corridors for migration between populations of A. gambiae differ across the geographical range of this malaria vector species. Using the same methods, we found higher genetic differentiation and lower migration rates between populations of A. coluzzii in West and Central Africa than for A. gambiae in the same region. In contrast, we found lower genetic differentiation and higher migration rates between populations of A. arabiensis in Tanzania, compared to A. gambiae in the same region. 
These differences between A. gambiae, A. coluzzii, and A. arabiensis indicate that migration barriers and corridors may vary, even between very closely related species. Overall, our results demonstrate that migration rates vary both within and between species of Anopheles mosquitoes, presumably based on species-specific responses to the ecological or environmental conditions that may impede or facilitate migration, and the geographical patterns of these conditions across the landscape. Together with previous findings, this study provides robust evidence that migration rates between populations of malaria vectors depend on the ecological context, which should be considered when planning surveillance of vector populations, monitoring for insecticide resistance, and evaluating interventions.
Affiliation(s)
- Robert S. McCann
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, USA
- Jean-Paul Courneya
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, USA
- Martin J. Donnelly
- Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
- Miriam K. Laufer
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, USA
- Themba Mzilahowa
- Malaria Alert Centre, Kamuzu University of Health Sciences, Blantyre, Malawi
- Kathleen Stewart
- Center for Geospatial Information Science, Department of Geographical Sciences, University of Maryland, College Park, USA
- Fiacre Agossa
- Unit of Entomology, Department of Parasitology, Institut National de Recherche Biomédicale (INRB/Kinshasa), Kinshasa, Democratic Republic of the Congo
- U.S. President’s Malaria Initiative (PMI) Evolve Project, Abt Associates, Rockville, USA
- Department of Environmental Health, School of Public Health, Faculty of Medicine, University of Kinshasa, Kinshasa, Democratic Republic of Congo
- Francis Wat’senga Tezzo
- Unit of Entomology, Department of Parasitology, Institut National de Recherche Biomédicale (INRB/Kinshasa), Kinshasa, Democratic Republic of the Congo
- Alistair Miles
- Genomic Surveillance Unit, Wellcome Sanger Institute, Cambridge, UK
- Shannon Takala-Harrison
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, USA
- Timothy D. O’Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, USA
- Program for Health Equity and Population Health, University of Maryland School of Medicine, Baltimore, USA

9
Lancaster MC, Chen HH, Shoemaker MB, Fleming MR, Strickland TL, Baker JT, Evans GF, Polikowsky HG, Samuels DC, Huff CD, Roden DM, Below JE. Detection of distant relatedness in biobanks to identify undiagnosed cases of Mendelian disease as applied to Long QT syndrome. Nat Commun 2024; 15:7507. [PMID: 39209900 PMCID: PMC11362435 DOI: 10.1038/s41467-024-51977-4] [Received: 08/31/2023] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Rare genetic diseases are typically studied in referral populations, resulting in underdiagnosis and biased assessment of penetrance and phenotype. To address this, we develop a generalizable method of genotype inference based on distant relatedness and deploy this to identify undiagnosed Type 5 Long QT Syndrome (LQT5) rare variant carriers in a non-referral population. We identify 9 LQT5 families referred to a single specialty clinic, each carrying p.Asp76Asn, the most common LQT5 variant. We uncover recent common ancestry and a single shared haplotype among probands. Application to a non-referral population of 69,819 BioVU biobank subjects identifies 22 additional subjects sharing this haplotype, which we confirm to carry p.Asp76Asn. Referral and non-referral carriers have prolonged QT interval corrected for heart rate (QTc) compared to controls, and, among carriers, the QTc polygenic score is independently associated with QTc prolongation. Thus, our innovative analysis of shared chromosomal segments identifies undiagnosed cases of genetic disease and refines the understanding of LQT5 penetrance and phenotype.
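Once IBD segments have been called by some detector, the screening step described above amounts to finding non-probands who share a sufficiently long segment overlapping the variant position with any proband. The record layout, the minimum length, the coordinates in the example, and the function name below are hypothetical choices for illustration. As the abstract notes, haplotype sharing only nominates candidates, whose carrier status was then confirmed.

```python
from typing import Iterable, NamedTuple, Set


class Segment(NamedTuple):
    id1: str
    id2: str
    chrom: str
    start: int        # base-pair start (inclusive)
    end: int          # base-pair end (inclusive)
    length_cm: float  # genetic length in centimorgans


def haplotype_sharers(segments: Iterable[Segment], probands: Set[str],
                      chrom: str, position: int, min_cm: float) -> Set[str]:
    """Non-proband IDs sharing a segment of at least min_cm that covers
    chrom:position with any proband."""
    sharers = set()
    for seg in segments:
        if seg.chrom != chrom or not (seg.start <= position <= seg.end):
            continue
        if seg.length_cm < min_cm:
            continue
        for a, b in ((seg.id1, seg.id2), (seg.id2, seg.id1)):
            if a in probands and b not in probands:
                sharers.add(b)
    return sharers


segments = [
    Segment("P1", "X1", "21", 100, 500, 5.0),   # covers position, long enough
    Segment("X2", "P2", "21", 100, 500, 4.0),   # proband listed second
    Segment("P1", "X3", "21", 600, 900, 8.0),   # does not cover position
    Segment("P1", "X4", "21", 100, 500, 1.0),   # too short
    Segment("P1", "P2", "21", 100, 500, 10.0),  # proband-proband pair
]
print(sorted(haplotype_sharers(segments, {"P1", "P2"}, "21", 300, 3.0)))
# ['X1', 'X2']
```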
Affiliation(s)
- Megan C Lancaster
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Hung-Hsin Chen
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 11524, Taiwan
- M Benjamin Shoemaker
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Matthew R Fleming
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Teresa L Strickland
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- James T Baker
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Grahame F Evans
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Hannah G Polikowsky
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37232, USA
- Chad D Huff
- Division of Cancer Prevention and Population Sciences, Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Dan M Roden
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Jennifer E Below
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA

10
Fan WTL, Wakeley J. Latent mutations in the ancestries of alleles under selection. Theor Popul Biol 2024; 158:1-20. [PMID: 38697365 DOI: 10.1016/j.tpb.2024.04.008] [Received: 06/02/2023] [Revised: 04/23/2024] [Accepted: 04/29/2024] [Indexed: 05/05/2024]
Abstract
We consider a single genetic locus with two alleles A₁ and A₂ in a large haploid population. The locus is subject to selection and two-way, or recurrent, mutation. Assuming the allele frequencies follow a Wright-Fisher diffusion and have reached stationarity, we describe the asymptotic behaviors of the conditional gene genealogy and the latent mutations of a sample with known allele counts, when the count n₁ of allele A₁ is fixed, and when either or both the sample size n and the selection strength |α| tend to infinity. Our study extends previous work under neutrality to the case of non-neutral rare alleles, asserting that when selection is not too strong relative to the sample size, even if it is strongly positive or strongly negative in the usual sense (α → −∞ or α → +∞), the number of latent mutations of the n₁ copies of allele A₁ follows the same distribution as the number of alleles in the Ewens sampling formula. On the other hand, very strong positive selection relative to the sample size leads to neutral gene genealogies with a single ancient latent mutation. We also demonstrate robustness of our asymptotic results against changing population sizes, when one of |α| or n is large.
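The abstract ties the number of latent mutations to the distribution of the number of distinct alleles under the Ewens sampling formula. That distribution is straightforward to compute, because the number of alleles K_n in a sample of size n is a sum of independent Bernoulli variables with success probabilities θ/(θ + i − 1) for i = 1, ..., n. The abstract does not state which parameter value applies in each limit, so both n and θ are left as inputs in this sketch, and the function name is hypothetical.

```python
def ewens_num_alleles_pmf(n, theta):
    """P(K_n = k) for k = 0..n, where K_n is the number of distinct alleles in
    a sample of size n under the Ewens sampling formula with parameter theta.
    Uses K_n = B_1 + ... + B_n with independent
    B_i ~ Bernoulli(theta / (theta + i - 1))."""
    if n < 1 or theta <= 0:
        raise ValueError("need n >= 1 and theta > 0")
    pmf = [1.0]  # distribution of the empty sum
    for i in range(1, n + 1):
        b = theta / (theta + i - 1)
        new = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            new[k] += prob * (1.0 - b)   # B_i = 0: count unchanged
            new[k + 1] += prob * b       # B_i = 1: one more allele
        pmf = new
    return pmf


print(ewens_num_alleles_pmf(2, 1.0))  # [0.0, 0.5, 0.5]
```

The first Bernoulli has success probability 1, so P(K_n = 0) is always 0. The mean, Σ θ/(θ + i − 1), grows roughly like θ log n.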
Affiliation(s)
- Wai-Tong Louis Fan
- Department of Mathematics, Indiana University, 831 East 3rd St, Bloomington, 47405, IN, USA; Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Ave, Cambridge, 02138, MA, USA
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Ave, Cambridge, 02138, MA, USA

11
Mahmoudiandehkordi S, Maadooliat M, Schrodi SJ. gwid: an R package and Shiny application for Genome-Wide analysis of IBD data. Bioinformatics Advances 2024; 4:vbae115. [PMID: 39246385 PMCID: PMC11379470 DOI: 10.1093/bioadv/vbae115] [Received: 03/11/2024] [Revised: 06/13/2024] [Accepted: 07/29/2024] [Indexed: 09/10/2024]
Abstract
Summary: Genome-wide identity by descent (gwid) is an R package developed for the analysis of identity-by-descent (IBD) data pertaining to dichotomous traits. This package offers a set of tools to assess differential IBD levels for the two states of a binary trait, yielding informative and meaningful results. Furthermore, it provides convenient functions to visualize the outcomes of these analyses, enhancing the interpretability and accessibility of the results. To assess the performance of the package, we conducted an evaluation using real genotype data derived from the SNPs to investigate rheumatoid arthritis susceptibility from the Marshfield Clinic Personalized Medicine Research Project.
Availability and implementation: gwid is available as an open-source R package. Release versions can be accessed on CRAN (https://cran.r-project.org/package=gwid) for all major operating systems. The development version is maintained on GitHub (https://github.com/soroushmdg/gwid) and full documentation with examples and workflow templates is provided via the package website (http://tinyurl.com/gwid-tutorial). An interactive R Shiny dashboard is also developed (https://tinyurl.com/gwid-shiny).
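A minimal sketch of a differential IBD comparison at a single marker, assuming pairwise IBD calls are already available: compare the IBD sharing rate among case-case pairs with that among control-control pairs, and assess the difference by permuting the trait labels. This is a generic illustration of the idea, not gwid's functions or its exact statistics; all names are hypothetical.

```python
import itertools
import random


def pair_ibd_rate(ids, ibd_pairs):
    """Fraction of unordered pairs within `ids` that are IBD at the marker.
    `ibd_pairs` is a set of frozensets {id_a, id_b} called IBD there."""
    pairs = list(itertools.combinations(ids, 2))
    if not pairs:
        return 0.0
    return sum(frozenset(p) in ibd_pairs for p in pairs) / len(pairs)


def differential_ibd(labels, ibd_pairs, n_perm=999, seed=0):
    """Case-case minus control-control IBD rate at one marker, plus a
    one-sided permutation p-value obtained by shuffling the trait labels.
    `labels` maps individual ID -> True (case) or False (control)."""
    ids = sorted(labels)

    def stat(lab):
        cases = [i for i in ids if lab[i]]
        controls = [i for i in ids if not lab[i]]
        return pair_ibd_rate(cases, ibd_pairs) - pair_ibd_rate(controls, ibd_pairs)

    observed = stat(labels)
    rng = random.Random(seed)
    shuffled = [labels[i] for i in ids]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(dict(zip(ids, shuffled))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: every case-case pair is IBD at the marker, no control pair is.
cases = [f"c{i}" for i in range(10)]
controls = [f"k{i}" for i in range(10)]
labels = {i: True for i in cases}
labels.update({i: False for i in controls})
ibd = {frozenset(p) for p in itertools.combinations(cases, 2)}
print(differential_ibd(labels, ibd))
```

Adding 1 to both the numerator and denominator of the permutation p-value keeps it strictly positive, a common convention for Monte Carlo permutation tests.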
Affiliation(s)
- Soroush Mahmoudiandehkordi
- Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI 53233, United States
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Mehdi Maadooliat
- Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI 53233, United States
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Steven J Schrodi
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI 53706, United States
12
Chotai M, Wei X, Messer PW. Signatures of selective sweeps in continuous-space populations. bioRxiv 2024:2024.07.26.605365. [PMID: 39091822 PMCID: PMC11291165 DOI: 10.1101/2024.07.26.605365] [Indexed: 08/04/2024]
Abstract
Selective sweeps describe the process by which an adaptive mutation arises and rapidly fixes in the population, thereby removing genetic variation in its genomic vicinity. The expected signatures of selective sweeps are relatively well understood in panmictic population models, yet natural populations often extend across larger geographic ranges where individuals are more likely to mate with those born nearby. To investigate how such spatial population structure can affect sweep dynamics and signatures, we simulated selective sweeps in populations inhabiting a two-dimensional continuous landscape. The maximum dispersal distance of offspring from their parents can be varied in our simulations from an essentially panmictic population to scenarios with increasingly limited dispersal. We find that in low-dispersal populations, adaptive mutations spread more slowly than in panmictic ones, while recombination becomes less effective at breaking up genetic linkage around the sweep locus. Together, these factors result in a trough of reduced genetic diversity around the sweep locus that looks very similar across dispersal rates. We also find that the site frequency spectrum around hard sweeps in low-dispersal populations becomes enriched for intermediate-frequency variants, making these sweeps appear softer than they are. Furthermore, haplotype heterozygosity at the sweep locus tends to be elevated in low-dispersal scenarios as compared to panmixia, contrary to what we observe in neutral scenarios without sweeps. The haplotype patterns generated by these hard sweeps in low-dispersal populations can resemble soft sweeps from standing genetic variation that arose from substantially older alleles. Our results highlight the need for better accounting for spatial population structure when making inferences about selective sweeps.
Affiliation(s)
- Meera Chotai
- Department of Computational Biology, Cornell University
- Xinzhu Wei
- Department of Computational Biology, Cornell University
13
Biddanda A, Bandyopadhyay E, de la Fuente Castro C, Witonsky D, Urban Aragon JA, Pasupuleti N, Moots HM, Fonseca R, Freilich S, Stanisavic J, Willis T, Menon A, Mustak MS, Kodira CD, Naren AP, Sikdar M, Rai N, Raghavan M. Distinct positions of genetic and oral histories: Perspectives from India. HGG Advances 2024; 5:100305. [PMID: 38720459 PMCID: PMC11153255 DOI: 10.1016/j.xhgg.2024.100305] [Received: 01/05/2024] [Revised: 05/04/2024] [Accepted: 05/04/2024] [Indexed: 05/16/2024]
Abstract
Over the past decade, genomic data have contributed to several insights on global human population histories. These studies have been met both with interest and with criticism, particularly by populations with oral histories that are records of their past and often reference their origins. While several studies have reported concordance between oral and genetic histories, there is potential for tension that may stem from genetic histories being prioritized or used to confirm community-based knowledge and ethnography, especially if they differ. To investigate the interplay between oral and genetic histories, we focused on the southwestern region of India and analyzed whole-genome sequence data from 156 individuals identifying as Bunt, Kodava, Nair, and Kapla. We supplemented limited anthropological records on these populations with oral history accounts from community members and historical literature, focusing on references to non-local origins such as the ancient Scythians in the case of Bunt, Kodava, and Nair, members of Alexander the Great's army for the Kodava, and an African-related source for Kapla. We found these populations to be genetically most similar to other Indian populations, with the Kapla more similar to South Indian tribal populations that maximize a genetic ancestry related to Ancient Ancestral South Indians. We did not find evidence of genetic sources in the study populations beyond those known to have contributed to many other present-day South Asian populations. Our results demonstrate that oral and genetic histories may not always provide consistent accounts of population origins and motivate further community-engaged, multi-disciplinary investigations of non-local origin stories in these communities.
Affiliation(s)
- Arjun Biddanda
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Esha Bandyopadhyay
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Constanza de la Fuente Castro
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Programa de Genética Humana, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- David Witonsky
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Nagarjuna Pasupuleti
- Department of Applied Zoology, Mangalore University, Mangalagangothri, Karnataka 574199, India
- Hannah M Moots
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Institute for the Study of Ancient Cultures Museum, University of Chicago, Chicago, IL, USA
- Renée Fonseca
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Suzanne Freilich
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Department of Evolutionary Anthropology, University of Vienna, Vienna 1090, Austria
- Jovan Stanisavic
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Tabitha Willis
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Anoushka Menon
- Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK
- Mohammed S Mustak
- Department of Applied Zoology, Mangalore University, Mangalagangothri, Karnataka 574199, India
- Anjaparavanda P Naren
- Division of Pulmonary Medicine, Cystic Fibrosis Research Center, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Mithun Sikdar
- Anthropological Survey of India, Mysore, Karnataka 570026, India
- Niraj Rai
- Birbal Sahni Institute of Palaeosciences, Uttar Pradesh, Lucknow, Uttar Pradesh 226007, India
- Maanasa Raghavan
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
14
Guo B, Takala-Harrison S, O’Connor TD. Benchmarking and Optimization of Methods for the Detection of Identity-By-Descent in High-Recombining Plasmodium falciparum Genomes. bioRxiv 2024:2024.05.04.592538. [PMID: 38746392 PMCID: PMC11092787 DOI: 10.1101/2024.05.04.592538] [Indexed: 05/16/2024]
Abstract
Genomic surveillance is crucial for identifying at-risk populations for targeted malaria control and elimination. Identity-by-descent (IBD) is increasingly being used in Plasmodium population genomics to estimate genetic relatedness, effective population size (Ne), population structure, and signals of positive selection. Despite its potential, a thorough evaluation of IBD segment detection tools for species with high recombination rates, such as P. falciparum, remains absent. Here, we perform comprehensive benchmarking of IBD callers (probabilistic: hmmIBD, isoRelate; identity-by-state-based: hap-IBD, phased IBD; and others: Refined IBD) using population genetic simulations tailored for high recombination, and IBD quality metrics at both the IBD segment level and the IBD-based downstream inference level. Our results demonstrate that low marker density per genetic unit, related to high recombination relative to mutation, significantly compromises the accuracy of detected IBD segments. In genomes with high recombination rates resembling P. falciparum, most IBD callers exhibit high false-negative rates for shorter IBD segments, which can be partially mitigated through optimization of IBD caller parameters, especially those related to marker density. Notably, IBD detected with optimized parameters allows for more accurate capture of selection signals and population structure; IBD-based Ne inference is very sensitive to IBD detection errors, with IBD called from hmmIBD uniquely providing less biased estimates of Ne in this context. Validation with empirical data from the MalariaGEN Pf7 database, representing different transmission settings, corroborates these findings. We conclude that context-specific evaluation and parameter optimization are essential for accurate IBD detection in high-recombining species and recommend hmmIBD for quality-sensitive analysis, such as estimation of Ne in these species.
Our optimization and high-level benchmarking methods not only improve IBD segment detection in high-recombining genomes but also enhance overall genomic analysis, paving the way for more accurate genomic surveillance and targeted intervention strategies for malaria.
Affiliation(s)
- Bing Guo
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Shannon Takala-Harrison
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Timothy D. O’Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
15
Santos R, Moreno-Torres V, Pintos I, Corral O, de Mendoza C, Soriano V, Corpas M. Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients. GigaByte 2024; 2024:gigabyte127. [PMID: 38948510 PMCID: PMC11211761 DOI: 10.46471/gigabyte.127] [Received: 09/25/2023] [Accepted: 06/04/2024] [Indexed: 07/02/2024]
Abstract
Despite the advances in genetic marker identification associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, showcasing GLIMPSE1's ability to confidently impute variants with minor allele frequencies as low as 2% in individuals with Spanish ancestry. We carried out a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here can be leveraged for future genomic projects to gain vital insights into health challenges like COVID-19.
Affiliation(s)
- Renato Santos
- National Heart & Lung Institute, Imperial College London, London, UK
- Víctor Moreno-Torres
- Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Ilduara Pintos
- Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Octavio Corral
- Health Sciences School & Medical Centre, Universidad Internacional La Rioja (UNIR), Madrid, Spain
- Carmen de Mendoza
- Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Vicente Soriano
- Health Sciences School & Medical Centre, Universidad Internacional La Rioja (UNIR), Madrid, Spain
- Manuel Corpas
- School of Life Sciences, University of Westminster, London, UK
16
Zhang W, Yuan K, Wen R, Li H, Ni X. Reconstruct recent multi-population migration history by using identical-by-descent sharing. J Genet Genomics 2024; 51:642-651. [PMID: 38423503 DOI: 10.1016/j.jgg.2024.02.006] [Received: 10/24/2023] [Revised: 02/19/2024] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
Identical-by-descent (IBD) is a fundamental genomic characteristic in population genetics and has been widely used for population history reconstruction. However, because IBD captures only the relationship between two individuals/haplotypes, existing IBD-based history inference is constrained to two populations. In this study, we propose a framework that leverages IBD sharing across multiple populations and develop a method, MatrixIBD, to reconstruct recent multi-population migration history. Specifically, we employ the structured coalescent theory to precisely model the genealogical process and then estimate the IBD sharing across multiple populations. Within our model, we establish a theoretical connection between migration history and IBD sharing. Our method is rigorously evaluated through simulations, revealing its remarkable accuracy and robustness. Furthermore, we apply MatrixIBD to Central and South Asia in the Human Genome Diversity Project and successfully reconstruct the recent migration history of three closely related populations in South Asia. By taking into account the IBD sharing across multiple populations simultaneously, MatrixIBD enables us to attain clearer and more comprehensive insights into the history of regions characterized by complex migration dynamics, providing a holistic perspective on intricate patterns embedded within the recent population migration history.
Affiliation(s)
- Wenxiao Zhang
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
- Kai Yuan
- The Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ru Wen
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
- Haifang Li
- Baidu Incorporated, Beijing 100085, China
- Xumin Ni
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
17
Stoneman HR, Price A, Trout NS, Lamont R, Tifour S, Pozdeyev N, Crooks K, Lin M, Rafaels N, Gignoux CR, Marker KM, Hendricks AE. Characterizing substructure via mixture modeling in large-scale genetic summary statistics. bioRxiv 2024:2024.01.29.577805. [PMID: 38766180 PMCID: PMC11100604 DOI: 10.1101/2024.01.29.577805] [Indexed: 05/22/2024]
Abstract
Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into groups masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted substructure limits summary data usability, especially for understudied or admixed populations. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to estimate and adjust for substructure in genetic summary data. In extensive simulations and application to public data, Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and identifies potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse publicly available summary data, resulting in improved and more equitable research.
Affiliation(s)
- Hayley R Stoneman
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Adelle Price
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikole Scribner Trout
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Riley Lamont
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Souha Tifour
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikita Pozdeyev
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kristy Crooks
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Department of Pathology, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Meng Lin
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Nicholas Rafaels
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christopher R Gignoux
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Katie M Marker
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Audrey E Hendricks
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
18
Guo B, Borda V, Laboulaye R, Spring MD, Wojnarski M, Vesely BA, Silva JC, Waters NC, O'Connor TD, Takala-Harrison S. Strong positive selection biases identity-by-descent-based inferences of recent demography and population structure in Plasmodium falciparum. Nat Commun 2024; 15:2499. [PMID: 38509066 PMCID: PMC10954658 DOI: 10.1038/s41467-024-46659-0] [Received: 07/27/2023] [Accepted: 02/28/2024] [Indexed: 03/22/2024]
Abstract
Malaria genomic surveillance often estimates parasite genetic relatedness using metrics such as Identity-By-Descent (IBD), yet strong positive selection stemming from antimalarial drug resistance or other interventions may bias IBD-based estimates. In this study, we use simulations, a true IBD inference algorithm, and empirical data sets from different malaria transmission settings to investigate the extent of this bias and explore potential correction strategies. We analyze whole genome sequence data generated from 640 new and 3089 publicly available Plasmodium falciparum clinical isolates. We demonstrate that positive selection distorts IBD distributions, leading to underestimated effective population size and blurred population structure. Additionally, we discover that the removal of IBD peak regions partially restores the accuracy of IBD-based inferences, with this effect contingent on the population's background genetic relatedness and extent of inbreeding. Consequently, we advocate for selection correction for parasite populations undergoing strong, recent positive selection, particularly in high malaria transmission settings.
Affiliation(s)
- Bing Guo
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Victor Borda
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Roland Laboulaye
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Michele D Spring
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Mariusz Wojnarski
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Brian A Vesely
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Joana C Silva
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA
- Global Health and Tropical Medicine (GHTM), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa (NOVA), Lisbon, Portugal
- Norman C Waters
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Shannon Takala-Harrison
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
19
Kerdoncuff E, Skov L, Patterson N, Zhao W, Lueng YY, Schellenberg GD, Smith JA, Dey S, Ganna A, Dey AB, Kardia SL, Lee J, Moorjani P. 50,000 years of Evolutionary History of India: Insights from ~2,700 Whole Genome Sequences. bioRxiv 2024:2024.02.15.580575. [PMID: 38405782 PMCID: PMC10888882 DOI: 10.1101/2024.02.15.580575] [Indexed: 02/27/2024]
Abstract
India has been underrepresented in whole genome sequencing studies. We generated 2,762 high-coverage genomes from India (including individuals from most geographic regions, speakers of all major languages, and tribal and caste groups), providing a comprehensive survey of genetic variation in India. With these data, we reconstruct the evolutionary history of India through space and time at fine scales. We show that most Indians derive ancestry from three ancestral groups related to ancient Iranian farmers, Eurasian Steppe pastoralists and South Asian hunter-gatherers. We uncover a common source of Iranian-related ancestry from early Neolithic cultures of Central Asia into the ancestors of Ancestral South Indians (ASI), Ancestral North Indians (ANI), Austro-asiatic-related and East Asian-related groups in India. Following these admixtures, India experienced a major demographic shift towards endogamy, resulting in extensive homozygosity and identity-by-descent sharing among individuals. At deep time scales, Indians derive around 1-2% of their ancestry from gene flow from archaic hominins, Neanderthals and Denisovans. By assembling the surviving fragments of archaic ancestry in modern Indians, we recover ~1.5 Gb (or 50%) of the introgressing Neanderthal and ~0.6 Gb (or 20%) of the introgressing Denisovan genomes, more than any other previous archaic ancestry study. Moreover, Indians have the largest variation in Neanderthal ancestry, as well as the highest amount of population-specific Neanderthal segments among worldwide groups. Finally, we demonstrate that most of the genetic variation in Indians stems from a single major migration out of Africa that occurred around 50,000 years ago, with minimal contribution from earlier migration waves. Together, these analyses provide a detailed view of the population history of India and underscore the value of expanding genomic surveys to diverse groups outside Europe.
Affiliation(s)
- Elise Kerdoncuff
- Department of Molecular and Cell Biology, University of California, Berkeley, United States of America
- Laurits Skov
- Department of Molecular and Cell Biology, University of California, Berkeley, United States of America
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Wei Zhao
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Yuk Yee Lueng
- Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, United States of America
- Gerard D. Schellenberg
- Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, United States of America
- Jennifer A. Smith
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Sharmistha Dey
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Andrea Ganna
- Institute for Molecular Medicine Finland, Helsinki, Finland
- AB Dey
- Department of Geriatric Medicine, All India Institute of Medical Sciences, New Delhi, India
- Sharon L.R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Jinkook Lee
- Department of Economics, and Center for Economic & Social Research, University of Southern California, Los Angeles, California, United States of America
- Priya Moorjani
- Department of Molecular and Cell Biology, University of California, Berkeley, United States of America
- Center for Computational Biology, University of California, Berkeley, United States of America
20
Bisschoff M, Smuts I, Dercksen M, Schoonen M, Vorster BC, van der Watt G, Spencer C, Naidu K, Henning F, Meldau S, McFarland R, Taylor RW, Patel K, Fassad MR, Vandrovcova J, Wanders RJA, van der Westhuizen FH. Clinical, biochemical, and genetic spectrum of MADD in a South African cohort: an ICGNMD study. Orphanet J Rare Dis 2024; 19:15. [PMID: 38221620 PMCID: PMC10789041 DOI: 10.1186/s13023-023-03014-8] [Received: 07/07/2023] [Accepted: 12/20/2023] [Indexed: 01/16/2024]
Abstract
BACKGROUND Multiple acyl-CoA dehydrogenase deficiency (MADD) is an autosomal recessive disorder resulting from pathogenic variants in three distinct genes, with most of the variants occurring in the electron transfer flavoprotein-ubiquinone oxidoreductase gene (ETFDH). Recent evidence of potential founder variants for MADD in the South African (SA) population initiated this extensive investigation. As part of the International Centre for Genomic Medicine in Neuromuscular Diseases study, we recruited a cohort of patients diagnosed with MADD from academic medical centres across SA over a three-year period. The aim was to extensively profile the clinical, biochemical, and genomic characteristics of MADD in this understudied population. METHODS Clinical evaluations and whole exome sequencing were conducted on each patient. Metabolic profiling was performed before and after treatment, where possible. The recessive inheritance and phase of the variants were established via segregation analyses using Sanger sequencing. Lastly, the haplotype and allele frequencies were determined for the two main variants in the four largest SA populations. RESULTS Twelve unrelated families (ten of White SA and two of mixed ethnicity) with clinically heterogeneous presentations in 14 affected individuals were observed, and five pathogenic ETFDH variants were identified. Based on disease severity and treatment response, three distinct groups emerged. The most severe and fatal presentations were associated with the homozygous c.[1067G > A];c.[1067G > A] and compound heterozygous c.[976G > C];c.[1067G > A] genotypes, causing MADD types I and I/II, respectively. These, along with three less severe compound heterozygous genotypes (c.[1067G > A];c.[1448C > T], c.[740G > T];c.[1448C > T], and c.[287dupA*];c.[1448C > T]), resulting in MADD types II/III, presented before the age of five years, depending on the time and maintenance of intervention. 
By contrast, the homozygous c.[1448C > T];c.[1448C > T] genotype, which causes MADD type III, presented later in life. Except for the type I, I/II and II cases, urinary metabolic markers for MADD improved/normalised following treatment with riboflavin and L-carnitine. Furthermore, genetic analyses of the most frequent variants (c.[1067G > A] and c.[1448C > T]) revealed a shared haplotype in the region of ETFDH, with SA population-specific allele frequencies of < 0.00067-0.00084%. CONCLUSIONS This study reveals the first extensive genotype-phenotype profile of a MADD patient cohort from the diverse and understudied SA population. The pathogenic variants and associated variable phenotypes were characterised, which will enable early screening, genetic counselling, and patient-specific treatment of MADD in this population.
Affiliation(s)
- Michelle Bisschoff
- Focus area for Human Metabolomics, North-West University, Potchefstroom, South Africa
- Izelle Smuts
- Department of Paediatrics, Steve Biko Academic Hospital, University of Pretoria, Pretoria, South Africa
- Marli Dercksen
- Centre for Human Metabolomics, North-West University, Potchefstroom, South Africa
- Maryke Schoonen
- Focus area for Human Metabolomics, North-West University, Potchefstroom, South Africa
- Barend C Vorster
- Centre for Human Metabolomics, North-West University, Potchefstroom, South Africa
- George van der Watt
- Division of Chemical Pathology, National Health Laboratory Services, University of Cape Town, Cape Town, South Africa
- Careni Spencer
- Division of Human Genetics, Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
- Kireshnee Naidu
- Division of Neurology, Department of Medicine, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Franclo Henning
- Division of Neurology, Department of Medicine, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Surita Meldau
- Division of Chemical Pathology, National Health Laboratory Services, University of Cape Town, Cape Town, South Africa
- Robert McFarland
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, NE1 4LP, UK
- Robert W Taylor
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, NE1 4LP, UK
- Krutik Patel
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Mahmoud R Fassad
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Jana Vandrovcova
- Centre for Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Ronald J A Wanders
- Department of Clinical Chemistry, Laboratory Genetic Metabolic Diseases, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
21
Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024; 22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]
Abstract
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Affiliation(s)
- Ziyue Gao
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
22
Gagnon L, Moreau C, Laprise C, Vézina H, Girard SL. Deciphering the genetic structure of the Quebec founder population using genealogies. Eur J Hum Genet 2024; 32:91-97. [PMID: 37016017 PMCID: PMC10772069 DOI: 10.1038/s41431-023-01356-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Revised: 03/07/2023] [Accepted: 03/22/2023] [Indexed: 04/06/2023]
Abstract
Using genealogy to study the demographic history of a population makes it possible to overcome the models and assumptions often used in population genetics. The Quebec founder population is one of the few populations in the world for which the complete genealogy of the last 400 years is available. The goal of this study is to follow the evolution of the Quebec population structure over time, from the beginning of European colonization until the present day. To do so, we calculated, per 25-year period, the kinship coefficients of all pairs of ancestors in the ascending genealogy of 665 subjects from eight regional and ethnocultural groups. We show that the Quebec population structure appeared progressively in the St. Lawrence valley as early as 1750, with the distinction of the Saguenay and Gaspesian groups. At that time, the ancestors of two groups, the Sagueneans and the Acadians from the Gaspé Peninsula, experienced a marked increase in kinship and inbreeding levels, which shaped the structure and led to the contemporary population structure. Interestingly, this structure arose before the colonization of the Saguenay region and at the very beginning of the Gaspé Peninsula settlement. The resulting regional founder effects in these groups led to differences in present-day identity-by-descent sharing, with the Gaspé and North Shore groups sharing more long segments and the Sagueneans more short segments. This is also reflected by the distribution of the number of most recent common ancestors at different generations and their genetic contribution to the studied subjects.
Affiliation(s)
- Laurence Gagnon
- Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Claudia Moreau
- Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Catherine Laprise
- Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Centre Intégré Universitaire en Santé et Services Sociaux du Saguenay-Lac-Saint-Jean, Saguenay, Québec, G7H 7K9, Canada
- Hélène Vézina
- Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Département des Sciences Humaines et Sociales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Projet BALSAC, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Simon L Girard
- Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Centre de Recherche CERVO, Université Laval, Québec, Québec, G1V 0A6, Canada
23
Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, Reich D. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024; 56:143-151. [PMID: 38123640 PMCID: PMC10786714 DOI: 10.1038/s41588-023-01582-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.
Affiliation(s)
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- BIOMICs Research Group, University of the Basque Country, Vitoria-Gasteiz, Spain
- Ikerbasque-Basque Foundation of Science, Bilbao, Spain
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
24
Mosca MJ, Cho H. Reconstruction of private genomes through reference-based genotype imputation. Genome Biol 2023; 24:271. [PMID: 38053191 PMCID: PMC10698978 DOI: 10.1186/s13059-023-03105-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 11/06/2023] [Indexed: 12/07/2023]
Abstract
BACKGROUND Genotype imputation is an essential step in genetic studies to improve data quality and statistical power. Public imputation servers are widely used by researchers to impute their data using otherwise access-controlled reference panels of high-fidelity genomes held by these servers. RESULTS We report evidence against the prevailing assumption that providing access to panels only indirectly via imputation servers poses a negligible privacy risk to individuals in the panels. To this end, we present algorithmic strategies for adaptively constructing artificial input samples and interpreting their imputation results that lead to the accurate reconstruction of reference panel haplotypes. We illustrate this possibility on three reference panels of real genomes for a range of imputation tools and output settings. Moreover, we demonstrate that reconstructed haplotypes from the same individual could be linked via their genetic relatives using our Bayesian linking algorithm, which allows a substantial portion of the individual's diploid genome to be reassembled. We also provide population genetic estimates of the proportion of a panel that could be linked when an adversary holds a varying number of genomes from the same population. CONCLUSIONS Our results show that genomes in imputation server reference panels can be vulnerable to reconstruction, implying that additional safeguards may need to be considered. We suggest possible mitigation measures based on our findings. Our work illustrates the value of adversarial algorithms in uncovering new privacy risks to help inform the genomics community towards secure data sharing practices.
Affiliation(s)
- Hyunghoon Cho
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
25
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023]
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive tag variants in the ± 3 centimorgan flanking regions from GWAS.
Affiliation(s)
- Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Ardalan Naseri
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Degui Zhi
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
26
Fournier R, Tsangalidou Z, Reich D, Palamara PF. Haplotype-based inference of recent effective population size in modern and ancient DNA samples. Nat Commun 2023; 14:7945. [PMID: 38040695 PMCID: PMC10692198 DOI: 10.1038/s41467-023-43522-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/03/2022] [Accepted: 11/10/2023] [Indexed: 12/03/2023]
Abstract
Individuals sharing recent ancestors are likely to co-inherit large identical-by-descent (IBD) genomic regions. The distribution of these IBD segments in a population may be used to reconstruct past demographic events such as effective population size variation, but accurate IBD detection is difficult in ancient DNA data and in underrepresented populations with limited reference data. In this work, we introduce an accurate method for inferring effective population size variation during the past ~2000 years in both modern and ancient DNA data, called HapNe. HapNe infers recent population size fluctuations using either IBD sharing (HapNe-IBD) or linkage disequilibrium (HapNe-LD), which does not require phasing and can be computed in low-coverage data, including data sets with heterogeneous sampling times. HapNe shows improved accuracy in a range of simulated demographic scenarios compared to currently available methods for IBD-based and LD-based inference of recent effective population size, while requiring fewer computational resources. We apply HapNe to several modern populations from the 1000 Genomes Project, the UK Biobank, the Allen Ancient DNA Resource, and recently published samples from Iron Age Britain, detecting multiple instances of recent effective population size variation across these groups.
Affiliation(s)
- David Reich
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
27
Liu X, Matsunami M, Horikoshi M, Ito S, Ishikawa Y, Suzuki K, Momozawa Y, Niida S, Kimura R, Ozaki K, Maeda S, Imamura M, Terao C. Natural Selection Signatures in the Hondo and Ryukyu Japanese Subpopulations. Mol Biol Evol 2023; 40:msad231. [PMID: 37903429 PMCID: PMC10615566 DOI: 10.1093/molbev/msad231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 09/20/2023] [Accepted: 10/06/2023] [Indexed: 11/01/2023]
Abstract
Natural selection signatures across Japanese subpopulations are under-explored. Here we conducted genome-wide selection scans with 622,926 single nucleotide polymorphisms for 20,366 Japanese individuals, who were recruited from the main islands of the Japanese Archipelago (Hondo) and the Ryukyu Archipelago (Ryukyu), representing two major Japanese subpopulations. The integrated haplotype score (iHS) analysis identified several signals in one or both subpopulations. We found a novel candidate locus at IKZF2, especially in Ryukyu. Significant signals were observed in the major histocompatibility complex region in both subpopulations. The lead variants differed and demonstrated substantial allele frequency differences between Hondo and Ryukyu. The lead variant in Hondo tags HLA-A*33:03-C*14:03-B*44:03-DRB1*13:02-DQB1*06:04-DPB1*04:01, a haplotype specific to Japanese and Korean populations. In Ryukyu, by contrast, the lead variant tags DRB1*15:01-DQB1*06:02, which has been recognized as a genetic risk factor for narcolepsy but is also reported to confer protective effects against type 1 diabetes and human T lymphotropic virus type 1-associated myelopathy/tropical spastic paraparesis. The FastSMC analysis identified 8 loci potentially affected by selection within the past 20-150 generations, including 2 novel candidate loci. The analysis also showed differences between Hondo and Ryukyu in the selection patterns of ALDH2, a gene recognized as a specific target of selection in East Asians. In summary, our study provided insights into the selection signatures within the Japanese population and nominated potential sources of selection pressure.
Affiliation(s)
- Xiaoxi Liu
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Masatoshi Matsunami
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Momoko Horikoshi
- Laboratory for Genomics of Diabetes and Metabolism, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Shuji Ito
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yuki Ishikawa
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Kunihiko Suzuki
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Shumpei Niida
- Core Facility Administration, Research Institute, National Center for Geriatrics and Gerontology, Obu, Japan
- Ryosuke Kimura
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Kouichi Ozaki
- Medical Genome Center, Research Institute, National Center for Geriatrics and Gerontology, Obu, Japan
- Shiro Maeda
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Okinawa, Japan
- Minako Imamura
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Okinawa, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
28
Nait Saada J, Tsangalidou Z, Stricker M, Palamara PF. Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol 2023; 40:msad211. [PMID: 37738175 PMCID: PMC10581698 DOI: 10.1093/molbev/msad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023]
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Affiliation(s)
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
29
Lancaster MC, Chen HH, Shoemaker MB, Fleming MR, Baker JT, Evans G, Polikowsky HG, Samuels DC, Huff CD, Roden DM, Below JE. Detection of distant relatedness in biobanks for identification of undiagnosed carriers of a Mendelian disease variant: application to Long QT Syndrome. RESEARCH SQUARE 2023:rs.3.rs-3314860. [PMID: 37790303 PMCID: PMC10543295 DOI: 10.21203/rs.3.rs-3314860/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Rare genetic diseases are typically studied in referral populations, resulting in underdiagnosis and biased assessment of penetrance and phenotype. To address this, we developed a generalizable method of genotype inference based on distant relatedness and deployed this to identify undiagnosed Type 5 Long QT Syndrome (LQT5) rare variant carriers in a non-referral population. We identified 9 LQT5 families referred to a single specialty clinic, each carrying p.Asp76Asn, the most common LQT5 variant. We uncovered recent common ancestry and a single shared haplotype among probands. Application to a non-referral population of 69,819 BioVU biobank subjects identified 22 additional subjects sharing this haplotype, subsequently confirmed to carry p.Asp76Asn. Referral and non-referral carriers had prolonged QTc compared to controls, and, among carriers, QTc polygenic score additively associated with QTc prolongation. Thus, our novel analysis of shared chromosomal segments identified undiagnosed cases of genetic disease and refined the understanding of LQT5 penetrance and phenotype.
Affiliation(s)
- Megan C Lancaster
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Hung-Hsin Chen
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- M Benjamin Shoemaker
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Matthew R Fleming
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- James T Baker
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Grahame Evans
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Hannah G Polikowsky
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, 37232, U.S.A
- Chad D Huff
- Division of Cancer Prevention and Population Sciences, Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, U.S.A
- Dan M Roden
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Jennifer E Below
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
30
Guo B, Borda V, Laboulaye R, Spring MD, Wojnarski M, Vesely BA, Silva JC, Waters NC, O'Connor TD, Takala-Harrison S. Strong Positive Selection Biases Identity-By-Descent-Based Inferences of Recent Demography and Population Structure in Plasmodium falciparum. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.14.549114. [PMID: 37502843 PMCID: PMC10370022 DOI: 10.1101/2023.07.14.549114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Malaria genomic surveillance often estimates parasite genetic relatedness using metrics such as Identity-By-Descent (IBD). Yet, strong positive selection stemming from antimalarial drug resistance or other interventions may bias IBD-based estimates. In this study, we utilized simulations, a true IBD inference algorithm, and empirical datasets from different malaria transmission settings to investigate the extent of such bias and explore potential correction strategies. We analyzed whole-genome sequence data generated from 640 new and 4,026 publicly available Plasmodium falciparum clinical isolates. Our findings demonstrated that positive selection distorts IBD distributions, leading to underestimated effective population size and blurred population structure. Additionally, we discovered that the removal of IBD peak regions partially restored the accuracy of IBD-based inferences, with this effect contingent on the population's background genetic relatedness. Consequently, we advocate applying selection correction to parasite populations undergoing strong, recent positive selection, particularly in high malaria transmission settings.
Affiliation(s)
- Bing Guo
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Victor Borda
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Roland Laboulaye
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Michele D Spring
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Mariusz Wojnarski
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Brian A Vesely
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Joana C Silva
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Norman C Waters
- Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Shannon Takala-Harrison
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
31
Caggiano C, Boudaie A, Shemirani R, Mefford J, Petter E, Chiu A, Ercelen D, He R, Tward D, Paul KC, Chang TS, Pasaniuc B, Kenny EE, Shortt JA, Gignoux CR, Balliu B, Arboleda VA, Belbin G, Zaitlen N. Disease risk and healthcare utilization among ancestrally diverse groups in the Los Angeles region. Nat Med 2023; 29:1845-1856. [PMID: 37464048 PMCID: PMC11121511 DOI: 10.1038/s41591-023-02425-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2022] [Accepted: 05/30/2023] [Indexed: 07/20/2023]
Abstract
An individual's disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.
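The abstract says clusters were defined from IBD sharing but does not name the clustering algorithm. A deliberately simple stand-in, not the study's method, treats individuals as nodes and connects pairs whose total shared IBD meets a hypothetical threshold. It then reports connected components using union-find:

```python
def ibd_clusters(pairs, min_total_cm):
    """Connected components of an IBD graph.

    pairs: iterable of (id_a, id_b, total_shared_cm). Only pairs sharing at
    least min_total_cm become edges; everyone seen in `pairs` gets a cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, shared_cm in pairs:
        root_a, root_b = find(a), find(b)  # also registers both individuals
        if shared_cm >= min_total_cm and root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    # Largest clusters first; ties broken by sorted member IDs.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


pairs = [("A", "B", 30.0), ("B", "C", 25.0), ("D", "E", 40.0),
         ("C", "D", 5.0), ("F", "A", 2.0)]
print(ibd_clusters(pairs, min_total_cm=20.0))
```

Connected components under a single threshold can chain otherwise unrelated groups through a few bridging pairs. This is one reason community-detection approaches are often used instead on large IBD networks.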
Affiliation(s)
- Christa Caggiano
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Ruhollah Shemirani
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Joel Mefford
- Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA, USA
- Ella Petter
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Alec Chiu
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA, USA
- Defne Ercelen
- Computational and Systems Biology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Rosemary He
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Daniel Tward
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Kimberly C Paul
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Timothy S Chang
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jonathan A Shortt
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Division of Bioinformatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Division of Bioinformatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Brunilda Balliu
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Valerie A Arboleda
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Gillian Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Noah Zaitlen
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
32
Schweiger R, Durbin R. Ultrafast genome-wide inference of pairwise coalescence times. Genome Res 2023; 33:1023-1031. [PMID: 37562965 PMCID: PMC10538485 DOI: 10.1101/gr.277665.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 06/21/2023] [Indexed: 08/12/2023]
Abstract
The pairwise sequentially Markovian coalescent (PSMC) algorithm and its extensions infer the coalescence time of two homologous chromosomes at each genomic position. This inference is used in reconstructing demographic histories, detecting selection signatures, studying genome-wide associations, constructing ancestral recombination graphs, and more. Inference of coalescence times between each pair of haplotypes in a large data set is of great interest, as they may provide rich information about the population structure and history of the sample. Here, we introduce a new method, Gamma-SMC, which is more than 10 times faster than current methods. To obtain this speed-up, we represent the posterior coalescence time distributions succinctly as a gamma distribution with just two parameters; in contrast, PSMC and its extensions hold these in a vector over discrete intervals of time. Thus, Gamma-SMC has constant time-complexity per site, without dependence on the number of discrete time states. Additionally, because of this continuous representation, our method is able to infer times spanning many orders of magnitude and, as such, is robust to parameter misspecification. We describe how this approach works, show its performance on simulated and real data, and illustrate its use in studying recent positive selection in the 1000 Genomes Project data set.
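According to the abstract, Gamma-SMC summarises the posterior over coalescence time with two gamma parameters instead of a vector over discrete time intervals. The sketch below illustrates only that compression step. It moment-matches a gamma distribution to a discretised posterior of the kind PSMC-style methods hold. It is not Gamma-SMC's actual per-site update, and the time points and weights are invented:

```python
import math


def gamma_from_moments(mean, var):
    """Shape and rate of the gamma distribution with this mean and variance."""
    if mean <= 0 or var <= 0:
        raise ValueError("mean and variance must be positive")
    return mean * mean / var, mean / var  # shape = mu^2/var, rate = mu/var


def moment_match(times, weights):
    """Collapse a discretised posterior (time points + weights) to (shape, rate)."""
    total = sum(weights)
    mu = sum(t * w for t, w in zip(times, weights)) / total
    var = sum((t - mu) ** 2 * w for t, w in zip(times, weights)) / total
    return gamma_from_moments(mu, var)


def gamma_pdf(t, shape, rate):
    """Gamma(shape, rate) density at t > 0, evaluated in log space for stability."""
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(t)
               - rate * t - math.lgamma(shape))
    return math.exp(log_pdf)


# Hypothetical posterior: equal mass at times 1.0 and 3.0 (coalescent units).
shape, rate = moment_match([1.0, 3.0], [0.5, 0.5])
print(shape, rate)  # 4.0 2.0
print(gamma_pdf(2.0, shape, rate))
```

Only (shape, rate) is stored, so the per-site cost does not grow with the number of time states. The trade-off is that a gamma is unimodal, so a multimodal posterior is forced into a single-peaked shape.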
Affiliation(s)
- Regev Schweiger
- Department of Genetics, University of Cambridge, Cambridge CB2 1TN, United Kingdom
- Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge CB2 1TN, United Kingdom
33
Wei Y, Naseri A, Zhi D, Zhang S. RaPID-Query for fast identity by descent search and genealogical analysis. Bioinformatics 2023; 39:btad312. [PMID: 37166451 PMCID: PMC10244210 DOI: 10.1093/bioinformatics/btad312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 04/26/2023] [Accepted: 05/09/2023] [Indexed: 05/12/2023]
Abstract
MOTIVATION Because of the rapid growth in genetic database size, genealogical search, the process of inferring familial relatedness by identifying DNA matches, has become a viable approach to help individuals find missing family members and law enforcement agencies locate suspects. A fast and accurate method is needed to search an out-of-database individual against millions of individuals. Most existing approaches offer only all-versus-all matching within a panel. Some prototype algorithms offer one-versus-all queries from an out-of-panel individual, but they do not tolerate errors. RESULTS A new method, random projection-based identity-by-descent (IBD) detection (RaPID) query, is introduced to make fast genealogical search possible. RaPID-Query identifies IBD segments between a query haplotype and a panel of haplotypes. By integrating matches over multiple PBWT indexes, RaPID-Query locates IBD segments quickly for a given cutoff length while allowing mismatched sites. A single query against all UK Biobank autosomal chromosomes was completed within 2.76 seconds on average, with a minimum length of 7 cM and 700 markers. RaPID-Query achieved a 0.016 false negative rate and a 0.012 false positive rate simultaneously on a chromosome 20 sequencing panel with 86 265 sites. This is comparable to the state-of-the-art IBD detection methods TPBWT (out-of-sample) and Hap-IBD. The high-quality IBD segments yielded by RaPID-Query were able to distinguish familial relatedness up to the fourth degree for a given individual pair, with area under the receiver operating characteristic curve values of at least 97.28%. AVAILABILITY AND IMPLEMENTATION The RaPID-Query program is available at https://github.com/ucfcbb/RaPID-Query.
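RaPID-Query builds on the positional Burrows-Wheeler transform (PBWT). The sketch below covers only the basic PBWT from Durbin (2014), which computes the positional prefix array `a` and the divergence array `d` for binary haplotypes. It does not reproduce RaPID-Query's random projections, multiple indexes, mismatch tolerance or out-of-panel queries. The `long_matches` helper is hypothetical:

```python
def pbwt(haplotypes):
    """Positional prefix (a) and divergence (d) arrays of the PBWT.

    haplotypes: list of equal-length 0/1 sequences. After the last site, a
    orders haplotype indices by their reversed prefixes, and for i >= 1 the
    haplotypes a[i-1] and a[i] agree exactly on sites d[i] .. n_sites-1.
    """
    n_sites = len(haplotypes[0])
    a = list(range(len(haplotypes)))
    d = [0] * len(haplotypes)
    for k in range(n_sites):
        a0, a1, d0, d1 = [], [], [], []
        p = q = k + 1  # a match starting at k + 1 is empty
        for idx, hap_id in enumerate(a):
            # p / q: latest match start seen since the last 0 / 1 haplotype.
            p = max(p, d[idx])
            q = max(q, d[idx])
            if haplotypes[hap_id][k] == 0:
                a0.append(hap_id)
                d0.append(p)
                p = 0
            else:
                a1.append(hap_id)
                d1.append(q)
                q = 0
        # Stable partition by the allele at site k.
        a, d = a0 + a1, d0 + d1
    return a, d


def long_matches(a, d, n_sites, min_sites):
    """Adjacent pairs whose match ending at the last site spans >= min_sites."""
    return [(a[i - 1], a[i], d[i]) for i in range(1, len(a))
            if n_sites - d[i] >= min_sites]


haps = [[0, 1], [0, 0], [1, 1]]
a, d = pbwt(haps)
print(a, d)                      # [1, 0, 2] [2, 2, 1]
print(long_matches(a, d, 2, 1))  # [(0, 2, 1)]
```

Each site is processed in a single pass over the current ordering. This is what makes PBWT-based matching scale linearly in panel size. Note that `long_matches` reports only neighbouring pairs in the final ordering, and only matches ending at the last site.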
Affiliation(s)
- Yuan Wei
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Ardalan Naseri
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Degui Zhi
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
34
Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet 2023; 55:768-776. [PMID: 37127670 PMCID: PMC10181934 DOI: 10.1038/s41588-023-01379-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 10/10/2021] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.
Affiliation(s)
- Brian C Zhang
- Department of Statistics, University of Oxford, Oxford, UK
- Arjun Biddanda
- Department of Statistics, University of Oxford, Oxford, UK
- Árni Freyr Gunnarsson
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
35
Lancaster MC, Chen HH, Shoemaker MB, Fleming MR, Baker JT, Polikowsky HG, Samuels DC, Huff CD, Roden DM, Below JE. Detection of distant familial relatedness in biobanks for identification of undiagnosed carriers of a Mendelian disease variant: application to Long QT syndrome. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.04.19.23288831. [PMID: 37163006 PMCID: PMC10168417 DOI: 10.1101/2023.04.19.23288831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Importance The diagnosis and study of rare genetic diseases are often limited to referral populations, leading to underdiagnosis and a biased assessment of penetrance and phenotype. Objective To develop a generalizable method of genotype inference based on distant relatedness and to deploy this to identify undiagnosed Type 5 Long QT Syndrome (LQT5) rare variant carriers in a non-referral population. Participants We identified 9 LQT5 probands and 3 first-degree relatives referred to a single Genetic Arrhythmia clinic, each carrying D76N (p.Asp76Asn), the most common variant implicated in LQT5. The non-referral population consisted of 69,879 ancestry-matched subjects in BioVU, a large biobank that links electronic health records to dense array data. Participants were enrolled from 2007 to 2022. Data analysis was performed in 2022. Exposures We developed and applied a novel approach to genotype inference (Distant Relatedness for Identification and Variant Evaluation, or DRIVE) to identify large shared identical-by-descent (IBD) chromosomal segments in array data. Main Outcomes and Measures We sought to establish genetic relatedness among the probands and to use genomic segments underlying D76N to identify other potential carriers in BioVU. We then further studied the role of D76N in LQT5 pathogenesis. Results Genetic reconstruction of pedigrees and distant relatedness detection among clinic probands using DRIVE revealed shared recent common ancestry and identified a single long shared haplotype. Interrogation of the non-referral population in BioVU identified a further 23 subjects sharing this haplotype, and sequencing confirmed D76N carrier status in 22, all previously undiagnosed with LQT5. The QTc was prolonged in D76N carriers compared with BioVU controls, with 40% penetrance of QTc ≥ 480 msec. Among D76N carriers, a QTc polygenic score was additively associated with QTc prolongation.
Conclusions and Relevance Detection of IBD shared chromosomal segments around D76N enabled identification of distantly related and previously undiagnosed rare-variant carriers, demonstrated the contribution of polygenic risk to monogenic disease penetrance, and further established LQT5 as a primary arrhythmia disorder. Analysis of shared chromosomal regions spanning disease-causing mutations can identify undiagnosed cases of genetic diseases.
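The core lookup behind the carrier search described above is to find everyone who shares, with a known carrier, an IBD segment spanning the variant. The sketch below is minimal and assumes IBD calls are already available as hypothetical (id_a, id_b, chrom, start_bp, end_bp) tuples. It is not DRIVE itself, and the IDs and coordinates are made up. Hits are only candidates: in the abstract, sequencing confirmed 22 of 23 haplotype sharers.

```python
def candidate_carriers(segments, known_carriers, chrom, pos_bp):
    """IDs sharing an IBD segment that spans pos_bp with a known carrier.

    segments: iterable of (id_a, id_b, chrom, start_bp, end_bp) IBD calls.
    """
    hits = set()
    for a, b, seg_chrom, start, end in segments:
        if seg_chrom != chrom or not (start <= pos_bp <= end):
            continue  # segment does not cover the variant position
        if a in known_carriers and b not in known_carriers:
            hits.add(b)
        elif b in known_carriers and a not in known_carriers:
            hits.add(a)
    return hits


segments = [
    ("P1", "X", "chr1", 100, 500),
    ("Y", "P2", "chr1", 50, 150),
    ("P1", "Z", "chr1", 300, 900),   # starts after the variant
    ("P1", "W", "chr2", 100, 500),   # different chromosome
]
print(sorted(candidate_carriers(segments, {"P1", "P2"}, "chr1", 120)))
# ['X', 'Y']
```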
Affiliation(s)
- Hung-Hsin Chen
- Vanderbilt University Medical Center, Nashville, Tennessee
- James T Baker
- Vanderbilt University Medical Center, Nashville, Tennessee
- Chad D Huff
- University of Texas MD Anderson Cancer Center, Houston, Texas
- Dan M Roden
- Vanderbilt University Medical Center, Nashville, Tennessee
36
Barry CJS, Walker VM, Cheesman R, Davey Smith G, Morris TT, Davies NM. How to estimate heritability: a guide for genetic epidemiologists. Int J Epidemiol 2023; 52:624-632. [PMID: 36427280 PMCID: PMC10114051 DOI: 10.1093/ije/dyac224] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/03/2022] [Accepted: 11/14/2022] [Indexed: 11/27/2022]
Abstract
Traditionally, heritability has been estimated using family-based methods such as twin studies. Advancements in molecular genomics have facilitated the development of methods that use large samples of (unrelated or related) genotyped individuals. Here, we provide an overview of common methods applied in genetic epidemiology to estimate heritability, i.e. the proportion of phenotypic variation explained by genetic variation. We provide a guide to key genetic concepts required to understand heritability estimation methods from family-based designs (twin and family studies), genomic designs based on unrelated individuals [linkage disequilibrium score regression, genomic relatedness restricted maximum-likelihood (GREML) estimation] and family-based genomic designs (sibling regression, GREML-kinship, trio-genome-wide complex trait analysis, maternal-genome-wide complex trait analysis, relatedness disequilibrium regression). We describe how heritability is estimated for each method and the assumptions underlying its estimation, and discuss the implications when these assumptions are not met. We further discuss the benefits and limitations of estimating heritability within samples of unrelated individuals compared with samples of related individuals. Overall, this article is intended to help the reader determine the circumstances when each method would be appropriate and why.
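As a concrete instance of the twin-study designs covered by the guide, the classical Falconer estimator converts monozygotic (r_MZ) and dizygotic (r_DZ) twin correlations into three variance shares: additive genetic (A), shared environment (C) and unique environment (E). This is a textbook formula, not something taken from the article, and the correlations below are made up. It inherits the usual assumptions, for example equal environments for MZ and DZ pairs and purely additive genetic effects:

```python
def falconer(r_mz, r_dz):
    """Falconer's decomposition: h2 = 2(rMZ - rDZ), c2 = 2rDZ - rMZ, e2 = 1 - rMZ."""
    h2 = 2 * (r_mz - r_dz)  # additive genetic share (heritability)
    c2 = 2 * r_dz - r_mz    # shared-environment share
    e2 = 1 - r_mz           # unique environment plus measurement error
    return h2, c2, e2


h2, c2, e2 = falconer(r_mz=0.8, r_dz=0.5)
print(round(h2, 2), round(c2, 2), round(e2, 2))  # 0.6 0.2 0.2
```

By construction the three shares always sum to one. If its assumptions are badly violated, the formula can return values outside [0, 1]; for example, r_MZ = 0.8 and r_DZ = 0.2 give h2 = 1.2.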
Affiliation(s)
- Ciarrah-Jane S Barry
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Venexia M Walker
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA
- Rosa Cheesman
- PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- George Davey Smith
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Tim T Morris
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Neil M Davies
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
37
Vilgalys TP, Klunk J, Demeure CE, Cheng X, Shiratori M, Madej J, Beau R, Elli D, Patino MI, Redfern R, DeWitte SN, Gamble JA, Boldsen JL, Carmichael A, Varlik N, Eaton K, Grenier JC, Golding GB, Devault A, Rouillard JM, Yotova V, Sindeaux R, Ye CJ, Bikaran M, Dumaine A, Brinkworth JF, Missiakas D, Rouleau GA, Steinrücken M, Pizarro-Cerdá J, Poinar HN, Barreiro LB. Reply to Barton et al: signatures of natural selection during the Black Death. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535944. [PMID: 37066254 PMCID: PMC10104142 DOI: 10.1101/2023.04.06.535944] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 04/18/2023]
Abstract
Barton et al. [1] raise several statistical concerns regarding our original analyses [2] that highlight the challenge of inferring natural selection using ancient genomic data. We show here that these concerns have limited impact on our original conclusions. Specifically, we recover the same signature of enrichment for high FST values at the immune loci relative to putatively neutral sites after switching the allele frequency estimation method to a maximum likelihood approach, filtering to only consider known human variants, and down-sampling our data to the same mean coverage across sites. Furthermore, using permutations, we show that the rs2549794 variant near ERAP2 continues to emerge as the strongest candidate for selection (p = 1.2×10⁻⁵), falling below the Bonferroni-corrected significance threshold recommended by Barton et al. Importantly, the evidence for selection on ERAP2 is further supported by functional data demonstrating the impact of the ERAP2 genotype on the immune response to Y. pestis and by epidemiological data from an independent group showing that the putatively selected allele during the Black Death protects against severe respiratory infection in contemporary populations.
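The enrichment question in this exchange is whether high-FST sites are over-represented among immune loci relative to putatively neutral sites. A generic one-sided label-permutation test for that kind of question is sketched below; the threshold and toy values are hypothetical. The choice of randomization scheme is itself part of the disagreement between Barton et al. and this reply, so the sketch says nothing about either analysis. In particular, simple label shuffling treats all sites as exchangeable:

```python
import random


def enrichment_permutation_test(candidate, neutral, threshold,
                                n_perm=10_000, seed=0):
    """One-sided permutation test for excess high values among candidate sites.

    Statistic: fraction of candidate sites with value >= threshold.
    Null: candidate/neutral labels are exchangeable across all sites.
    Returns (observed statistic, permutation p-value).
    """
    rng = random.Random(seed)

    def frac_high(values):
        return sum(v >= threshold for v in values) / len(values)

    observed = frac_high(candidate)
    pooled = list(candidate) + list(neutral)
    k = len(candidate)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if frac_high(pooled[:k]) >= observed:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical FST values: 20 candidate sites, 205 background sites.
candidate = [0.9] * 10 + [0.1] * 10
neutral = [0.9] * 5 + [0.1] * 200
print(enrichment_permutation_test(candidate, neutral, threshold=0.5, n_perm=2000))
```

Because of the add-one correction, the smallest attainable p-value is 1/(n_perm + 1). The number of permutations therefore bounds how small a reported p-value can be.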
Affiliation(s)
- Tauras P Vilgalys
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Jennifer Klunk
- McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
- Daicel Arbor Biosciences, Ann Arbor, MI, USA
- Christian E Demeure
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Xiaoheng Cheng
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Mari Shiratori
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Julien Madej
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Rémi Beau
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Derek Elli
- Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Maria I Patino
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Rebecca Redfern
- Centre for Human Bioarchaeology, Museum of London, London, UK, EC2Y 5HN
- Sharon N DeWitte
- Department of Anthropology, University of South Carolina, Columbia, SC, USA
- Julia A Gamble
- Department of Anthropology, University of Manitoba, Winnipeg, Manitoba, R3T2N2
- Jesper L Boldsen
- Department of Forensic Medicine, Unit of Anthropology (ADBOU), University of Southern Denmark, Odense S, 5260, Denmark
- Ann Carmichael
- History Department, Indiana University, Bloomington, IN, USA
- Nükhet Varlik
- Department of History, Rutgers University-Newark, NJ, USA
- Katherine Eaton
- McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
- Jean-Christophe Grenier
- Montreal Heart Institute, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada, H1T 1C7
- G Brian Golding
- McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
- Jean-Marie Rouillard
- Daicel Arbor Biosciences, Ann Arbor, MI, USA
- Department of Chemical Engineering, University of Michigan Ann Arbor, Ann Arbor, MI, USA
- Vania Yotova
- Centre Hospitalier Universitaire Sainte-Justine, Montréal, Quebec, Canada, H3T 1C5
- Renata Sindeaux
- Centre Hospitalier Universitaire Sainte-Justine, Montréal, Quebec, Canada, H3T 1C5
- Chun Jimmie Ye
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Matin Bikaran
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Anne Dumaine
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Jessica F Brinkworth
- Department of Anthropology, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Carl R Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Dominique Missiakas
- Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Guy A Rouleau
- Montreal Neurological Institute-Hospital, McGill University, Montréal, Quebec, Canada, H3A 2B4
- Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Javier Pizarro-Cerdá
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Hendrik N Poinar
- McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Luis B Barreiro
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Centre for Human Bioarchaeology, Museum of London, London, UK, EC2Y 5HN
- Department of Anthropology, University of South Carolina, Columbia, SC, USA
- Department of Anthropology, University of Manitoba, Winnipeg, Manitoba, R3T2N2
38
Barton AR, Santander CG, Skoglund P, Moltke I, Reich D, Mathieson I. Insufficient evidence for natural selection associated with the Black Death. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.14.532615. [PMID: 36993413 PMCID: PMC10055098 DOI: 10.1101/2023.03.14.532615] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 05/07/2023]
Abstract
Klunk et al. analyzed ancient DNA data from individuals in London and Denmark before, during and after the Black Death [1], and argued that allele frequency changes at immune genes were too large to be produced by random genetic drift and thus must reflect natural selection. They also identified four specific variants that they claimed show evidence of selection, including at ERAP2, for which they estimate a selection coefficient of 0.39, several times larger than any selection coefficient on a common human variant reported to date. Here we show that these claims are unsupported for four reasons. First, the signal of enrichment of large allele frequency changes in immune genes comparing people in London before and after the Black Death disappears after an appropriate randomization test is carried out: the P value increases by ten orders of magnitude and is no longer significant. Second, a technical error in the estimation of allele frequencies means that none of the four originally reported loci actually pass the filtering thresholds. Third, the filtering thresholds do not adequately correct for multiple testing. Finally, in the case of the ERAP2 variant rs2549794, which Klunk et al. show experimentally may be associated with a host interaction with Y. pestis, we find no evidence of significant frequency change either in the data that Klunk et al. report, or in published data spanning 2,000 years. While it remains plausible that immune genes were subject to natural selection during the Black Death, the magnitude of this selection and which specific genes may have been affected remains unknown.
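The third point concerns multiple testing. The simplest correction, Bonferroni, tests each of m hypotheses at alpha/m. A minimal sketch follows; the p-values are invented and correspond to neither study:

```python
def bonferroni(p_values, alpha=0.05):
    """Per-test Bonferroni threshold alpha/m and indices of tests that pass it."""
    m = len(p_values)
    threshold = alpha / m
    passing = [i for i, p in enumerate(p_values) if p <= threshold]
    return threshold, passing


threshold, passing = bonferroni([0.01, 1e-6, 0.04, 0.2])
print(threshold, passing)  # 0.0125 [0, 1]
```

With a hypothetical 10,000 tests, the per-test threshold drops to 5×10⁻⁶. The outcome therefore depends heavily on how many tests are counted, and that count is itself an analysis choice.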
Affiliation(s)
- Alison R. Barton
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Cindy G. Santander
- Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
- Pontus Skoglund
- Ancient Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Ida Moltke
- Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA 19104, USA
39
Ringbauer H, Huang Y, Akbari A, Mallick S, Patterson N, Reich D. ancIBD - Screening for identity by descent segments in human ancient DNA. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.08.531671. [PMID: 36945531 PMCID: PMC10028887 DOI: 10.1101/2023.03.08.531671] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/18/2023]
Abstract
Long DNA sequences shared between two individuals, known as identical-by-descent (IBD) segments, are a powerful signal for identifying close and distant biological relatives because they only arise when the pair shares a recent common ancestor. Existing methods to call IBD segments between present-day genomes cannot be straightforwardly applied to ancient DNA (aDNA) data because of typically low coverage and high genotyping error rates. We present ancIBD, a method to identify IBD segments in human aDNA data, implemented as a Python package. Our approach is based on a Hidden Markov Model, using as input genotype probabilities imputed based on a modern reference panel of genomic variation. Through simulation and downsampling experiments, we demonstrate that ancIBD robustly identifies IBD segments longer than 8 centimorgans for aDNA data with either at least 0.25x average whole-genome sequencing (WGS) coverage depth or at least 1x average depth for in-solution enrichment experiments targeting a widely used aDNA SNP set ('1240k'). This application range allows us to screen a substantial fraction of the aDNA record for IBD segments, and we showcase two downstream applications. First, leveraging the fact that biological relatives up to the sixth degree are expected to share multiple long IBD segments, we identify relatives among 10,156 ancient Eurasian individuals and document evidence of long-distance migration, for example by identifying a pair of approximately fifth-degree relatives who were buried 1,410 km apart in Central Asia 5000 years ago. Second, by applying ancIBD, we reveal new details regarding the spread of ancestry related to Steppe pastoralists into Europe starting 5000 years ago.
We find that the first individuals in Central and Northern Europe carrying high amounts of Steppe ancestry, associated with the Corded Ware culture, share high rates of long IBD segments (12-25 cM) with Yamnaya herders of the Pontic-Caspian steppe. This signals a strong bottleneck and a recent biological connection on the order of only a few hundred years, and provides evidence that the Yamnaya themselves are a main source of Steppe ancestry in Corded Ware people. We also detect elevated sharing of long IBD segments between Corded Ware individuals and people associated with the Globular Amphora culture (GAC) from Poland and Ukraine, who were Copper Age farmers not yet carrying Steppe-like ancestry. These IBD links appear for all Corded Ware groups in our analysis, indicating that individuals related to GAC contexts must have had a major demographic impact early on in the genetic admixtures giving rise to various Corded Ware groups across Europe. These results show that detecting IBD segments in aDNA can generate new insights both on a small scale, relevant to understanding the life stories of people, and on the macroscale, relevant to large-scale cultural-historical events.
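ancIBD is distributed as a Python package, but the sketch below does not use its API. It assumes segments have already been called and are available as hypothetical (id_a, id_b, length_cm) tuples. It then applies the two length thresholds quoted in the abstract: the 8 cM reliability cut-off and the 12-25 cM "long IBD" bin. Treating 12 cM as inclusive and 25 cM as exclusive is an assumption:

```python
from collections import defaultdict


def summarise_ibd(segments, min_cm=8.0, long_bin=(12.0, 25.0)):
    """Per-pair summary of IBD segments at or above min_cm.

    segments: iterable of (id_a, id_b, length_cm). For each pair, report the
    number and total length of retained segments and how many fall in
    long_bin (lower bound inclusive, upper bound exclusive).
    """
    summary = defaultdict(lambda: {"n": 0, "total_cm": 0.0, "n_long_bin": 0})
    lo, hi = long_bin
    for a, b, length in segments:
        if length < min_cm:
            continue  # below the length assumed to be reliably detectable
        key = tuple(sorted((a, b)))  # (I1, I2) and (I2, I1) are the same pair
        s = summary[key]
        s["n"] += 1
        s["total_cm"] += length
        if lo <= length < hi:
            s["n_long_bin"] += 1
    return dict(summary)


segs = [("I1", "I2", 9.0), ("I2", "I1", 15.0), ("I1", "I3", 5.0), ("I3", "I4", 30.0)]
print(summarise_ibd(segs))
```

Counts in a fixed long-segment bin, aggregated over pairs between two groups, are the kind of quantity the abstract uses to argue for a recent connection between Corded Ware and Yamnaya individuals.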
Affiliation(s)
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
40
Yu Z, Abdel-Azim S, Duggal P, Vergara C. Identity by descent mapping of HCV spontaneous clearance in populations of diverse ancestry. Research Square 2023:rs.3.rs-2433454. [PMID: 36712049 PMCID: PMC9882640 DOI: 10.21203/rs.3.rs-2433454/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Background Acute infection with hepatitis C virus (HCV) affects millions of individuals worldwide. Host genetics plays a role in spontaneous clearance of the acute infection, which occurs in approximately 30% of individuals. Common variants in GPR158, genes in the interferon lambda (IFNL) cluster, and the MHC region have been associated with HCV clearance in populations of diverse ancestry. Fine mapping of those regions has identified some key variants and amino acids as potential causal variants, but the role of rare variants in those regions, and in the genome in general, has not been explored. We aimed to detect haplotypes containing rare variants related to HCV clearance using identity-by-descent (IBD) haplotype sharing between unrelated case/case pairs and case/control pairs in 3,608 individuals with European and African ancestry. Results We detected 1,711,832 and 5,678,043 individual pairs of IBD segments in the European and African ancestry individuals, respectively. As expected, individuals of African descent had more and shorter segments than Europeans. We did not detect any significant IBD signals in the known associated gene regions. Conclusions IBD is based on sharing of haplotypes and is most powerful in populations with a shared founder or recent common ancestor. For the complex trait of HCV clearance, we used two outbred, global populations, which limited our power to detect IBD associations. Overall, in this population-based sample we failed to detect rare variants associated with HCV clearance in individuals of European and African ancestry.
Affiliation(s)
- Zixuan Yu
- Johns Hopkins University, Bloomberg School of Public Health
- Priya Duggal
- Johns Hopkins University, Bloomberg School of Public Health
41
Tang K, Naseri A, Wei Y, Zhang S, Zhi D. Open-source benchmarking of IBD segment detection methods for biobank-scale cohorts. Gigascience 2022; 11:giac111. [PMID: 36472573 PMCID: PMC9724555 DOI: 10.1093/gigascience/giac111] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/12/2022] [Revised: 08/04/2022] [Accepted: 09/28/2022] [Indexed: 12/12/2022]
Abstract
In the recent biobank era of genetics, the problem of identical-by-descent (IBD) segment detection has received renewed interest, as IBD segments in large cohorts offer unprecedented opportunities in the study of population and genealogical history, as well as genetic association of long haplotypes. While a new generation of efficient methods for IBD segment detection has become available, direct comparison of these methods is difficult: existing benchmarks were often evaluated on different datasets, some of them not openly accessible; benchmarked methods were run under suboptimal parameters; and benchmark performance metrics were not defined consistently. Here, we developed a comprehensive and fully open-source evaluation of the power, accuracy, and resource consumption of these IBD segment detection methods using realistic population genetic simulations under various settings. Our results pave the way for fair evaluation of IBD segment detection methods and provide a practical guide for users.
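To make "power" and "accuracy" concrete, the sketch below scores called segments for one pair against simulated ground truth using simple length-based overlap, with segments given as (start_cM, end_cM) intervals. These definitions and function names are illustrative assumptions; the benchmark itself may define its metrics differently.

```python
def overlap_cm(a, b):
    """Overlap length (cM) of two (start_cM, end_cM) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def length_power(true_segs, called_segs):
    """Fraction of true IBD length recovered by calls (assumes calls do not overlap)."""
    total = sum(end - start for start, end in true_segs)
    covered = sum(overlap_cm(t, c) for t in true_segs for c in called_segs)
    return covered / total


def length_accuracy(true_segs, called_segs):
    """Fraction of called length lying inside true segments (assumes truth does not overlap)."""
    total = sum(end - start for start, end in called_segs)
    correct = sum(overlap_cm(c, t) for c in called_segs for t in true_segs)
    return correct / total


true_segs = [(0.0, 10.0), (20.0, 30.0)]
called_segs = [(2.0, 10.0), (25.0, 40.0)]
print(length_power(true_segs, called_segs))     # 13 of 20 true cM recovered
print(length_accuracy(true_segs, called_segs))  # 13 of 23 called cM correct
```

Measuring by length rather than by segment count avoids rewarding a caller that fragments one true segment into many short pieces.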
Affiliation(s)
- Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yuan Wei
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
42
Jang SK, Evans L, Fialkowski A, Arnett DK, Ashley-Koch AE, Barnes KC, Becker DM, Bis JC, Blangero J, Bleecker ER, Boorgula MP, Bowden DW, Brody JA, Cade BE, Jenkins BWC, Carson AP, Chavan S, Cupples LA, Custer B, Damrauer SM, David SP, de Andrade M, Dinardo CL, Fingerlin TE, Fornage M, Freedman BI, Garrett ME, Gharib SA, Glahn DC, Haessler J, Heckbert SR, Hokanson JE, Hou L, Hwang SJ, Hyman MC, Judy R, Justice AE, Kaplan RC, Kardia SLR, Kelly S, Kim W, Kooperberg C, Levy D, Lloyd-Jones DM, Loos RJF, Manichaikul AW, Gladwin MT, Martin LW, Nouraie M, Melander O, Meyers DA, Montgomery CG, North KE, Oelsner EC, Palmer ND, Payton M, Peljto AL, Peyser PA, Preuss M, Psaty BM, Qiao D, Rader DJ, Rafaels N, Redline S, Reed RM, Reiner AP, Rich SS, Rotter JI, Schwartz DA, Shadyab AH, Silverman EK, Smith NL, Smith JG, Smith AV, Smith JA, Tang W, Taylor KD, Telen MJ, Vasan RS, Gordeuk VR, Wang Z, Wiggins KL, Yanek LR, Yang IV, Young KA, Young KL, Zhang Y, Liu DJ, Keller MC, Vrieze S. Rare genetic variants explain missing heritability in smoking. Nat Hum Behav 2022; 6:1577-1586. [PMID: 35927319 PMCID: PMC9985486 DOI: 10.1038/s41562-022-01408-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/28/2021] [Accepted: 06/10/2022] [Indexed: 12/11/2022]
Abstract
Common genetic variants explain less variation in complex phenotypes than inferred from family-based studies, and there is a debate on the source of this 'missing heritability'. We investigated the contribution of rare genetic variants to tobacco use with whole-genome sequences from up to 26,257 unrelated individuals of European ancestries and 11,743 individuals of African ancestries. Across four smoking traits, single-nucleotide-polymorphism-based heritability ($h^2_{\mathrm{SNP}}$) was estimated from 0.13 to 0.28 (s.e., 0.10-0.13) in European ancestries, with 35-74% of it attributable to rare variants with minor allele frequencies between 0.01% and 1%. These heritability estimates are 1.5-4 times higher than past estimates based on common variants alone and accounted for 60% to 100% of our pedigree-based estimates of narrow-sense heritability ($h^2_{\mathrm{ped}}$, 0.18-0.34). In the African ancestry samples, $h^2_{\mathrm{SNP}}$ was estimated from 0.03 to 0.33 (s.e., 0.09-0.14) across the four smoking traits. These results suggest that rare variants are important contributors to the heritability of smoking.
Affiliation(s)
- Seon-Kyeong Jang
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Luke Evans
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Department of Ecology & Evolution, University of Colorado Boulder, Boulder, CO, USA
- Donna K Arnett
- Dean's Office, University of Kentucky College of Public Health, Lexington, KY, USA
- Kathleen C Barnes
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Diane M Becker
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Meher Preethi Boorgula
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Donald W Bowden
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E Cade
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Brenda W Campbell Jenkins
- Jackson Heart Study Graduate Training and Education Center, Jackson State University School of Public Health, Jackson, MS, USA
- April P Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Sameer Chavan
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Brian Custer
- Vitalant Research Institute, San Francisco, CA, USA
- Scott M Damrauer
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, USA
- Sean P David
- Department of Family Medicine, Pritzker School of Medicine, University of Chicago, Chicago, IL, USA
- NorthShore University HealthSystem, Evanston, IL, USA
- Mariza de Andrade
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Tasha E Fingerlin
- Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Center for Genes Environment and Health, National Jewish Health, Denver, CO, USA
- Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Barry I Freedman
- Section on Nephrology, Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Melanie E Garrett
- Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Sina A Gharib
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Center for Lung Biology, Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington, Seattle, WA, USA
- David C Glahn
- Department of Psychiatry, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Jeffrey Haessler
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Susan R Heckbert
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA
- John E Hokanson
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Shih-Jen Hwang
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Matthew C Hyman
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Renae Judy
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anne E Justice
- Department of Population Health Sciences, Geisinger Health System, Danville, PA, USA
- Robert C Kaplan
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Shannon Kelly
- Department of Pediatrics, UCSF Benioff Children's Hospital Oakland, Oakland, CA, USA
- Wonji Kim
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Daniel Levy
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Framingham Heart Study, Framingham, MA, USA
- Ruth J F Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ani W Manichaikul
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Mark T Gladwin
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Mehdi Nouraie
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Olle Melander
- Department of Clinical Sciences, Lund University, Malmö, Sweden
- Department of Internal Medicine, Skåne University Hospital, Malmö, Sweden
- Courtney G Montgomery
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Elizabeth C Oelsner
- Division of General Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Nicholette D Palmer
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Marinelle Payton
- Department of Epidemiology and Biostatistics, Jackson Heart Study Graduate Training and Education Center, Jackson State University School of Public Health, Jackson, MS, USA
- Anna L Peljto
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Michael Preuss
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, Epidemiology and Health Services, University of Washington, Seattle, WA, USA
- Dandi Qiao
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Daniel J Rader
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Nicholas Rafaels
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Susan Redline
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Robert M Reed
- University of Maryland School of Medicine, Baltimore, MD, USA
- Alexander P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Stephen S Rich
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- David A Schwartz
- Department of Medicine, School of Medicine, University of Colorado Denver, Aurora, CO, USA
- Department of Immunology, School of Medicine, University of Colorado Denver, Aurora, CO, USA
- Aladdin H Shadyab
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, USA
- Edwin K Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Nicholas L Smith
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA
- J Gustav Smith
- Wallenberg Laboratory/Department of Molecular and Clinical Medicine, Institute of Medicine, Gothenburg University, Gothenburg, Sweden
- Department of Cardiology, Sahlgrenska University Hospital, Gothenburg, Sweden
- Albert V Smith
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Weihong Tang
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Kent D Taylor
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Marilyn J Telen
- Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Ramachandran S Vasan
- Sections of Preventive Medicine and Epidemiology and Cardiovascular Medicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Victor R Gordeuk
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Zhe Wang
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kerri L Wiggins
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Lisa R Yanek
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ivana V Yang
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Kendra A Young
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Kristin L Young
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yingze Zhang
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Dajiang J Liu
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Matthew C Keller
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Scott Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
43
Meisner J, Albrechtsen A. Haplotype and population structure inference using neural networks in whole-genome sequencing data. Genome Res 2022; 32:1542-1552. [PMID: 35794006 PMCID: PMC9435741 DOI: 10.1101/gr.276813.122] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/03/2022] [Accepted: 06/28/2022] [Indexed: 02/03/2023]
Abstract
Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
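HaploNet performs principal component analysis using its latent haplotype clusters. The sketch below shows only a generic version of that last step under a simplifying assumption: each haplotype's cluster label per genomic window is already known, the labels are one-hot encoded, and PCA is computed by SVD with NumPy. HaploNet itself derives its PCA from its fitted generative model, so this illustrates the idea rather than its method; the function names and toy data are hypothetical.

```python
import numpy as np


def one_hot_clusters(assignments, n_clusters):
    """(haplotypes x windows) integer cluster labels -> one-hot indicator matrix."""
    n_hap, n_win = assignments.shape
    X = np.zeros((n_hap, n_win * n_clusters))
    rows = np.arange(n_hap)
    for w in range(n_win):
        X[rows, w * n_clusters + assignments[:, w]] = 1.0
    return X


def pca_scores(X, n_components=2):
    """Principal component scores of the column-centred matrix via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Toy data: 30 haplotypes mostly in cluster 0, 30 mostly in cluster 1, 20 windows
rng = np.random.default_rng(1)
shape = (30, 20)
group_a = np.where(rng.random(shape) < 0.9, 0, rng.integers(0, 3, shape))
group_b = np.where(rng.random(shape) < 0.9, 1, rng.integers(0, 3, shape))
scores = pca_scores(one_hot_clusters(np.vstack([group_a, group_b]), n_clusters=3))
print(scores[:3, 0], scores[-3:, 0])  # PC1 has opposite signs for the two groups
```

Working on cluster memberships rather than individual SNPs is what lets haplotype information, not just single-site allele frequencies, drive the separation of closely related groups.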
Affiliation(s)
- Jonas Meisner
- Department of Biology, Bioinformatics Center, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Anders Albrechtsen
- Department of Biology, Bioinformatics Center, University of Copenhagen, DK-2200 Copenhagen, Denmark
44
Wertenbroek R, Rubinacci S, Xenarios I, Thoma Y, Delaneau O. XSI-a genotype compression tool for compressive genomics in large biobanks. Bioinformatics 2022; 38:3778-3784. [PMID: 35748697 PMCID: PMC9344850 DOI: 10.1093/bioinformatics/btac413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/19/2022] [Revised: 06/13/2022] [Accepted: 06/22/2022] [Indexed: 11/24/2022]
Abstract
MOTIVATION Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses. RESULTS We show that xSqueezeIt (XSI) allows a file size reduction of 4-20× compared with compressed BCF and demonstrate its potential for 'compressive genomics' on the UK Biobank whole-genome sequencing genotypes, with 8× faster loading times, 5× faster runs-of-homozygosity computation, 30× faster dot-product computation and 280× faster allele counts. AVAILABILITY AND IMPLEMENTATION The XSI file format specifications, API and command line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
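To illustrate why computing directly on a compact representation can speed up allele counts and dot products, the sketch below stores each biallelic variant as the set of haplotype indices carrying the alternative allele. This sparse-set encoding and the `SparseVariant` class are hypothetical stand-ins chosen for clarity; they are not the XSI file format or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparseVariant:
    """Hypothetical sparse encoding: only haplotypes carrying the alt allele are stored."""
    n_haplotypes: int
    alt_carriers: frozenset

    @classmethod
    def from_dense(cls, haplotypes):
        # haplotypes: sequence of 0/1 alleles, one entry per haplotype
        carriers = frozenset(i for i, allele in enumerate(haplotypes) if allele == 1)
        return cls(len(haplotypes), carriers)

    def allele_count(self):
        # Alt allele count is just the number of stored carriers (no decompression)
        return len(self.alt_carriers)

    def dot(self, other):
        # Dot product of two 0/1 haplotype vectors = size of the carrier intersection
        return len(self.alt_carriers & other.alt_carriers)


v1 = SparseVariant.from_dense([0, 1, 0, 1, 1])
v2 = SparseVariant.from_dense([1, 1, 0, 0, 1])
print(v1.allele_count(), v1.dot(v2))  # 3 2
```

Because rare variants dominate sequencing data, most carrier sets are tiny, so both operations touch far fewer entries than a dense scan would.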
Affiliation(s)
- Rick Wertenbroek
- School of Management and Engineering Vaud (HEIG-VD), HES-SO University of Applied Sciences and Arts Western Switzerland, Yverdon-les-Bains 1401, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Simone Rubinacci
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Ioannis Xenarios
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Yann Thoma
- School of Management and Engineering Vaud (HEIG-VD), HES-SO University of Applied Sciences and Arts Western Switzerland, Yverdon-les-Bains 1401, Switzerland
- Olivier Delaneau
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
45
Population dynamics and genetic connectivity in recent chimpanzee history. Cell Genom 2022; 2:100133. [PMID: 35711737 PMCID: PMC9188271 DOI: 10.1016/j.xgen.2022.100133] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/02/2021] [Revised: 12/29/2021] [Accepted: 04/15/2022] [Indexed: 11/22/2022]
Abstract
Knowledge of the population history of endangered species is critical for conservation, but whole-genome data on chimpanzees (Pan troglodytes) are geographically sparse. Here, we produced the first non-invasive geolocalized catalog of genomic diversity by capturing chromosome 21 from 828 non-invasive samples collected at 48 sampling sites across Africa. The four recognized subspecies show clear genetic differentiation correlating with known barriers, while previously undescribed genetic exchange suggests that these have been permeable on a local scale. We obtained a detailed reconstruction of population stratification and fine-scale patterns of isolation, migration, and connectivity, including a comprehensive picture of admixture with bonobos (Pan paniscus). Unlike humans, chimpanzees did not experience extended episodes of long-distance migrations, which might have limited cultural transmission. Finally, based on local rare variation, we implement a fine-grained geolocalization approach demonstrating improved precision in determining the origin of confiscated chimpanzees.
46
Abstract
Living in the same household exposes family members to shared environments, which may be reflected in estimates of shared environment in twin analyses. The age at which the co-twins of a pair separate marks the end of such shared exposure, and age at separation is commonly self-reported in studies. The objective of the study was to derive age at separation from residential records and use it to validate self-reported separation status and age at the third and fourth waves of data collection in the FinnTwin12 cohort. Age at separation was derived from address information by linkage to the Finnish Population Information System from birth onward. Descriptive statistics by sex and zygosity are presented. The mean age at separation from residential records was 20.36 years. Women separated earlier than men, and dizygotic pairs earlier than monozygotic pairs. We also calculated the sensitivity and specificity of self-reported separation status at waves 3 and 4, and the interrater reliability of self-reported separation age at wave 4. Age at separation from residential records showed relatively poor agreement with self-report. This work enables a more precise and objective measure of shared environment in future twin studies.
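As a sketch of the validation step described in this abstract, the function below computes the sensitivity and specificity of a binary self-reported separation status against a reference status. Treating the registry-derived status as the reference standard, the 0/1 coding, and the toy data are assumptions for illustration; the study's interrater reliability analysis of separation age is not reproduced here.

```python
def sensitivity_specificity(reference, reported):
    """Sensitivity and specificity of `reported` against `reference` (1 = separated, 0 = not)."""
    pairs = list(zip(reference, reported))
    tp = sum(1 for r, s in pairs if r == 1 and s == 1)  # separated, reported separated
    fn = sum(1 for r, s in pairs if r == 1 and s == 0)  # separated, reported not separated
    tn = sum(1 for r, s in pairs if r == 0 and s == 0)  # not separated, reported not separated
    fp = sum(1 for r, s in pairs if r == 0 and s == 1)  # not separated, reported separated
    return tp / (tp + fn), tn / (tn + fp)


registry = [1, 1, 1, 0, 0]     # hypothetical registry-derived status per twin pair
self_report = [1, 1, 0, 0, 1]  # hypothetical self-reported status for the same pairs
sens, spec = sensitivity_specificity(registry, self_report)
print(sens, spec)  # 2 of 3 separated pairs detected; 1 of 2 non-separated pairs correct
```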
47
Chiu AM, Molloy EK, Tan Z, Talwalkar A, Sankararaman S. Inferring population structure in biobank-scale genomic data. Am J Hum Genet 2022; 109:727-737. [PMID: 35298920 PMCID: PMC9069078 DOI: 10.1016/j.ajhg.2022.02.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/08/2021] [Accepted: 02/21/2022] [Indexed: 01/07/2023]
Abstract
Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method for population structure inference that is orders of magnitude faster than existing methods while achieving comparable accuracy. SCOPE infers population structure in about a day on a dataset containing one million individuals and variants as well as on the UK Biobank dataset containing 488,363 individuals and 569,346 variants. Furthermore, SCOPE can leverage allele frequencies from previous studies to improve the interpretability of population structure estimates.
Affiliation(s)
- Alec M Chiu
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Erin K Molloy
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Advanced Computer Studies, University of Maryland, College Park, College Park, MD 20742, USA
- Zilong Tan
- Facebook, Inc., Menlo Park, CA 94025, USA
- Ameet Talwalkar
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sriram Sankararaman
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
48
Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:bbac043. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
49
Gardner EJ, Neville MDC, Samocha KE, Barclay K, Kolk M, Niemi MEK, Kirov G, Martin HC, Hurles ME. Reduced reproductive success is associated with selective constraint on human genes. Nature 2022; 603:858-863. [PMID: 35322230; DOI: 10.1038/s41586-022-04549-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 05/21/2020] [Accepted: 02/07/2022] [Indexed: 12/22/2022]
Abstract
Genome-wide sequencing of human populations has revealed substantial variation among genes in the intensity of purifying selection acting on damaging genetic variants[1]. Although genes under the strongest selective constraint are highly enriched for associations with Mendelian disorders, most of these genes are not associated with disease and therefore the nature of the selection acting on them is not known[2]. Here we show that genetic variants that damage these genes are associated with markedly reduced reproductive success, primarily owing to increased childlessness, with a stronger effect in males than in females. We present evidence that increased childlessness is probably mediated by genetically associated cognitive and behavioural traits, which may mean that male carriers are less likely to find reproductive partners. This reduction in reproductive success may account for 20% of purifying selection against heterozygous variants that ablate protein-coding genes. Although this genetic association may only account for a very minor fraction of the overall likelihood of being childless (less than 1%), especially when compared to more influential sociodemographic factors, it may influence how genes evolve over time.
Affiliation(s)
- Eugene J Gardner
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK; Medical Research Council (MRC) Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of Metabolic Science, Cambridge Biomedical Campus, Cambridge, UK
- Kaitlin E Samocha
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
- Kieron Barclay
- Max Planck Institute for Demographic Research, Rostock, Germany; Demography Unit, Department of Sociology, Stockholm University, Stockholm, Sweden; Swedish Collegium for Advanced Study, Uppsala, Sweden
- Martin Kolk
- Demography Unit, Department of Sociology, Stockholm University, Stockholm, Sweden
- Mari E K Niemi
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
- George Kirov
- Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Hilary C Martin
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
- Matthew E Hurles
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
50
Mathieson I, Terhorst J. Direct detection of natural selection in Bronze Age Britain. Genome Res 2022; 32:2057-2067. [PMID: 36316157; PMCID: PMC9808619; DOI: 10.1101/gr.276862.122] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/24/2022] [Accepted: 08/29/2022] [Indexed: 11/04/2022]
Abstract
We developed a novel method for efficiently estimating time-varying selection coefficients from genome-wide ancient DNA data. In simulations, our method accurately recovers selective trajectories and is robust to misspecification of population size. We applied it to a large data set of ancient and present-day human genomes from Britain and identified seven loci with genome-wide significant evidence of selection in the past 4500 yr. Almost all of them can be related to increased vitamin D or calcium levels, suggesting strong selective pressure on these or related phenotypes. However, the strength of selection on individual loci varied substantially over time, suggesting that cultural or environmental factors moderated the genetic response. Of 28 complex anthropometric and metabolic traits, skin pigmentation was the only one with significant evidence of polygenic selection, further underscoring the importance of phenotypes related to vitamin D. Our approach illustrates the power of ancient DNA to characterize selection in human populations and illuminates the recent evolutionary history of Britain.
Affiliation(s)
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Jonathan Terhorst
- Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, USA