1
Schweiger R, Lee S, Zhou C, Yang TP, Smith K, Li S, Sanghvi R, Neville M, Mitchell E, Nessa A, Wadge S, Small KS, Campbell PJ, Sudmant PH, Rahbari R, Durbin R. Insights into non-crossover recombination from long-read sperm sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.05.602249. [PMID: 39005338 PMCID: PMC11245106 DOI: 10.1101/2024.07.05.602249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Meiotic recombination is a fundamental process that generates genetic diversity by creating new combinations of existing alleles. Although human crossovers have been studied at the pedigree, population and single-cell level, the more frequent non-crossover events that lead to gene conversion are harder to study, particularly at the individual level. Here we show that single high-fidelity long sequencing reads from sperm can capture both crossovers and non-crossovers, allowing effectively arbitrary sample sizes for analysis from one male. Using fifteen sperm samples from thirteen donors we demonstrate variation between and within donors for the rates of different types of recombination. Intriguingly, we observe a tendency for non-crossover gene conversions to occur upstream of nearby PRDM9 binding sites, whereas crossover locations have a slight downstream bias. We further provide evidence for two distinct non-crossover processes. One gives rise to the vast majority of non-crossovers with mean conversion tract length under 50bp, which we suggest is an outcome of standard PRDM9-induced meiotic recombination. In contrast ~2% of non-crossovers have much longer mean tract length, and potentially originate from the same process as complex events with more than two haplotype switches, which is not associated with PRDM9 binding sites and is also seen in somatic cells.
Affiliation(s)
- Regev Schweiger
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom
- Sangjin Lee
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Chenxi Zhou
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom
- Tsun-Po Yang
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Katie Smith
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Stacy Li
  - Department of Integrative Biology, University of California Berkeley, Berkeley, USA
- Rashesh Sanghvi
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Matthew Neville
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Emily Mitchell
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Ayrun Nessa
  - King's College London, Department of Twin Research & Genetic Epidemiology, London, United Kingdom
- Sam Wadge
  - King's College London, Department of Twin Research & Genetic Epidemiology, London, United Kingdom
- Kerrin S Small
  - King's College London, Department of Twin Research & Genetic Epidemiology, London, United Kingdom
- Peter J Campbell
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
  - Wellcome-MRC Cambridge Stem Cell Institute, Cambridge Biomedical Campus, Cambridge, UK
- Peter H Sudmant
  - Department of Integrative Biology, University of California Berkeley, Berkeley, USA
  - Center for Computational Biology, University of California Berkeley, Berkeley, USA
- Raheleh Rahbari
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Richard Durbin
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
2
Temple SD, Thompson EA. Identity-by-descent segments in large samples. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.05.597656. [PMID: 38895476 PMCID: PMC11185678 DOI: 10.1101/2024.06.05.597656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
If two haplotypes share the same alleles for an extended gene tract, these haplotypes are likely to derive identical-by-descent from a recent common ancestor. Identity-by-descent segment lengths are correlated via unobserved tree and recombination processes, which commonly presents challenges to the derivation of theoretical results in population genetics. Under interpretable regularity conditions, we show that the proportion of detectable identity-by-descent segments at a locus is normally distributed for large sample size and large scaled population size. We use efficient and exact simulations to study the distributional behavior of the detectable identity-by-descent rate in finite samples. One consequence of non-normality in finite samples is that genome-wide scans based on identity-by-descent rates may be subject to anti-conservative Type 1 error control.
Highlights
- We show the asymptotic normality of the identity-by-descent rate, a mean of correlated binary random variables that arises in population genetics studies.
- We describe an efficient algorithm capable of simulating long identity-by-descent segments around a locus in large sample sizes.
- In enormous simulation studies, we use this algorithm to characterize the distributional properties of the identity-by-descent rate.
- In finite samples, we reject the null hypothesis of normality more often than the nominal significance level, indicating that genome-wide scans based on identity-by-descent rates may be anti-conservative.
3
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. Am J Hum Genet 2024; 111:691-700. [PMID: 38513668 PMCID: PMC11023918 DOI: 10.1016/j.ajhg.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/23/2024] Open
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
Affiliation(s)
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Brian L Browning
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
4
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.03.565574. [PMID: 37961601 PMCID: PMC10635131 DOI: 10.1101/2023.11.03.565574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more efficient collection and storage of identity by descent (IBD) information than approaches that detect and store pairwise IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach.
Affiliation(s)
- Brian L. Browning
  - Department of Biostatistics, University of Washington, Seattle, WA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
5
Nait Saada J, Tsangalidou Z, Stricker M, Palamara PF. Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol 2023; 40:msad211. [PMID: 37738175 PMCID: PMC10581698 DOI: 10.1093/molbev/msad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023] Open
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Affiliation(s)
- Pier Francesco Palamara
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
6
Lee YL, Bouwman AC, Harland C, Bosse M, Costa Monteiro Moreira G, Veerkamp RF, Mullaart E, Cambisano N, Groenen MAM, Karim L, Coppieters W, Georges M, Charlier C. The rate of de novo structural variation is increased in in vitro-produced offspring and preferentially affects the paternal genome. Genome Res 2023; 33:1455-1464. [PMID: 37793781 PMCID: PMC10620045 DOI: 10.1101/gr.277884.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 08/08/2023] [Indexed: 10/06/2023]
Abstract
Assisted reproductive technologies (ARTs), including in vitro maturation and fertilization (IVF), are increasingly used in human and animal reproduction. Whether these technologies directly affect the rate of de novo mutation (DNM), and to what extent, has been a matter of debate. Here we take advantage of domestic cattle, characterized by complex pedigrees that are ideally suited to detect DNMs and by the systematic use of ART, to study the rate of de novo structural variation (dnSV) in this species and how it is impacted by IVF. By exploiting features of associated de novo point mutations (dnPMs) and dnSVs in clustered DNMs, we provide strong evidence that (1) IVF increases the rate of dnSV approximately fivefold, and (2) the corresponding mutations occur during the very early stages of embryonic development (one- and two-cell stage), yet primarily affect the paternal genome.
Affiliation(s)
- Young-Lim Lee
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Aniek C Bouwman
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Chad Harland
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
  - Livestock Improvement Corporation, Hamilton 3240, New Zealand
- Mirte Bosse
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Roel F Veerkamp
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Nadine Cambisano
  - GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Martien A M Groenen
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Latifa Karim
  - GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Wouter Coppieters
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
  - GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Michel Georges
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- Carole Charlier
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
7
Anderson-Trocmé L, Nelson D, Zabad S, Diaz-Papkovich A, Kryukov I, Baya N, Touvier M, Jeffery B, Dina C, Vézina H, Kelleher J, Gravel S. On the genes, genealogies, and geographies of Quebec. Science 2023; 380:849-855. [PMID: 37228217 DOI: 10.1126/science.add5300] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/04/2022] [Accepted: 04/24/2023] [Indexed: 05/27/2023]
Abstract
Population genetic models only provide coarse representations of real-world ancestry. We used a pedigree compiled from 4 million parish records and genotype data from 2276 French and 20,451 French Canadian individuals to finely model and trace French Canadian ancestry through space and time. The loss of ancestral French population structure and the appearance of spatial and regional structure highlight a wide range of population expansion models. Geographic features shaped migrations, and we find enrichments for migration, genetic, and genealogical relatedness patterns within river networks across regions of Quebec. Finally, we provide a freely accessible simulated whole-genome sequence dataset with spatiotemporal metadata for 1,426,749 individuals reflecting intricate French Canadian population structure. Such realistic population-scale simulations provide opportunities to investigate population genetics at an unprecedented resolution.
Affiliation(s)
- Luke Anderson-Trocmé
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
- Dominic Nelson
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
- Shadi Zabad
  - School of Computer Science, McGill University, Montreal, QC, Canada
- Alex Diaz-Papkovich
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Ivan Kryukov
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
- Nikolas Baya
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Mathilde Touvier
  - Sorbonne Paris Nord University, INSERM U1153, INRAE U1125, CNAM, Nutritional Epidemiology Research Team (EREN), Epidemiology and Statistics Research Center, University Paris Cité (CRESS), Bobigny, France
- Ben Jeffery
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Christian Dina
  - Nantes Université, CNRS, INSERM, l'institut du thorax, Nantes, France
- Hélène Vézina
  - BALSAC Project, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
8
Wang X, Peischl S, Heckel G. Demographic history and genomic consequences of 10,000 generations of isolation in a wild mammal. Curr Biol 2023; 33:2051-2062.e4. [PMID: 37178689 DOI: 10.1016/j.cub.2023.04.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 12/20/2022] [Accepted: 04/17/2023] [Indexed: 05/15/2023]
Abstract
Increased human activities caused the isolation of populations in many species, often associated with genetic depletion and negative fitness effects. The effects of isolation are predicted by theory, but long-term data from natural populations are scarce. We show, with full genome sequences, that common voles (Microtus arvalis) in the Orkney archipelago have remained genetically isolated from conspecifics in continental Europe since their introduction by humans over 5,000 years ago. Modern Orkney vole populations are genetically highly differentiated from continental conspecifics as a result of genetic drift processes. Colonization likely started on the biggest Orkney island and vole populations on smaller islands were gradually split off, without signs of secondary admixture. Despite having large modern population sizes, Orkney voles are genetically depauperate and successive introductions to smaller islands resulted in further reduction of genetic diversity. We detected high levels of fixation of predicted deleterious variation compared with continental populations, particularly on smaller islands, yet the fitness effects realized in nature are unknown. Simulations showed that predominantly mildly deleterious mutations were fixed in populations, while highly deleterious mutations were purged early in the history of the Orkney population. Relaxation of selection overall due to benign environmental conditions on the islands and the effects of soft selection may have contributed to the repeated, successful establishment of Orkney voles despite potential fitness loss. Furthermore, the specific life history of these small mammals, resulting in relatively large population sizes, has probably been important for their long-term persistence in full isolation.
Affiliation(s)
- Xuejing Wang
  - Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Stephan Peischl
  - Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Gerald Heckel
  - Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
9
Johri P, Pfeifer SP, Jensen JD. Developing an evolutionary baseline model for humans: jointly inferring purifying selection with population history. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.11.536488. [PMID: 37090533 PMCID: PMC10120674 DOI: 10.1101/2023.04.11.536488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Building evolutionarily appropriate baseline models for natural populations is not only important for answering fundamental questions in population genetics - including quantifying the relative contributions of adaptive vs. non-adaptive processes - but it is also essential for identifying candidate loci experiencing relatively rare and episodic forms of selection (e.g., positive or balancing selection). Here, a baseline model was developed for a human population of West African ancestry, the Yoruba, comprising processes constantly operating on the genome (i.e., purifying and background selection, population size changes, recombination rate heterogeneity, and gene conversion). Specifically, to perform joint inference of selective effects with demography, an approximate Bayesian approach was employed that utilizes the decay of background selection effects around functional elements, taking into account genomic architecture. This approach inferred a recent 6-fold population growth together with a distribution of fitness effects that is skewed towards effectively neutral mutations. Importantly, these results further suggest that, while strong and/or frequent recurrent positive selection is inconsistent with observed data, weak to moderate positive selection is consistent but unidentifiable if rare.
10
Estimating the genome-wide mutation rate from thousands of unrelated individuals. Am J Hum Genet 2022; 109:2178-2184. [PMID: 36370709 PMCID: PMC9748258 DOI: 10.1016/j.ajhg.2022.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 10/15/2022] [Indexed: 11/13/2022] Open
Abstract
We provide a method for estimating the genome-wide mutation rate from sequence data on unrelated individuals by using segments of identity by descent (IBD). The length of an IBD segment indicates the time to shared ancestor of the segment, and mutations that have occurred since the shared ancestor result in discordances between the two IBD haplotypes. Previous methods for IBD-based estimation of mutation rate have required the use of family data for accurate phasing of the genotypes. This has limited the scope of application of IBD-based mutation rate estimation. Here, we develop an IBD-based method for mutation rate estimation from population data, and we apply it to whole-genome sequence data on 4,166 European American individuals from the TOPMed Framingham Heart Study, 2,996 European American individuals from the TOPMed My Life, Our Future study, and 1,586 African American individuals from the TOPMed Hypertension Genetic Epidemiology Network study. Although mutation rates may differ between populations as a result of genetic factors, demographic factors such as average parental age, and environmental exposures, our results are consistent with equal genome-wide average mutation rates across these three populations. Our overall estimate of the average genome-wide mutation rate per 10^8 base pairs per generation for single-nucleotide variants is 1.24 (95% CI 1.18-1.33).
11
Johnson KE, Adams CJ, Voight BF. Identifying rare variants inconsistent with identity-by-descent in population-scale whole-genome sequencing data. Methods Ecol Evol 2022; 13:2429-2442. [PMID: 38938451 PMCID: PMC11210625 DOI: 10.1111/2041-210x.13991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 09/12/2022] [Indexed: 12/01/2022]
Abstract
Analyses of genetic variation typically assume that rare variants within a population are inherited from a single common ancestral event, that is, identity-by-descent (IBD). However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this simple evolutionary model, including recurrent mutations, gene conversions and genotyping error. All these processes can decrease the expected length of shared background haplotype surrounding a rare variant if that variant was inherited from a single event descending from a common ancestor. No method exists to computationally infer rare variants inconsistent with this simple model (denoted here as 'IBD-inconsistent') using unphased population sequencing data. We hypothesized that the difference in shared haplotype background length can distinguish variants consistent and inconsistent with this simple IBD transmission model in population sequencing data without pedigree information. We implemented a Bayesian hierarchical model and used Gibbs sampling to estimate the posterior probability of IBD state for rare variants, using simulated recurrent mutations to demonstrate that our approach accurately distinguishes rare variants consistent and inconsistent with a simple IBD inheritance model. Applying our method to whole-genome sequencing data from 3,621 human individuals in the UK10K consortium, we found that IBD-inconsistent variants correlated with higher local mutation rates and genomic features like replication timing. Using a heuristic to categorize IBD-inconsistent variants as gene conversions, we found that potential gene conversions had expected properties such as enriched local GC content. By identifying IBD-inconsistent variants, we can better understand the spectrum of recent mutations in human populations, a source of genetic variation driving evolution and a key factor in understanding recent demographic history.
Affiliation(s)
- Kelsey E. Johnson
  - Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Christopher J. Adams
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Benjamin F. Voight
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
12
Hayeck TJ, Stong N, Baugh E, Dhindsa R, Turner TN, Malakar A, Mosbruger TL, Shaw GTW, Duan Y, Ionita-Laza I, Goldstein D, Allen AS. Ancestry adjustment improves genome-wide estimates of regional intolerance. Genetics 2022; 221:iyac050. [PMID: 35385101 PMCID: PMC9157129 DOI: 10.1093/genetics/iyac050] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2022] [Accepted: 02/24/2022] [Indexed: 11/12/2022] Open
Abstract
Genomic regions subject to purifying selection are more likely to carry disease-causing mutations than regions not under selection. Cross-species conservation is often used to identify such regions, but with limited resolution to detect selection on short evolutionary timescales such as that occurring in only one species. In contrast, genetic intolerance looks for depletion of variation relative to expectation within a species, allowing species-specific features to be identified. When estimating the intolerance of noncoding sequence, methods strongly leverage variant frequency distributions. As the expected distributions depend on ancestry, if not properly controlled for, ancestral population source may obfuscate signals of selection. We demonstrate that properly incorporating ancestry in intolerance estimation greatly improved variant classification. We provide a genome-wide intolerance map that is conditional on ancestry and likely to be particularly valuable for variant prioritization.
Affiliation(s)
- Tristan J Hayeck
  - Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Nicholas Stong
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Evan Baugh
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Ryan Dhindsa
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Tychele N Turner
  - Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Ayan Malakar
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Timothy L Mosbruger
  - Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Grace Tzun-Wen Shaw
  - Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yuncheng Duan
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, USA
- David Goldstein
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Andrew S Allen
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, USA
13
Sticca EL, Belbin GM, Gignoux CR. Current Developments in Detection of Identity-by-Descent Methods and Applications. Front Genet 2021; 12:722602. [PMID: 34567074 PMCID: PMC8461052 DOI: 10.3389/fgene.2021.722602] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/09/2021] [Accepted: 08/24/2021] [Indexed: 01/23/2023] Open
Abstract
Identity-by-descent (IBD), the detection of shared segments inherited from a common ancestor, is a fundamental concept in genomics with broad applications in the characterization and analysis of genomes. While historically the concept of IBD was extensively utilized through linkage analyses and in studies of founder populations, applications of IBD-based methods subsided during the genome-wide association study era. This was primarily due to the computational expense of IBD detection, which becomes increasingly relevant as the field moves toward the analysis of biobank-scale datasets that encompass individuals from highly diverse backgrounds. To address these computational barriers, the past several years have seen new methodological advances enabling IBD detection for datasets in the hundreds of thousands to millions of individuals, enabling novel analyses at an unprecedented scale. Here, we describe the latest innovations in IBD detection and describe opportunities for the application of IBD-based methods across a broad range of questions in the field of genomics.
Affiliation(s)
- Evan L Sticca
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Gillian M Belbin
- Institute for Genomic Health, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Christopher R Gignoux
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| |
|
14
|
Abstract
Hypertrophic cardiomyopathy (HCM) is a genetic disease of the myocardium characterized by a hypertrophic left ventricle with a preserved or increased ejection fraction. Cardiac hypertrophy is often asymmetrical, which is associated with left ventricular outflow tract obstruction. Myocyte hypertrophy, disarray, and myocardial fibrosis constitute the histological features of HCM. HCM is a relatively benign disease but an important cause of sudden cardiac death in the young and heart failure in the elderly. Pathogenic variants (PVs) in genes encoding protein constituents of the sarcomeres are the main causes of HCM. PVs exhibit a gradient of effect sizes, as reflected in their penetrance and variable phenotypic expression of HCM. MYH7 and MYBPC3, encoding β-myosin heavy chain and myosin binding protein C, respectively, are the two most common causal genes and are responsible for ≈40% of all HCM cases but a higher percentage of HCM in large families. PVs in genes encoding protein components of the thin filaments are responsible for ≈5% of the HCM cases. Whereas pathogenicity of the genetic variants in large families has been firmly established, ascertaining causality of the PVs in small families and sporadic cases is challenging. In the latter category, PVs are best considered as probabilistic determinants of HCM. Deciphering the genetic basis of HCM has enabled routine genetic testing and has partially elucidated the underpinning mechanism of HCM as an increased number of myosin molecules that are strongly bound to actin. The discoveries have led to the development of mavacamten, which targets binding of the myosin molecule to actin filaments and imparts beneficial clinical effects. In the coming years, the yield of genetic testing is expected to improve and the so-called missing causal genes are expected to be identified. The advances are also expected to enable development of additional specific therapies and editing of the mutations in HCM.
Affiliation(s)
- A J Marian
- Center for Cardiovascular Genetics, Institute of Molecular Medicine and Department of Medicine, University of Texas Health Sciences Center at Houston
| |
|
15
|
Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat Commun 2021; 12:1164. [PMID: 33608517 PMCID: PMC7896067 DOI: 10.1038/s41467-021-21446-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 11/01/2020] [Accepted: 01/27/2021] [Indexed: 01/16/2023] Open
Abstract
Understanding how natural selection has shaped genetic architecture of complex traits is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level GWAS data to estimate multiple genetic architecture parameters including selection signature. Here, we present a method (SBayesS) that only requires GWAS summary statistics. We analyse data for 155 complex traits (n = 27k–547k) and project the estimates onto those obtained from evolutionary simulations. We estimate that, on average across traits, about 1% of human genome sequence are mutational targets with a mean selection coefficient of ~0.001. Common diseases, on average, show a smaller number of mutational targets and have been under stronger selection, compared to other traits. SBayesS analyses incorporating functional annotations reveal that selection signatures vary across genomic regions, among which coding regions have the strongest selection signature and are enriched for both the number of associated variants and the magnitude of effect sizes. Methods to study how natural selection shapes genetic architecture of complex traits rely on individual level genome-wide association study (GWAS) data. Here, the authors present a Bayesian method using GWAS summary statistics to study genetic architecture and apply this to 155 complex traits.
|
16
|
Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, LeFaive J, Lee SB, Tian X, Browning BL, Das S, Emde AK, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella JF, Chalazan B, Chasman DI, Chen YDI, Cho MH, Choi SH, Chung MK, Clish CB, Correa A, Curran JE, Custer B, Darbar D, Daya M, de Andrade M, DeMeo DL, Dutcher SK, Ellinor PT, Emery LS, Eng C, Fatkin D, Fingerlin T, Forer L, Fornage M, Franceschini N, Fuchsberger C, Fullerton SM, Germer S, Gladwin MT, Gottlieb DJ, Guo X, Hall ME, He J, Heard-Costa NL, Heckbert SR, Irvin MR, Johnsen JM, Johnson AD, Kaplan R, Kardia SLR, Kelly T, Kelly S, Kenny EE, Kiel DP, Klemmer R, Konkle BA, Kooperberg C, Köttgen A, Lange LA, Lasky-Su J, Levy D, Lin X, Lin KH, Liu C, Loos RJF, Garman L, Gerszten R, Lubitz SA, Lunetta KL, Mak ACY, Manichaikul A, Manning AK, Mathias RA, McManus DD, McGarvey ST, Meigs JB, Meyers DA, Mikulla JL, Minear MA, Mitchell BD, Mohanty S, Montasser ME, Montgomery C, Morrison AC, Murabito JM, Natale A, Natarajan P, Nelson SC, North KE, O'Connell JR, Palmer ND, Pankratz N, Peloso GM, Peyser PA, Pleiness J, Post WS, Psaty BM, Rao DC, Redline S, Reiner AP, Roden D, Rotter JI, Ruczinski I, Sarnowski C, Schoenherr S, Schwartz DA, Seo JS, Seshadri S, Sheehan VA, Sheu WH, Shoemaker MB, Smith NL, Smith JA, Sotoodehnia N, Stilp AM, Tang W, Taylor KD, Telen M, Thornton TA, Tracy RP, Van Den Berg DJ, Vasan RS, Viaud-Martinez KA, Vrieze S, Weeks DE, Weir BS, Weiss ST, Weng LC, Willer CJ, Zhang Y, Zhao X, Arnett DK, Ashley-Koch AE, Barnes KC, Boerwinkle E, Gabriel S, Gibbs R, Rice KM, Rich SS, Silverman EK, Qasba P, Gan W, Papanicolaou GJ, Nickerson DA, Browning SR, Zody MC, Zöllner S, 
Wilson JG, Cupples LA, Laurie CC, Jaquish CE, Hernandez RD, O'Connor TD, Abecasis GR. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021; 590:290-299. [PMID: 33568819 PMCID: PMC7875770 DOI: 10.1038/s41586-021-03205-y] [Citation(s) in RCA: 875] [Impact Index Per Article: 291.7] [Received: 03/06/2019] [Accepted: 01/07/2021] [Indexed: 02/08/2023]
Abstract
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Affiliation(s)
- Daniel Taliun
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Daniel N Harris
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Michael D Kessler
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Jedidiah Carlson
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Zachary A Szpiech
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA, USA
| | - Raul Torres
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Sarah A Gagliano Taliun
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | | | | | - Hyun Min Kang
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | | | - Jonathon LeFaive
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Seung-Been Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Xiaowen Tian
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Brian L Browning
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Sayantan Das
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | | | | | - Douglas P Loesch
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Amol C Shetty
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Thomas W Blackwell
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Albert V Smith
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Quenna Wong
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, USA
| | - Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Dean M Bobo
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - François Aguet
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Alvaro Alonso
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | | | - Dan E Arking
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | | | - Paul L Auer
- Zilber School of Public Health, University of Wisconsin Milwaukee, Milwaukee, WI, USA
| | | | - R Graham Barr
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
- Department of Epidemiology, Columbia University Medical Center, New York, NY, USA
| | | | | | - Rebecca L Beer
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Emelia J Benjamin
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
| | - Lawrence F Bielak
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - John Blangero
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Michael Boehnke
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Donald W Bowden
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jennifer A Brody
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
| | - Esteban G Burchard
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Brian E Cade
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - James F Casella
- Department of Pediatrics, Johns Hopkins University, Baltimore, MD, USA
- Division of Pediatric Hematology, Johns Hopkins University, Baltimore, MD, USA
| | - Brandon Chalazan
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | | | - Mina K Chung
- Department of Cardiovascular Medicine, Heart & Vascular Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Clary B Clish
- Metabolomics Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Adolfo Correa
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Department of Pediatrics, University of Mississippi Medical Center, Jackson, MS, USA
- Department of Population Health Science, University of Mississippi Medical Center, Jackson, MS, USA
| | - Joanne E Curran
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Brian Custer
- Vitalant Research Institute, San Francisco, CA, USA
- Department of Laboratory Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Dawood Darbar
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, USA
| | - Michelle Daya
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | | - Dawn L DeMeo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Susan K Dutcher
- McDonnell Genome Institute, Washington University, St Louis, MO, USA
- Department of Genetics, Washington University, St Louis, MO, USA
| | - Patrick T Ellinor
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Leslie S Emery
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Celeste Eng
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Diane Fatkin
- Molecular Cardiology Division, Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia
- Cardiology Department, St Vincent's Hospital, Darlinghurst, New South Wales, Australia
| | - Tasha Fingerlin
- National Jewish Health, Center for Genes, Environment and Health, Denver, CO, USA
| | - Lukas Forer
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
| | - Myriam Fornage
- Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
| | - Christian Fuchsberger
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Institute for Biomedicine, Eurac Research, Bolzano, Italy
| | - Stephanie M Fullerton
- Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Mark T Gladwin
- Pittsburgh Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Daniel J Gottlieb
- VA Boston Healthcare System, Boston, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Michael E Hall
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Jiang He
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- Tulane University Translational Science Institute, Tulane University, New Orleans, LA, USA
| | - Nancy L Heard-Costa
- Framingham Heart Study, Framingham, MA, USA
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| | - Susan R Heckbert
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jill M Johnsen
- Department of Medicine, University of Washington, Seattle, WA, USA
- Bloodworks Northwest Research Institute, Seattle, WA, USA
| | - Andrew D Johnson
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Framingham, MA, USA
| | - Robert Kaplan
- Albert Einstein College of Medicine, New York, NY, USA
| | - Sharon L R Kardia
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Tanika Kelly
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
| | - Shannon Kelly
- Department of Epidemiology, Vitalant Research Institute, San Francisco, CA, USA
- Department of Pediatrics, UCSF Benioff Children's Hospital, Oakland, CA, USA
- Division of Pediatric Hematology, UCSF Benioff Children's Hospital, Oakland, CA, USA
| | - Eimear E Kenny
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Douglas P Kiel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, MA, USA
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Robert Klemmer
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Barbara A Konkle
- Department of Medicine, University of Washington, Seattle, WA, USA
- Bloodworks Northwest Research Institute, Seattle, WA, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Anna Köttgen
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD, USA
- Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Leslie A Lange
- Department of Medicine, University of Colorado at Denver, Aurora, CO, USA
| | - Jessica Lasky-Su
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brigham and Women's Hospital, Boston, MA, USA
| | - Daniel Levy
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Framingham, MA, USA
| | - Xihong Lin
- Biostatistics and Statistics, Harvard University, Boston, MA, USA
| | - Keng-Han Lin
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lori Garman
- Department of Genes and Human Disease, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
| | | | | | - Kathryn L Lunetta
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Angel C Y Mak
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Alisa K Manning
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA, USA
- Metabolism Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Rasika A Mathias
- Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - David D McManus
- Cardiovascular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
| | - Stephen T McGarvey
- International Health Institute, Brown University, Providence, RI, USA
- Department of Epidemiology, Brown University, Providence, RI, USA
- Department of Anthropology, Brown University, Providence, RI, USA
| | - James B Meigs
- Division of General Internal Medicine, Massachusetts General Hospital, Harvard Medical School, The Broad Institute of MIT and Harvard, Boston, MA, USA
| | | | - Julie L Mikulla
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mollie A Minear
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Braxton D Mitchell
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore Veterans Administration Medical Center, Baltimore, MD, USA
| | - Sanghamitra Mohanty
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX, USA
- Department of Internal Medicine, Dell Medical School, Austin, TX, USA
| | - May E Montasser
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Courtney Montgomery
- Department of Genes and Human Disease, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Joanne M Murabito
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Andrea Natale
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX, USA
| | - Pradeep Natarajan
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Sarah C Nelson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Kari E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
| | - Jeffrey R O'Connell
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Nicholette D Palmer
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Nathan Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Patricia A Peyser
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Jacob Pleiness
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Wendy S Post
- Division of Cardiology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Bruce M Psaty
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - D C Rao
- Division of Biostatistics, Washington University in St Louis, St Louis, MO, USA
| | - Susan Redline
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Dan Roden
- Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Chloé Sarnowski
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Sebastian Schoenherr
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
| | | | - Jeong-Sun Seo
- Precision Medicine Center, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Macrogen Inc, Seoul, Republic of Korea
- Gong Wu Genomic Medicine Institute, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Sudha Seshadri
- Framingham Heart Study, Framingham, MA, USA
- Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Sciences Center at San Antonio, San Antonio, TX, USA
| | - Vivien A Sheehan
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, USA
- Aflac Cancer and Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, GA, USA
| | - Wayne H Sheu
- Taichung Veterans General Hospital Taiwan, Taichung City, Taiwan
| | | | - Nicholas L Smith
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Seattle Epidemiologic Research and Information Center, Department of Veterans Affairs Office of Research and Development, Seattle, WA, USA
| | - Jennifer A Smith
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Nona Sotoodehnia
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
| | - Adrienne M Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Weihong Tang
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | | | | | - Russell P Tracy
- Department of Pathology & Laboratory Medicine, University of Vermont Larner College of Medicine, Burlington, VT, USA
| | - David J Van Den Berg
- Center for Genetic Epidemiology, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Ramachandran S Vasan
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
| | | | - Scott Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Daniel E Weeks
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Scott T Weiss
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brigham and Women's Hospital, Boston, MA, USA
| | | | - Cristen J Willer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Internal Medicine-Cardiology, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Yingze Zhang
- Pittsburgh Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Xutong Zhao
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Donna K Arnett
- Department of Epidemiology, University of Kentucky, Lexington, KY, USA
| | - Allison E Ashley-Koch
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
| | - Kathleen C Barnes
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Eric Boerwinkle
- University of Texas Health Science Center at Houston, Houston, TX, USA
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Stacey Gabriel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Richard Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Pankaj Qasba
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Weiniu Gan
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - George J Papanicolaou
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Deborah A Nickerson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Northwest Genomics Center, Seattle, WA, USA
- Brotman Baty Institute, Seattle, WA, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | | | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
| | - James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
- Framingham Heart Study, Framingham, MA, USA.
| | - Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
| | - Cashell E Jaquish
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada.
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.
| | - Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA.
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA.
| | - Gonçalo R Abecasis
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
|
17
|
Nait Saada J, Kalantzis G, Shyr D, Cooper F, Robinson M, Gusev A, Palamara PF. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat Commun 2020; 11:6130. [PMID: 33257650 PMCID: PMC7704644 DOI: 10.1038/s41467-020-19588-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 04/02/2020] [Accepted: 10/02/2020] [Indexed: 12/14/2022]
Abstract
Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample's birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.
Affiliation(s)
| | | | - Derek Shyr
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
| | - Martin Robinson
- Department of Computer Science, University of Oxford, Oxford, UK
| | - Alexander Gusev
- Brigham & Women's Hospital, Division of Genetics, Boston, MA, 02215, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK.
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
|
18
|
Browning SR, Browning BL. Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection. Am J Hum Genet 2020; 107:895-910. [PMID: 33053335 PMCID: PMC7553009 DOI: 10.1016/j.ajhg.2020.09.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2020] [Accepted: 09/25/2020] [Indexed: 12/18/2022]
Abstract
Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
| | - Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
|
19
|
Samuels DC, Below JE, Ness S, Yu H, Leng S, Guo Y. Alternative Applications of Genotyping Array Data Using Multivariant Methods. Trends Genet 2020; 36:857-867. [PMID: 32773169 PMCID: PMC7572808 DOI: 10.1016/j.tig.2020.07.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/30/2020] [Revised: 07/08/2020] [Accepted: 07/09/2020] [Indexed: 10/23/2022]
Abstract
One of the forerunners that pioneered the revolution of high-throughput genomic technologies is the genotyping microarray technology, which can genotype millions of single-nucleotide variants simultaneously. Owing to apparent benefits, such as high speed, low cost, and high throughput, the genotyping array has gained lasting applications in genome-wide association studies (GWAS) and thus accumulated an enormous amount of data. Empowered by continuous manufactural upgrades and analytical innovation, unconventional applications of genotyping array data have emerged to address more diverse genetic problems, holding promise of boosting genetic research into human diseases through the re-mining of the rich accumulated data. Here, we review several unconventional genotyping array analysis techniques that have been built on the idea of large-scale multivariant analysis and provide empirical application examples. These unconventional outcomes of genotyping arrays include polygenic score, runs of homozygosity (ROH)/heterozygosity ratio, distant pedigree computation, and mitochondrial DNA (mtDNA) copy number inference.
Affiliation(s)
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37232, USA
| | - Jennifer E Below
- Division of Genetic Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Scott Ness
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
| | - Hui Yu
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
| | - Shuguang Leng
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
| | - Yan Guo
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA.
|
20
|
Marian AJ. Clinical Interpretation and Management of Genetic Variants. JACC Basic Transl Sci 2020; 5:1029-1042. [PMID: 33145465 PMCID: PMC7591931 DOI: 10.1016/j.jacbts.2020.05.013] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/10/2020] [Revised: 05/22/2020] [Accepted: 05/27/2020] [Indexed: 01/31/2023]
Abstract
The human genome contains approximately 4 million variants, whose population frequencies vary according to the ethnic backgrounds. Genetic diversity of humans in part determines interindividual variability in susceptibility to diseases, response to therapy, and the clinical outcomes. Genetic variants exert a gradient of biological and clinical effect sizes. In general, variants with the largest effect sizes are responsible for the single-gene disorders, whereas those with moderate and modest effect sizes are responsible for oligogenic and polygenic diseases, respectively. A phenotype is the consequence of nonlinear stochastic interactions among multiple genetic and nongenetic determinants. Discerning pathogenicity of the genetic variants, identified through genetic testing, in the clinical phenotype is challenging and requires complementary expertise in human molecular genetics and clinical medicine.
Genetic variants are major determinants of susceptibility to disease, response to therapy, and clinical outcomes. Advances in the short-read sequencing technologies, despite some shortcomings, have enabled identification of the vast majority of the genetic variants in each genome. The major challenge is in identifying the pathogenic variants in cardiovascular diseases. The yield of the genetic testing has been limited because of technological shortcomings and our incomplete understanding of the genetic basis of cardiovascular disorders. To advance the field, a shift to long-read sequencing platforms is necessary. In addition, to discern the pathogenic variants, genetic diseases should be considered as a continuum and the genetic variants as probabilistic factors with a gradient of effect sizes. Moreover, disease-specific physician-scientists with expertise in the clinical medicine and molecular genetics are best equipped to discern functional and clinical significance of the genetic variants. The changes would be expected to enhance clinical utilities of the genetic discoveries.
Affiliation(s)
- Ali J Marian
- Center for Cardiovascular Genetics, Institute of Molecular Medicine and Department of Medicine, University of Texas Health Sciences Center at Houston, Houston, Texas
|
21
|
Zhou Y, Browning BL, Browning SR. Population-Specific Recombination Maps from Segments of Identity by Descent. Am J Hum Genet 2020; 107:137-148. [PMID: 32533945 PMCID: PMC7332656 DOI: 10.1016/j.ajhg.2020.05.016] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 01/28/2020] [Accepted: 05/20/2020] [Indexed: 12/26/2022]
Abstract
Recombination rates vary significantly across the genome, and estimates of recombination rates are needed for downstream analyses such as haplotype phasing and genotype imputation. Existing methods for recombination rate estimation are limited by insufficient amounts of informative genetic data or by high computational cost. We present a method and software, called IBDrecomb, for using segments of identity by descent to infer recombination rates. IBDrecomb can be applied to sequenced population cohorts to obtain high-resolution, population-specific recombination maps. In simulated admixed data, IBDrecomb obtains higher accuracy than admixture-based estimation of recombination rates. When applied to 2,500 simulated individuals, IBDrecomb obtains similar accuracy to a linkage-disequilibrium (LD)-based method applied to 96 individuals (the largest number for which computation is tractable). Compared to LD-based maps, our IBD-based maps have the advantage of estimating recombination rates in the recent past rather than the distant past. We used IBDrecomb to generate new recombination maps for European Americans and for African Americans from TOPMed sequence data from the Framingham Heart Study (1,626 unrelated individuals) and the Jackson Heart Study (2,046 unrelated individuals), and we compare them to LD-based, admixture-based, and family-based maps.
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
| | - Brian L Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
|
22
|
Seidman DN, Shenoy SA, Kim M, Babu R, Woods IG, Dyer TD, Lehman DM, Curran JE, Duggirala R, Blangero J, Williams AL. Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification. Am J Hum Genet 2020; 106:453-466. [PMID: 32197076 PMCID: PMC7118564 DOI: 10.1016/j.ajhg.2020.02.012] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/29/2019] [Accepted: 02/18/2020] [Indexed: 01/29/2023]
Abstract
Identity-by-descent (IBD) segments are a useful tool for applications ranging from demographic inference to relationship classification, but most detection methods rely on phasing information and therefore require substantial computation time. As genetic datasets grow, methods for inferring IBD segments that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it with Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these with Beagle 5 takes 4.3 CPU days, followed by either Refined IBD or GERMLINE segment detection in 2.9 or 1.1 h, respectively. By comparison, IBIS finishes in 6.8 min or 7.8 min with IBD2 functionality enabled: speedups of 805-946× including phasing time. TRUFFLE takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring ≥7 cM IBD segments at quality comparable to Refined IBD and GERMLINE. With these segments, IBIS classifies first through third degree relatives in real Mexican American samples at rates meeting or exceeding other methods tested and identifies fourth through sixth degree pairs at rates within 0.0%-2.0% of the top method. While allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, the fastest are biased in admixed samples, with KING inferring 30.8% fewer fifth degree Mexican American relatives correctly compared with IBIS. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimate its runtime on the autosomes to be 3.3 days parallelized across 128 cores.
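The idea of locating long regions of allele sharing without phasing can be illustrated with a deliberately simplified sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2) and uses one common phase-free criterion: two individuals can share at least one haplotype across a run of sites only if the run contains no opposite homozygotes (0 versus 2). The function name `shared_runs`, the site-count threshold, and the toy genotypes are hypothetical; this is not IBIS's implementation, which among other things measures length in cM and has to tolerate genotype error.

```python
def shared_runs(g1, g2, min_sites):
    """Return half-open (start, end) site intervals free of opposite homozygotes.

    g1, g2: genotype sequences coded as alternate-allele counts (0, 1 or 2).
    min_sites: hypothetical length threshold, counted in sites rather than cM.
    """
    runs = []
    n = min(len(g1), len(g2))
    start = 0
    for k in range(n):
        # 0 vs 2 (or 2 vs 0) means the pair cannot share an allele at site k.
        if abs(g1[k] - g2[k]) == 2:
            if k - start >= min_sites:
                runs.append((start, k))
            start = k + 1
    # Close the final run at the end of the data.
    if n - start >= min_sites:
        runs.append((start, n))
    return runs


if __name__ == "__main__":
    a = [0, 1, 2, 2, 1, 0, 0, 2]
    b = [0, 0, 1, 2, 2, 2, 1, 2]
    print(shared_runs(a, b, min_sites=3))
```

In the toy data the only opposite-homozygote site is index 5, so the run of sites 0 to 4 passes a three-site threshold while the two-site run after it does not.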
Affiliation(s)
- Daniel N Seidman
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
| | - Sushila A Shenoy
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Minsoo Kim
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Ramya Babu
- Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
| | - Ian G Woods
- Department of Biology, Ithaca College, Ithaca, NY 14850, USA
| | - Thomas D Dyer
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
| | - Donna M Lehman
- Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
| | - Joanne E Curran
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
| | - Ravindranath Duggirala
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
| | - John Blangero
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
| | - Amy L Williams
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA.
|
23
|
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022]
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
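The positional Burrows-Wheeler transform mentioned in this abstract can be sketched in a few lines. The code below is a didactic version of the prefix-array and divergence-array updates from Durbin's PBWT formulation, used here to report pairwise haplotype matches of at least `min_len` sites that cannot be extended to the right. The function name `pbwt_long_matches`, the site-count threshold, and the toy haplotypes are assumptions for illustration; it is not the hap-IBD implementation, which works with genetic distances and a compressed, multi-threaded representation.

```python
from itertools import combinations, product


def pbwt_long_matches(haps, min_len):
    """Report matches of >= min_len sites between biallelic haplotypes.

    haps: list of equal-length 0/1 sequences.
    Returns tuples (i, j, start, end) with i < j, where
    haps[i][start:end] == haps[j][start:end] and the match stops at `end`
    because the alleles differ there or the data end.
    """
    n_hap = len(haps)
    n_site = len(haps[0]) if haps else 0
    order = list(range(n_hap))  # haplotypes sorted by reversed prefix
    div = [0] * n_hap           # start of the match with the previous haplotype in `order`
    out = []
    for k in range(n_site + 1):
        # Report step: consecutive haplotypes whose divergence is small enough
        # form a block in which every pair matches over >= min_len sites ending at k.
        block_start = 0
        for pos in range(1, n_hap + 1):
            if pos == n_hap or div[pos] > k - min_len:
                block = range(block_start, pos)
                if k == n_site:
                    pairs = combinations(block, 2)
                else:
                    zeros = [p for p in block if haps[order[p]][k] == 0]
                    ones = [p for p in block if haps[order[p]][k] == 1]
                    pairs = product(zeros, ones)  # only pairs that differ at site k
                for p, q in pairs:
                    lo, hi = min(p, q), max(p, q)
                    start = max(div[lo + 1:hi + 1])
                    i, j = sorted((order[lo], order[hi]))
                    out.append((i, j, start, k))
                block_start = pos
        if k == n_site:
            break
        # Update step: stable partition by the allele at site k, carrying divergences.
        a, b, d, e = [], [], [], []
        p_div = q_div = k + 1
        for idx in range(n_hap):
            p_div = max(p_div, div[idx])
            q_div = max(q_div, div[idx])
            if haps[order[idx]][k] == 0:
                a.append(order[idx]); d.append(p_div); p_div = 0
            else:
                b.append(order[idx]); e.append(q_div); q_div = 0
        order, div = a + b, d + e
    return out


if __name__ == "__main__":
    toy = [
        [0, 1, 0, 1, 1, 0],
        [0, 1, 0, 1, 0, 0],
        [1, 1, 0, 1, 1, 1],
    ]
    for match in sorted(pbwt_long_matches(toy, min_len=3)):
        print(match)
```

Because haplotypes that share a long suffix end up adjacent in the sorted order, candidate matches are found by scanning neighbours rather than comparing all pairs site by site, which is the property that makes PBWT-based detection fast on large cohorts.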
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.
|
24
|
Kessler MD, Loesch DP, Perry JA, Heard-Costa NL, Taliun D, Cade BE, Wang H, Daya M, Ziniti J, Datta S, Celedón JC, Soto-Quiros ME, Avila L, Weiss ST, Barnes K, Redline SS, Vasan RS, Johnson AD, Mathias RA, Hernandez R, Wilson JG, Nickerson DA, Abecasis G, Browning SR, Zöllner S, O'Connell JR, Mitchell BD, O'Connor TD. De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc Natl Acad Sci U S A 2020; 117:2560-2569. [PMID: 31964835 PMCID: PMC7007577 DOI: 10.1073/pnas.1902766117] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 12/19/2022]
Abstract
De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.
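As a rough illustration of how trio DNM counts relate to a per-base-pair rate, the sketch below averages the counts quoted in this abstract and divides by two haploid genomes, since each child reflects one paternal and one maternal meiosis. The function names and the `callable_haploid_bp` parameter are hypothetical, and the value passed for the callable genome size is a placeholder rather than a figure from the study; the authors' own estimation may differ.

```python
def mean_dnms_per_trio(total_dnms, n_trios):
    """Average number of de novo mutations observed per trio."""
    return total_dnms / n_trios


def rate_per_bp_per_meiosis(dnms_per_trio, callable_haploid_bp):
    """Convert a per-trio DNM count to a per-base-pair, per-meiosis rate.

    callable_haploid_bp is an assumed input: the number of haploid base
    pairs at which a DNM could have been called. A child carries two
    haploid genomes, each the product of one meiosis, hence the factor 2.
    """
    return dnms_per_trio / (2 * callable_haploid_bp)


if __name__ == "__main__":
    per_trio = mean_dnms_per_trio(93_325, 1_465)  # counts quoted in the abstract
    print(f"mean DNMs per trio: {per_trio:.1f}")
    # Placeholder callable size, chosen only to show the units of the result.
    placeholder_bp = 2.7e9
    print(f"illustrative rate: {rate_per_bp_per_meiosis(per_trio, placeholder_bp):.2e}")
```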
Affiliation(s)
- Michael D Kessler
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201
| | - Douglas P Loesch
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
| | - James A Perry
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
| | - Nancy L Heard-Costa
- Department of Neurology, Boston University School of Medicine, Boston, MA 02118
- Framingham Heart Study, Framingham, MA 01702
| | - Daniel Taliun
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109
| | - Brian E Cade
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142
| | - Heming Wang
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142
| | - Michelle Daya
- Department of Medicine, University of Colorado Denver, Aurora, CO 80045
| | - John Ziniti
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115
| | - Soma Datta
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115
| | - Juan C Celedón
- Division of Pediatric Pulmonary Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
| | - Manuel E Soto-Quiros
- Department of Pediatrics, Hospital Nacional de Niños, 10103 San José, Costa Rica
| | - Lydiana Avila
- Department of Pediatrics, Hospital Nacional de Niños, 10103 San José, Costa Rica
| | - Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115
- Department of Medicine, Harvard Medical School, Boston, MA 02115
| | - Kathleen Barnes
- Department of Medicine, University of Colorado Denver, Aurora, CO 80045
| | - Susan S Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115
- Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115
- Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215
| | | | - Andrew D Johnson
- Framingham Heart Study, Framingham, MA 01702
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702
| | - Rasika A Mathias
- Division of Allergy and Clinical Immunology, The Johns Hopkins School of Medicine, Baltimore, MD 21224
- Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD 21218
| | - Ryan Hernandez
- Quantitative Life Sciences, McGill University, Montreal, QC H3A 0G4, Canada
| | - James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS 39216
| | | | - Goncalo Abecasis
- School of Public Health, University of Michigan, Ann Arbor, MI 48109
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195
| | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109
| | - Jeffrey R O'Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
| | - Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Geriatrics Research and Education Clinical Center, Baltimore Veterans Administration Medical Center, Baltimore, MD 21201
| | - Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201;
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201
| |
|
25
|
Tian X, Browning BL, Browning SR. Estimating the Genome-wide Mutation Rate with Three-Way Identity by Descent. Am J Hum Genet 2019; 105:883-893. [PMID: 31587867 DOI: 10.1016/j.ajhg.2019.09.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/17/2019] [Accepted: 09/09/2019] [Indexed: 12/20/2022]
Abstract
The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].
|
26
|
Fisher E, Schweiger R, Rosset S. Efficient Construction of Test Inversion Confidence Intervals Using Quantile Regression. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2019.1647215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Eyal Fisher
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
| | - Regev Schweiger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Saharon Rosset
- Department of Statistics, Tel Aviv University, Tel Aviv, Israel
| |
|
27
|
Abstract
Mutation provides the ultimate source of all new alleles in populations, including variants that cause disease and fuel adaptation. Recent whole genome sequencing studies have uncovered variation in the mutation rate among individuals and differences in the relative frequency of specific nucleotide changes (the mutation spectrum) between populations. Although parental age is a major driver of differences in overall mutation rate among individuals, the causes of variation in the mutation spectrum remain less well understood. Here, I use high-quality whole genome sequences from 29 inbred laboratory mouse strains to explore the root causes of strain variation in the mutation spectrum. My analysis leverages the unique, mosaic patterns of genetic relatedness among inbred mouse strains to identify strain private variants residing on haplotypes shared between multiple strains due to their recent descent from a common ancestor. I show that these strain-private alleles are strongly enriched for recent de novo mutations and lack signals of widespread purifying selection, suggesting their faithful recapitulation of the spontaneous mutation landscape in single strains. The spectrum of strain-private variants varies significantly among inbred mouse strains reared under standardized laboratory conditions. This variation is not solely explained by strain differences in age at reproduction, raising the possibility that segregating genetic differences affect the constellation of new mutations that arise in a given strain. Collectively, these findings imply the action of remarkably precise nucleotide-specific genetic mechanisms for tuning the de novo mutation landscape in mammals and underscore the genetic complexity of mutation rate control.
|
28
|
Henn BM, Steele TE, Weaver TD. Clarifying distinct models of modern human origins in Africa. Curr Opin Genet Dev 2018; 53:148-156. [PMID: 30423527 DOI: 10.1016/j.gde.2018.10.003] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 09/25/2018] [Revised: 10/09/2018] [Accepted: 10/15/2018] [Indexed: 11/29/2022]
Abstract
Accumulating genomic, fossil and archaeological data from Africa have led to a renewed interest in models of modern human origins. However, such discussions are often discipline-specific, with limited integration of evidence across the different fields. Further, geneticists typically require explicit specification of parameters to test competing demographic models, but these have been poorly outlined for some scenarios. Here, we describe four possible models for the origins of Homo sapiens in Africa based on published literature from paleoanthropology and human genetics. We briefly outline expectations for data patterns under each model, with a special focus on genetic data. Additionally, we present schematics for each model, doing our best to qualitatively describe demographic histories for which genetic parameters can be specifically attached. Finally, it is our hope that this perspective provides context for discussions of human origins in other manuscripts presented in this special issue.
Affiliation(s)
- Brenna M Henn
- Department of Anthropology, University of California, Davis, CA, 95616, United States; UC Davis Genome Center, University of California, Davis, CA, 95616, United States.
| | - Teresa E Steele
- Department of Anthropology, University of California, Davis, CA, 95616, United States
| | - Timothy D Weaver
- Department of Anthropology, University of California, Davis, CA, 95616, United States
| |
|
29
|
High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 2018; 50:1311-1317. [PMID: 30104759 PMCID: PMC6145075 DOI: 10.1038/s41588-018-0177-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/21/2017] [Accepted: 06/21/2018] [Indexed: 12/19/2022]
Abstract
Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequence data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified LD score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.
|
30
|
Torres R, Szpiech ZA, Hernandez RD. Human demographic history has amplified the effects of background selection across the genome. PLoS Genet 2018; 14:e1007387. [PMID: 29912945 PMCID: PMC6056204 DOI: 10.1371/journal.pgen.1007387] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 08/29/2017] [Revised: 07/23/2018] [Accepted: 04/30/2018] [Indexed: 01/22/2023]
Abstract
Natural populations often grow, shrink, and migrate over time. Such demographic processes can affect genome-wide levels of genetic diversity. Additionally, genetic variation in functional regions of the genome can be altered by natural selection, which drives adaptive mutations to higher frequencies or purges deleterious ones. Such selective processes affect not only the sites directly under selection but also nearby neutral variation through genetic linkage via processes referred to as genetic hitchhiking in the context of positive selection and background selection (BGS) in the context of purifying selection. While there is extensive literature examining the consequences of selection at linked sites at demographic equilibrium, less is known about how non-equilibrium demographic processes influence the effects of hitchhiking and BGS. Utilizing a global sample of human whole-genome sequences from the Thousand Genomes Project and extensive simulations, we investigate how non-equilibrium demographic processes magnify and dampen the consequences of selection at linked sites across the human genome. When binning the genome by inferred strength of BGS, we observe that, compared to Africans, non-African populations have experienced larger proportional decreases in neutral genetic diversity in strong BGS regions. We replicate these findings in admixed populations by showing that non-African ancestral components of the genome have also been affected more severely in these regions. We attribute these differences to the strong, sustained/recurrent population bottlenecks that non-Africans experienced as they migrated out of Africa and throughout the globe. Furthermore, we observe a strong correlation between FST and the inferred strength of BGS, suggesting a stronger rate of genetic drift. Forward simulations of human demographic history with a model of BGS support these observations. 
Our results show that non-equilibrium demography significantly alters the consequences of selection at linked sites and support the need for more work investigating the dynamic process of multiple evolutionary forces operating in concert.
Affiliation(s)
- Raul Torres
- Biomedical Sciences Graduate Program, University of California San Francisco, San Francisco, CA, United States of America
| | - Zachary A. Szpiech
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States of America
| | - Ryan D. Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, United States of America
- Institute for Computational Health Sciences, University of California San Francisco, San Francisco, CA, United States of America
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, United States of America
| |
|
31
|
Guo J, Wu Y, Zhu Z, Zheng Z, Trzaskowski M, Zeng J, Robinson MR, Visscher PM, Yang J. Global genetic differentiation of complex traits shaped by natural selection in humans. Nat Commun 2018; 9:1865. [PMID: 29760457 PMCID: PMC5951811 DOI: 10.1038/s41467-018-04191-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 10/29/2017] [Accepted: 04/12/2018] [Indexed: 11/09/2022]
Abstract
There are mean differences in complex traits among global human populations. We hypothesize that part of the phenotypic differentiation is due to natural selection. To address this hypothesis, we assess the differentiation in allele frequencies of trait-associated SNPs among African, Eastern Asian, and European populations for ten complex traits using data of large sample size (up to ~405,000). We show that SNPs associated with height ([Formula: see text]), waist-to-hip ratio ([Formula: see text]), and schizophrenia ([Formula: see text]) are significantly more differentiated among populations than matched "control" SNPs, suggesting that these trait-associated SNPs have undergone natural selection. We further find that SNPs associated with height ([Formula: see text]) and schizophrenia ([Formula: see text]) show significantly higher variance in linkage disequilibrium (LD) scores across populations than control SNPs. Our results support the hypothesis that natural selection has shaped the genetic differentiation of complex traits, such as height and schizophrenia, among worldwide populations.
Affiliation(s)
- Jing Guo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Yang Wu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Zhihong Zhu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Zhili Zheng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia.,The Eye Hospital, School of Ophthalmology and Optometry, Wenzhou Medical University, 325027, Zhejiang, China
| | - Maciej Trzaskowski
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Jian Zeng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Matthew R Robinson
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia.,Department of Computational Biology, University of Lausanne, 1011, Lausanne, Switzerland
| | - Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia.,Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Jian Yang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia.,Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia.
| |
|
32
|
de Barros Damgaard P, Martiniano R, Kamm J, Moreno-Mayar JV, Kroonen G, Peyrot M, Barjamovic G, Rasmussen S, Zacho C, Baimukhanov N, Zaibert V, Merz V, Biddanda A, Merz I, Loman V, Evdokimov V, Usmanova E, Hemphill B, Seguin-Orlando A, Yediay FE, Ullah I, Sjögren KG, Iversen KH, Choin J, de la Fuente C, Ilardo M, Schroeder H, Moiseyev V, Gromov A, Polyakov A, Omura S, Senyurt SY, Ahmad H, McKenzie C, Margaryan A, Hameed A, Samad A, Gul N, Khokhar MH, Goriunova OI, Bazaliiskii VI, Novembre J, Weber AW, Orlando L, Allentoft ME, Nielsen R, Kristiansen K, Sikora M, Outram AK, Durbin R, Willerslev E. The first horse herders and the impact of early Bronze Age steppe expansions into Asia. Science 2018; 360:science.aar7711. [PMID: 29743352 DOI: 10.1126/science.aar7711] [Citation(s) in RCA: 182] [Impact Index Per Article: 30.3] [Received: 12/15/2017] [Accepted: 05/02/2018] [Indexed: 12/16/2022]
Abstract
The Yamnaya expansions from the western steppe into Europe and Asia during the Early Bronze Age (~3000 BCE) are believed to have brought with them Indo-European languages and possibly horse husbandry. We analyzed 74 ancient whole-genome sequences from across Inner Asia and Anatolia and show that the Botai people associated with the earliest horse husbandry derived from a hunter-gatherer population deeply diverged from the Yamnaya. Our results also suggest distinct migrations bringing West Eurasian ancestry into South Asia before and after, but not at the time of, Yamnaya culture. We find no evidence of steppe ancestry in Bronze Age Anatolia from when Indo-European languages are attested there. Thus, in contrast to Europe, Early Bronze Age Yamnaya-related migrations had limited direct genetic impact in Asia.
Affiliation(s)
| | - Rui Martiniano
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.,Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| | - Jack Kamm
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - J Víctor Moreno-Mayar
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Guus Kroonen
- Department of Nordic Studies and Linguistics, University of Copenhagen, Copenhagen, Denmark.,Leiden University Centre for Linguistics, Leiden University, Leiden, Netherlands
| | - Michaël Peyrot
- Leiden University Centre for Linguistics, Leiden University, Leiden, Netherlands
| | - Gojko Barjamovic
- Department of Near Eastern Languages and Civilizations, Harvard University, Cambridge, MA, USA
| | - Simon Rasmussen
- Department of Bio and Health Informatics, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Claus Zacho
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | | | - Victor Zaibert
- Institute of Archaeology and Steppe Civilization, Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
| | - Victor Merz
- S. Toraighyrov Pavlodar State University, Joint Research Center for Archeological Studies named after A.Kh. Margulan, Pavlodar, Kazakhstan
| | - Arjun Biddanda
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Ilja Merz
- S. Toraighyrov Pavlodar State University, Joint Research Center for Archeological Studies named after A.Kh. Margulan, Pavlodar, Kazakhstan
| | - Valeriy Loman
- Saryarkinsky Institute of Archaeology, Buketov Karaganda State University, Karaganda. 100074, Kazakhstan
| | - Valeriy Evdokimov
- Saryarkinsky Institute of Archaeology, Buketov Karaganda State University, Karaganda. 100074, Kazakhstan
| | - Emma Usmanova
- Saryarkinsky Institute of Archaeology, Buketov Karaganda State University, Karaganda. 100074, Kazakhstan
| | - Brian Hemphill
- Department of Anthropology, University of Alaska, Fairbanks, AK, USA
| | - Andaine Seguin-Orlando
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Fulya Eylem Yediay
- The Institute of Forensic Sciences, Istanbul University, Istanbul, Turkey
| | - Inam Ullah
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark.,Department of Genetics, Hazara University, Garden Campus, Mansehra, Pakistan
| | - Karl-Göran Sjögren
- Department of Historical Studies, University of Gothenburg, 40530 Göteborg, Sweden
| | - Katrine Højholt Iversen
- Department of Bio and Health Informatics, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Jeremy Choin
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Constanza de la Fuente
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Melissa Ilardo
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Hannes Schroeder
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Vyacheslav Moiseyev
- Peter the Great Museum of Anthropology and Ethnography (Kunstkamera) RAS, St. Petersburg, Russia
| | - Andrey Gromov
- Peter the Great Museum of Anthropology and Ethnography (Kunstkamera) RAS, St. Petersburg, Russia
| | - Andrei Polyakov
- Institute for the History of Material Culture, Russian Academy of Sciences, St. Petersburg, Russia
| | - Sachihiro Omura
- Japanese Institute of Anatolian Archaeology, Kaman, Kırşehir, Turkey
| | | | - Habib Ahmad
- Department of Genetics, Hazara University, Garden Campus, Mansehra, Pakistan.,Center of Omic Sciences, Islamia College, Peshawar, Pakistan
| | - Catriona McKenzie
- Department of Archaeology, University of Exeter, Exeter, EX4 4QE, UK
| | - Ashot Margaryan
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Abdul Hameed
- Department of Archeology, Hazara University, Garden Campus, Mansehra, Pakistan
| | - Abdul Samad
- Directorate of Archaeology and Museums Government of Khyber Pakhtunkhwa, Pakistan
| | - Nazish Gul
- Department of Genetics, Hazara University, Garden Campus, Mansehra, Pakistan
| | | | - O I Goriunova
- Institute of Archaeology and Ethnography, Siberian Branch of the Russian Academy of Sciences, Academician Lavrent'iev Ave. 17, Novosibirsk, 630090, Russia.,Department of History, Irkutsk State University, Karl Marx Street 1, Irkutsk 664003, Russia
| | - Vladimir I Bazaliiskii
- Department of History, Irkutsk State University, Karl Marx Street 1, Irkutsk 664003, Russia
| | - John Novembre
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.,Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
| | - Andrzej W Weber
- Department of Anthropology, University of Alberta, Edmonton, Alberta, T6G 2H4, Canada
| | - Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark.,Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse, Université Paul Sabatier, 31000 Toulouse, France
| | - Morten E Allentoft
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Rasmus Nielsen
- Departments of Integrative Biology and Statistics, University of Berkeley, Berkeley, CA, USA
| | - Kristian Kristiansen
- Department of Historical Studies, University of Gothenburg, 40530 Göteborg, Sweden
| | - Martin Sikora
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark
| | - Alan K Outram
- Department of Archaeology, University of Exeter, Exeter, EX4 4QE, UK
| | - Richard Durbin
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.,Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| | - Eske Willerslev
- Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark.,Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.,Department of Zoology, University of Cambridge, Cambridge, UK
| |
|
33
|
High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation. Hum Genet 2018; 137:343-355. [PMID: 29705978 DOI: 10.1007/s00439-018-1886-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/06/2018] [Accepted: 04/21/2018] [Indexed: 12/31/2022]
Abstract
While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. Here, we sequenced at full-depth (≥ 30×), across two platforms (Illumina X Ten and Complete Genomics, Inc.), a moderately large (n = 738) cohort of samples drawn from the Ashkenazi Jewish population. We developed a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. Quality control (QC) thresholds for the Illumina X Ten platform were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. QC procedures also identified numerous regions that are poorly mapped using current reference or alternate assemblies. After stringent QC, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels, especially in the range of rare variants that may be most critical to further progress in mapping of complex phenotypes. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.
|
34
|
Zeng J, de Vlaming R, Wu Y, Robinson MR, Lloyd-Jones LR, Yengo L, Yap CX, Xue A, Sidorenko J, McRae AF, Powell JE, Montgomery GW, Metspalu A, Esko T, Gibson G, Wray NR, Visscher PM, Yang J. Signatures of negative selection in the genetic architecture of human complex traits. Nat Genet 2018; 50:746-753. [PMID: 29662166 DOI: 10.1038/s41588-018-0101-4] [Citation(s) in RCA: 195] [Impact Index Per Article: 32.5] [Received: 05/31/2017] [Accepted: 03/05/2018] [Indexed: 11/09/2022]
Abstract
We develop a Bayesian mixed linear model that simultaneously estimates single-nucleotide polymorphism (SNP)-based heritability, polygenicity (proportion of SNPs with nonzero effects), and the relationship between SNP effect size and minor allele frequency for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752) and show that on average, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (P < 0.05/28) signatures of natural selection in the genetic architecture of 23 traits, including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. The significant estimates of the relationship between effect size and minor allele frequency in complex traits are consistent with a model of negative (or purifying) selection, as confirmed by forward simulation. We conclude that negative selection acts pervasively on the genetic variants associated with human complex traits.
Affiliation(s)
- Jian Zeng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Ronald de Vlaming
- School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.,Erasmus University Rotterdam Institute for Behavior and Biology, Rotterdam, The Netherlands
| | - Yang Wu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Matthew R Robinson
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia.,Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Luke R Lloyd-Jones
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Loic Yengo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Chloe X Yap
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Angli Xue
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Julia Sidorenko
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia.,Estonian Genome Center, University of Tartu, Tartu, Estonia
| | - Allan F McRae
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Joseph E Powell
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Grant W Montgomery
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | | | - Tonu Esko
- Estonian Genome Center, University of Tartu, Tartu, Estonia
| | - Greg Gibson
- School of Biological Sciences and Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA, USA
| | - Naomi R Wray
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia.,Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
| | - Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia.,Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
| | - Jian Yang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia.,Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia.
| |
|
35
|
Jagadeesan A, Gunnarsdóttir ED, Ebenesersdóttir SS, Guðmundsdóttir VB, Thordardottir EL, Einarsdóttir MS, Jónsson H, Dugoujon JM, Fortes-Lima C, Migot-Nabias F, Massougbodji A, Bellis G, Pereira L, Másson G, Kong A, Stefánsson K, Helgason A. Reconstructing an African haploid genome from the 18th century. Nat Genet 2018; 50:199-205. [PMID: 29335549 DOI: 10.1038/s41588-017-0031-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/01/2017] [Accepted: 12/18/2017] [Indexed: 01/15/2023]
Abstract
A genome is a mosaic of chromosome fragments from ancestors who existed some arbitrary number of generations earlier. Here, we reconstruct the genome of Hans Jonatan (HJ), born in the Caribbean in 1784 to an enslaved African mother and European father. HJ migrated to Iceland in 1802, married and had two children. We genotyped 182 of his 788 descendants using single-nucleotide polymorphism (SNP) chips and whole-genome sequenced (WGS) 20 of them. Using these data, we reconstructed 38% of HJ's maternal genome and inferred that his mother was from the region spanned by Benin, Nigeria and Cameroon.
Affiliation(s)
- Anuradha Jagadeesan
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
| | | | - S Sunna Ebenesersdóttir
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
| | - Valdis B Guðmundsdóttir
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
| | | | - Margrét S Einarsdóttir
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
| | | | - Jean-Michel Dugoujon
- Laboratoire d'Eco-Anthropologie et Ethnobiologie, Equipe d'Anthropologie Evolutive, UMR 7206, Centre National de la Recherche Scientifique (CNRS) et Université Diderot Paris 7, Paris, France
| | - Cesar Fortes-Lima
- Laboratoire d'Eco-Anthropologie et Ethnobiologie, Equipe d'Anthropologie Evolutive, UMR 7206, Centre National de la Recherche Scientifique (CNRS) et Université Diderot Paris 7, Paris, France
| | - Florence Migot-Nabias
- Institut de Recherche pour le Développement, UMR D216 MERIT (Mère et enfant face aux infections tropicales), Paris, France
- COMUE Sorbonne Paris Cité, Faculté de Pharmacie, Université Paris Descartes, Paris, France
| | - Achille Massougbodji
- Centre d'Etude et de Recherche sur le Paludisme Associé à la Grossesse et l'Enfance (CERPAGE), Cotonou, Benin
- Laboratoire de Parasitologie, Faculté des Sciences de la Santé, Université d'Abomey-Calavi, Cotonou, Benin
- Gil Bellis
- Institut National d'Etudes Démographiques (INED), Paris, France
- Luisa Pereira
- Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, Porto, Portugal
- Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto, Portugal
- Faculdade de Medicina da Universidade do Porto, Porto, Portugal
- Kári Stefánsson
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Agnar Helgason
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
36
Apostolou M, Shialos M, Kyrou E, Demetriou A, Papamichael A. The challenge of starting and keeping a relationship: Prevalence rates and predictors of poor mating performance. PERSONALITY AND INDIVIDUAL DIFFERENCES 2018. [DOI: 10.1016/j.paid.2017.10.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/18/2022]
37
Tatsumoto S, Go Y, Fukuta K, Noguchi H, Hayakawa T, Tomonaga M, Hirai H, Matsuzawa T, Agata K, Fujiyama A. Direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing. Sci Rep 2017; 7:13561. [PMID: 29093469 PMCID: PMC5666008 DOI: 10.1038/s41598-017-13919-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/08/2016] [Accepted: 10/04/2017] [Indexed: 12/30/2022]
Abstract
Mutations generate genetic variation and are a major driving force of evolution. Therefore, examining mutation rates and modes is essential for understanding the genetic basis of the physiology and evolution of organisms. Here, we aim to identify germline de novo mutations through a whole-genome survey of Mendelian inheritance error sites (MIEs), those not inherited in a Mendelian manner from either parent, using ultra-deep whole genome sequences (>150-fold) from a chimpanzee parent-offspring trio. We identified 889 such MIEs and classified them into four categories based on the pattern of inheritance and the sequence read depth: [i] de novo single nucleotide variants (SNVs), [ii] copy number neutral inherited variants, [iii] hemizygous deletion inherited variants, and [iv] de novo copy number variants (CNVs). From de novo SNV candidates, we estimated a germline de novo SNV mutation rate as 1.48 × 10⁻⁸ per site per generation or 0.62 × 10⁻⁹ per site per year. In summary, this study demonstrates the significance of ultra-deep whole genome sequencing not only for the direct estimation of mutation rates but also for discerning various mutation modes including de novo allelic conversion and de novo CNVs by identifying MIEs through the transmission of genomes from parents to offspring.
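The two rates quoted in this abstract are linked by a generation time, which the abstract itself does not state. As a minimal sketch, assuming the per-year rate is simply the per-generation rate divided by a single generation time, the implied value can be recovered as follows (the function name is illustrative and not taken from the paper):

```python
# Rates quoted in the abstract above.
rate_per_generation = 1.48e-8  # de novo SNVs per site per generation
rate_per_year = 0.62e-9        # de novo SNVs per site per year


def implied_generation_time(per_generation: float, per_year: float) -> float:
    """Return the generation time (years) that makes the two rates consistent.

    Assumes per_year = per_generation / generation_time, which is a
    simplification and may not match the authors' exact calculation.
    """
    return per_generation / per_year


generation_time = implied_generation_time(rate_per_generation, rate_per_year)
print(f"Implied generation time: {generation_time:.1f} years")
```

Under this assumption the two figures correspond to a generation time of roughly 24 years; the paper should be consulted for the value the authors actually used.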
Affiliation(s)
- Shoji Tatsumoto
- Department of Brain Sciences, Center for Novel Science Initiatives, National Institutes of Natural Sciences, Okazaki, Aichi, 444-8585, Japan
- Yasuhiro Go
- Department of Brain Sciences, Center for Novel Science Initiatives, National Institutes of Natural Sciences, Okazaki, Aichi, 444-8585, Japan
- Department of System Neuroscience, National Institute for Physiological Sciences, Okazaki, Aichi, 444-8585, Japan
- Department of Physiological Sciences, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Aichi, 484-8585, Japan
- Kentaro Fukuta
- Center for Genome Informatics, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Shizuoka, 411-8540, Japan
- Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
- Hideki Noguchi
- Center for Genome Informatics, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Shizuoka, 411-8540, Japan
- Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
- Takashi Hayakawa
- Department of Wildlife Science (Nagoya Railroad Co., Ltd.), Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Japan Monkey Centre, Inuyama, Aichi, 484-0081, Japan
- Masaki Tomonaga
- Department of Wildlife Science (Nagoya Railroad Co., Ltd.), Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Japan Monkey Centre, Inuyama, Aichi, 484-0081, Japan
- Language and Intelligence Section, Department of Cognitive Sciences, Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Hirohisa Hirai
- Molecular Biology Section, Department of Cellular and Molecular Biology, Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Tetsuro Matsuzawa
- Department of Wildlife Science (Nagoya Railroad Co., Ltd.), Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Japan Monkey Centre, Inuyama, Aichi, 484-0081, Japan
- Language and Intelligence Section, Department of Cognitive Sciences, Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Institute of Advanced Study, Kyoto University, Kyoto, 606-8501, Japan
- Kiyokazu Agata
- Laboratory for Biodiversity, Global COE Program, Graduate School of Science, Kyoto University, Kyoto, 606-8502, Japan
- Laboratory for Molecular Developmental Biology, Graduate School of Science, Kyoto University, Kyoto, 606-8502, Japan
- Graduate Course in Life Science, Gakushuin University, Tokyo, 171-8585, Japan
- Asao Fujiyama
- Center for Genome Informatics, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Shizuoka, 411-8540, Japan
- Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
- Department of Genetics, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), Mishima, Shizuoka, 411-8540, Japan
38
Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat Genet 2017; 49:1421-1427. [PMID: 28892061 DOI: 10.1038/ng.3954] [Citation(s) in RCA: 261] [Impact Index Per Article: 37.3] [Received: 10/19/2016] [Accepted: 08/16/2017] [Indexed: 12/14/2022]
Abstract
Recent work has hinted at the linkage disequilibrium (LD)-dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability. Here we analyzed summary statistics from 56 complex traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with low LLD have significantly larger per-SNP heritability and that roughly half of this effect can be explained by functional annotations negatively correlated with LLD, such as DNase I hypersensitivity sites (DHSs). The remaining signal is largely driven by our finding that more recent common variants tend to have lower LLD and to explain more heritability (P = 2.38 × 10⁻¹⁰⁴); the youngest 20% of common SNPs explain 3.9 times more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that they jointly predict deleterious effects.
39
Narasimhan VM, Rahbari R, Scally A, Wuster A, Mason D, Xue Y, Wright J, Trembath RC, Maher ER, van Heel DA, Auton A, Hurles ME, Tyler-Smith C, Durbin R. Estimating the human mutation rate from autozygous segments reveals population differences in human mutational processes. Nat Commun 2017; 8:303. [PMID: 28827725 PMCID: PMC5566399 DOI: 10.1038/s41467-017-00323-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 07/08/2016] [Accepted: 06/20/2017] [Indexed: 11/08/2022]
Abstract
Heterozygous mutations within homozygous sequences descended from a recent common ancestor offer a way to ascertain de novo mutations across multiple generations. Using exome sequences from 3222 British-Pakistani individuals with high parental relatedness, we estimate a mutation rate of 1.45 ± 0.05 × 10⁻⁸ per base pair per generation in autosomal coding sequence, with a corresponding non-crossover gene conversion rate of 8.75 ± 0.05 × 10⁻⁶ per base pair per generation. This is at the lower end of exome mutation rates previously estimated in parent-offspring trios, suggesting that post-zygotic mutations contribute little to the human germ-line mutation rate. We find frequent recurrence of mutations at polymorphic CpG sites, and an increase in C to T mutations in a 5' CCG 3' to 5' CTG 3' context in the Pakistani population compared to Europeans, suggesting that mutational processes have evolved rapidly between human populations. Estimates of human mutation rates differ substantially based on the approach. Here, the authors present a multi-generational estimate from the autozygous segment in a non-European population that gives insight into the contribution of post-zygotic mutations and population-specific mutational processes.
Affiliation(s)
- Raheleh Rahbari
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA UK
- Aylwyn Scally
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH UK
- Arthur Wuster
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA UK
- Department of Human Genetics and Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA 94080 USA
- Dan Mason
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, BD9 6RJ UK
- Yali Xue
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA UK
- John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, BD9 6RJ UK
- Richard C. Trembath
- Division of Genetics and Molecular Medicine, Faculty of Life Sciences and Medicine, King’s College, London, SE1 1UL UK
- Eamonn R. Maher
- Department of Medical Genetics, University of Cambridge, Cambridge, CB2 0QQ UK
- Cambridge NIHR Biomedical Research Centre, Cambridge, CB2 0QQ UK
- David A. van Heel
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, E1 2AT UK
- Adam Auton
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Richard Durbin
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA UK
40
McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet 2017; 13:e1006560. [PMID: 28282382 PMCID: PMC5365118 DOI: 10.1371/journal.pgen.1006560] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 03/16/2016] [Revised: 03/24/2017] [Accepted: 12/30/2016] [Indexed: 12/22/2022]
Abstract
The human DARC (Duffy antigen receptor for chemokines) gene encodes a membrane-bound chemokine receptor crucial for the infection of red blood cells by Plasmodium vivax, a major causative agent of malaria. Of the three major allelic classes segregating in human populations, the FY*O allele has been shown to protect against P. vivax infection and is at near fixation in sub-Saharan Africa, while FY*B and FY*A are common in Europe and Asia, respectively. Due to the combination of strong geographic differentiation and association with malaria resistance, DARC is considered a canonical example of positive selection in humans. Despite this, details of the timing and mode of selection at DARC remain poorly understood. Here, we use sequencing data from over 1,000 individuals in twenty-one human populations, as well as ancient human genomes, to perform a fine-scale investigation of the evolutionary history of DARC. We estimate the time to most recent common ancestor (TMRCA) of the most common FY*O haplotype to be 42 kya (95% CI: 34-49 kya). We infer the FY*O null mutation swept to fixation in Africa from standing variation with very low initial frequency (0.1%) and a selection coefficient of 0.043 (95% CI:0.011-0.18), which is among the strongest estimated in the human genome. We estimate the TMRCA of the FY*A mutation in non-Africans to be 57 kya (95% CI: 48-65 kya) and infer that, prior to the sweep of FY*O, all three alleles were segregating in Africa, as highly diverged populations from Asia and ≠Khomani San hunter-gatherers share the same FY*A haplotypes. We test multiple models of admixture that may account for this observation and reject recent Asian or European admixture as the cause.
Affiliation(s)
- Kimberly F. McManus
- Department of Biology, Stanford University, Stanford, California, United States of America
- Angela M. Taravella
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, United States of America
- Brenna M. Henn
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, United States of America
- Carlos D. Bustamante
- Department of Biology, Stanford University, Stanford, California, United States of America
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Martin Sikora
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Centre for Geogenetics, Natural History Museum Denmark, Copenhagen, Denmark
- Omar E. Cornejo
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Department of Biological Sciences, Washington State University, Pullman, Washington, United States of America
41
Abstract
Our understanding of the chronology of human evolution relies on the “molecular clock” provided by the steady accumulation of substitutions on an evolutionary lineage. Recent analyses of human pedigrees have called this understanding into question by revealing unexpectedly low germline mutation rates, which imply that substitutions accrue more slowly than previously believed. Translating mutation rates estimated from pedigrees into substitution rates is not as straightforward as it may seem, however. We dissect the steps involved, emphasizing that dating evolutionary events requires not “a mutation rate” but a precise characterization of how mutations accumulate in development in males and females—knowledge that remains elusive.
Affiliation(s)
- Priya Moorjani
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Ziyue Gao
- Howard Hughes Medical Institute & Dept. of Genetics, Stanford University, Stanford, California, United States of America
- Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
42
Family-Specific Variants and the Limits of Human Genetics. Trends Mol Med 2016; 22:925-934. [PMID: 27742414 DOI: 10.1016/j.molmed.2016.09.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 08/04/2016] [Revised: 09/10/2016] [Accepted: 09/13/2016] [Indexed: 01/28/2023]
Abstract
Every single-nucleotide change compatible with life is present in the human population today. Understanding these rare human variants defines an extraordinary challenge for genetics and medicine. The new clinical practice of sequencing many genes for hereditary cancer risk has illustrated the utility of clinical next-generation sequencing in adults, identifying more medically actionable variants than single-gene testing. However, it has also revealed a linear relationship between the length of DNA evaluated and the number of rare 'variants of uncertain significance' reported. We propose that careful approaches to phenotype-genotype inference, distinguishing between diagnostic and screening intent, in conjunction with expanded use of family-scale genetics studies as a source of information on family-specific variants, will reduce variants of uncertain significance reported to patients.
43
The rate of meiotic gene conversion varies by sex and age. Nat Genet 2016; 48:1377-1384. [PMID: 27643539 PMCID: PMC5083143 DOI: 10.1038/ng.3669] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 10/13/2015] [Accepted: 08/16/2016] [Indexed: 12/14/2022]
Abstract
Meiotic recombination involves a combination of gene conversion and crossover events that, along with mutations, produce germline genetic diversity. Here we report the discovery of 3,176 SNP and 61 indel gene conversions. Our estimate of the non-crossover (NCO) gene conversion rate (G) is 7.0 for SNPs and 5.8 for indels per megabase per generation, and the GC bias is 67.6%. For indels, we demonstrate a 65.6% preference for the shorter allele. NCO gene conversions from mothers are longer than those from fathers, and G is 2.17 times greater in mothers. Notably, G increases with the age of mothers, but not the age of fathers. A disproportionate number of NCO gene conversions in older mothers occur outside double-strand break (DSB) regions and in regions with relatively low GC content. This points to age-related changes in the mechanisms of meiotic gene conversion in oocytes.
44
Scally A. The mutation rate in human evolution and demographic inference. Curr Opin Genet Dev 2016; 41:36-43. [PMID: 27589081 DOI: 10.1016/j.gde.2016.07.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 05/31/2016] [Revised: 07/07/2016] [Accepted: 07/11/2016] [Indexed: 01/23/2023]
Abstract
The germline mutation rate has long been a major source of uncertainty in human evolutionary and demographic analyses based on genetic data, but estimates have improved substantially in recent years. I discuss our current knowledge of the mutation rate in humans and the underlying biological factors affecting it, which include generation time, parental age and other developmental and reproductive timescales. There is good evidence for a slowdown in mean mutation rate during great ape evolution, but not for a more recent change within the timescale of human genetic diversity. Hence, pending evidence to the contrary, it is reasonable to use a present-day rate of approximately 0.5 × 10⁻⁹ bp⁻¹ year⁻¹ in all human or hominin demographic analyses.
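To make the recommended yearly rate concrete, the sketch below shows two common conversions: to a per-generation rate, and from pairwise sequence divergence to a time estimate. The generation time of 30 years and the divergence of 0.001 are hypothetical inputs chosen for illustration, and the time calculation ignores ancestral polymorphism, so it is a rough guide rather than the review's own method.

```python
YEARLY_RATE = 0.5e-9  # mutations per bp per year, as quoted in the abstract


def per_generation_rate(yearly_rate: float, generation_time_years: float) -> float:
    """Convert a per-year mutation rate into a per-generation rate."""
    return yearly_rate * generation_time_years


def years_for_divergence(pairwise_divergence: float, yearly_rate: float) -> float:
    """Time for two lineages to accumulate the given per-bp divergence.

    Both lineages accumulate mutations independently, so d = 2 * mu * t.
    Ancestral polymorphism is ignored (a simplifying assumption).
    """
    return pairwise_divergence / (2 * yearly_rate)


# Hypothetical inputs, not taken from the source.
print(per_generation_rate(YEARLY_RATE, generation_time_years=30))
print(years_for_divergence(pairwise_divergence=0.001, yearly_rate=YEARLY_RATE))
```

With these illustrative inputs, the yearly rate corresponds to about 1.5 × 10⁻⁸ per bp per generation, and a divergence of 0.001 per bp corresponds to roughly one million years.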
Affiliation(s)
- Aylwyn Scally
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom.
45
Phung TN, Huber CD, Lohmueller KE. Determining the Effect of Natural Selection on Linked Neutral Divergence across Species. PLoS Genet 2016; 12:e1006199. [PMID: 27508305 PMCID: PMC4980041 DOI: 10.1371/journal.pgen.1006199] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 12/11/2015] [Accepted: 06/25/2016] [Indexed: 11/18/2022]
Abstract
A major goal in evolutionary biology is to understand how natural selection has shaped patterns of genetic variation across genomes. Studies in a variety of species have shown that neutral genetic diversity (intra-species differences) has been reduced at sites linked to those under direct selection. However, the effect of linked selection on neutral sequence divergence (inter-species differences) remains ambiguous. While empirical studies have reported correlations between divergence and recombination, which is interpreted as evidence for natural selection reducing linked neutral divergence, theory argues otherwise, especially for species that have diverged long ago. Here we address these outstanding issues by examining whether natural selection can affect divergence between both closely and distantly related species. We show that neutral divergence between closely related species (e.g. human-primate) is negatively correlated with functional content and positively correlated with human recombination rate. We also find that neutral divergence between distantly related species (e.g. human-rodent) is negatively correlated with functional content and positively correlated with estimates of background selection from primates. These patterns persist after accounting for the confounding factors of hypermutable CpG sites, GC content, and biased gene conversion. Coalescent models indicate that even when the contribution of ancestral polymorphism to divergence is small, background selection in the ancestral population can still explain a large proportion of the variance in divergence across the genome, generating the observed correlations. Our findings reveal that, contrary to previous intuition, natural selection can indirectly affect linked neutral divergence between both closely and distantly related species. 
Though we cannot formally exclude the possibility that the direct effects of purifying selection drive some of these patterns, such a scenario would be possible only if more of the genome is under purifying selection than currently believed. Our work has implications for understanding the evolution of genomes and interpreting patterns of genetic variation. Genetic variation at neutral sites can be reduced through linkage to nearby selected sites. This pattern has been used to show the widespread effects of natural selection at shaping patterns of genetic diversity across genomes from a variety of species. However, it is not entirely clear whether natural selection has an effect on neutral divergence between species. Here we show that putatively neutral divergence between closely related species (human and chimp) and between distantly related pairs of species (humans and mice) show signatures consistent with having been affected by linkage to selected sites. Further, our theoretical models and simulations show that natural selection indirectly affecting linked neutral sites can generate these patterns. Unless substantially more of the genome is under the direct effects of purifying selection than currently believed, our results argue that natural selection has played an important role in shaping variation in levels of putatively neutral sequence divergence across the genome. Our findings further suggest that divergence-based estimates of neutral mutation rate variation across the genome as well as certain estimators of population history may be confounded by linkage to selected sites.
Affiliation(s)
- Tanya N. Phung
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, United States of America
- Christian D. Huber
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Kirk E. Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
46
Abstract
Genome sequencing studies of de novo mutations in humans have revealed surprising incongruities in our understanding of human germline mutation. In particular, the mutation rate observed in modern humans is substantially lower than that estimated from calibration against the fossil record, and the paternal age effect in mutations transmitted to offspring is much weaker than expected from our long-standing model of spermatogenesis. I consider possible explanations for these discrepancies, including evolutionary changes in life-history parameters such as generation time and the age of puberty, a possible contribution from undetected post-zygotic mutations early in embryo development, and changes in cellular mutation processes at different stages of the germline. I suggest a revised model of stem-cell state transitions during spermatogenesis, in which 'dark' gonial stem cells play a more active role than hitherto envisaged, with a long cycle time undetected in experimental observations. More generally, I argue that the mutation rate and its evolution depend intimately on the structure of the germline in humans and other primates. This article is part of the themed issue 'Dating species divergences using rocks and clocks'.
Affiliation(s)
- Aylwyn Scally
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
47
Wakeley J, King L, Wilton PR. Effects of the population pedigree on genetic signatures of historical demographic events. Proc Natl Acad Sci U S A 2016; 113:7994-8001. [PMID: 27432946 PMCID: PMC4961129 DOI: 10.1073/pnas.1601080113] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Genetic variation among loci in the genomes of diploid biparental organisms is the result of mutation and genetic transmission through the genealogy, or population pedigree, of the species. We explore the consequences of this for patterns of variation at unlinked loci for two kinds of demographic events: the occurrence of a very large family or a strong selective sweep that occurred in the recent past. The results indicate that only rather extreme versions of such events can be expected to structure population pedigrees in such a way that unlinked loci will show deviations from the standard predictions of population genetics, which average over population pedigrees. The results also suggest that large samples of individuals and loci increase the chance of picking up signatures of these events, and that very large families may have a unique signature in terms of sample distributions of mutant alleles.
Affiliation(s)
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Léandra King
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Peter R Wilton
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
48
Staples J, Witherspoon D, Jorde L, Nickerson D, Below J, Huff C, Huff CD. PADRE: Pedigree-Aware Distant-Relationship Estimation. Am J Hum Genet 2016; 99:154-62. [PMID: 27374771 DOI: 10.1016/j.ajhg.2016.05.020] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 01/21/2016] [Accepted: 05/16/2016] [Indexed: 10/21/2022]
Abstract
Accurate estimation of shared ancestry is an important component of many genetic studies; current prediction tools accurately estimate pairwise genetic relationships up to the ninth degree. Pedigree-aware distant-relationship estimation (PADRE) combines relationship likelihoods generated by estimation of recent shared ancestry (ERSA) with likelihoods from family networks reconstructed by pedigree reconstruction and identification of a maximum unrelated set (PRIMUS), improving the power to detect distant relationships between pedigrees. Using PADRE, we estimated relationships from simulated pedigrees and three extended pedigrees, correctly predicting 20% more fourth- through ninth-degree simulated relationships than when using ERSA alone. By leveraging pedigree information, PADRE can even identify genealogical relationships between individuals who are genetically unrelated. For example, although 95% of 13th-degree relatives are genetically unrelated, in simulations, PADRE correctly predicted 50% of 13th-degree relationships to within one degree of relatedness. The improvement in prediction accuracy was consistent between simulated and actual pedigrees. We also applied PADRE to the HapMap3 CEU samples and report new cryptic relationships and validation of previously described relationships between families. PADRE greatly expands the range of relationships that can be estimated by using genetic data in pedigrees.
Affiliation(s)
- Chad D Huff
- Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA.
49
Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution. G3-GENES GENOMES GENETICS 2016; 6:1287-96. [PMID: 26935417 PMCID: PMC4856080 DOI: 10.1534/g3.116.027581] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 12/18/2022]
Abstract
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to share an IBD segment if that segment is inherited from a recent shared common ancestor without intervening recombination. Segments several cM long can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample, and there are currently efforts to detect shorter segments from sequencing. Here, we study a problem of identifiability: because existing approaches detect IBD based on contiguous segments of identity-by-state, inferred long segments of IBD may arise from the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that significant proportions of inferred segments 1–2 cM long are results of conflations of two or more shorter segments, each at least 0.2 cM or longer, under demographic scenarios typical for modern humans for all programs tested. The impact of such conflation is much smaller for longer (> 2 cM) segments. This biases the inferred IBD segment length distribution, and so can affect downstream inferences that depend on the assumption that each segment of IBD derives from a single common ancestor. As an example, we present and analyze an estimator of the de novo mutation rate using IBD segments, and demonstrate that unmodeled conflation leads to underestimates of the ages of the common ancestors on these segments, and hence a significant overestimate of the mutation rate. Understanding the conflation effect in detail will make its correction in future methods more tractable.
50
Lipson M, Loh PR, Sankararaman S, Patterson N, Berger B, Reich D. Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes. PLoS Genet 2015; 11:e1005550. [PMID: 26562831 PMCID: PMC4642934 DOI: 10.1371/journal.pgen.1005550] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 02/18/2015] [Accepted: 09/03/2015] [Indexed: 01/06/2023]
Abstract
The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Most notably, recent approaches based on counting de novo mutations in family pedigrees have yielded significantly smaller values than classical methods based on sequence divergence. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of 1.61 ± 0.13 × 10⁻⁸ mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes.
Affiliation(s)
- Mark Lipson
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- * E-mail: (ML), (DR)
- Po-Ru Loh
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Sriram Sankararaman
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Nick Patterson
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Bonnie Berger
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- David Reich
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
- * E-mail: (ML), (DR)
|