1
Schweiger R, Lee S, Zhou C, Yang TP, Smith K, Li S, Sanghvi R, Neville M, Mitchell E, Nessa A, Wadge S, Small KS, Campbell PJ, Sudmant PH, Rahbari R, Durbin R. Insights into non-crossover recombination from long-read sperm sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.05.602249. [PMID: 39005338 PMCID: PMC11245106 DOI: 10.1101/2024.07.05.602249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Meiotic recombination is a fundamental process that generates genetic diversity by creating new combinations of existing alleles. Although human crossovers have been studied at the pedigree, population and single-cell level, the more frequent non-crossover events that lead to gene conversion are harder to study, particularly at the individual level. Here we show that single high-fidelity long sequencing reads from sperm can capture both crossovers and non-crossovers, allowing effectively arbitrary sample sizes for analysis from one male. Using fifteen sperm samples from thirteen donors, we demonstrate variation between and within donors for the rates of different types of recombination. Intriguingly, we observe a tendency for non-crossover gene conversions to occur upstream of nearby PRDM9 binding sites, whereas crossover locations have a slight downstream bias. We further provide evidence for two distinct non-crossover processes. One gives rise to the vast majority of non-crossovers with mean conversion tract length under 50 bp, which we suggest is an outcome of standard PRDM9-induced meiotic recombination. In contrast, ~2% of non-crossovers have much longer mean tract length, and potentially originate from the same process as complex events with more than two haplotype switches, which is not associated with PRDM9 binding sites and is also seen in somatic cells.
Affiliation(s)
- Regev Schweiger
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom
- Sangjin Lee
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Chenxi Zhou
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom
- Tsun-Po Yang
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Katie Smith
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Stacy Li
  - Department of Integrative Biology, University of California Berkeley, Berkeley, USA
- Rashesh Sanghvi
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Matthew Neville
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Emily Mitchell
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Ayrun Nessa
  - Kings College London, Department of Twin Research & Genetic Epidemiology, London, United Kingdom
- Sam Wadge
  - Kings College London, Department of Twin Research & Genetic Epidemiology, London, United Kingdom
- Kerrin S Small
  - Kings College London, Department of Twin Research & Genetic Epidemiology, London, United Kingdom
- Peter J Campbell
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
  - Wellcome-MRC Cambridge Stem Cell Institute, Cambridge Biomedical Campus, Cambridge, UK
- Peter H Sudmant
  - Department of Integrative Biology, University of California Berkeley, Berkeley, USA
  - Center for Computational Biology, University of California Berkeley, Berkeley, USA
- Raheleh Rahbari
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
- Richard Durbin
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom
  - Wellcome Sanger Institute, Cancer Ageing and Somatic Mutation, Hinxton, Cambridge CB10 1SA, United Kingdom
2
Nguyen AK, Blacksmith MS, Kidd JM. Duplications and Retrogenes Are Numerous and Widespread in Modern Canine Genomic Assemblies. Genome Biol Evol 2024; 16:evae142. [PMID: 38946312 PMCID: PMC11259980 DOI: 10.1093/gbe/evae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 05/08/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024]
Abstract
Recent years have seen a dramatic increase in the number of canine genome assemblies available. Duplications are an important source of evolutionary novelty and are also prone to misassembly. We explored the duplication content of nine canine genome assemblies using both genome self-alignment and read-depth approaches. We find that 8.58% of the genome is duplicated in the canFam4 assembly, derived from the German Shepherd Dog Mischka, including 90.15% of unplaced contigs. Highlighting the continued difficulty in properly assembling duplications, less than half of read-depth and assembly alignment duplications overlap, but the mCanLor1.2 Greenland wolf assembly shows greater concordance. Further study shows the presence of multiple segments that have alignments to four or more duplicate copies. These high-recurrence duplications correspond to gene retrocopies. We identified 3,892 candidate retrocopies from 1,316 parental genes in the canFam4 assembly and find that ∼8.82% of duplicated base pairs involve a retrocopy, confirming this mechanism as a major driver of gene duplication in canines. Similar patterns are found across eight other recent canine genome assemblies, with metrics supporting a greater quality of the PacBio HiFi mCanLor1.2 assembly. Comparison between the wolf and other canine assemblies found that 92% of retrocopy insertions are shared between assemblies. By calculating the number of generations since genome divergence, we estimate that new retrocopy insertions appear, on average, in 1 out of 3,514 births. Our analyses illustrate the impact of retrogene formation on canine genomes and highlight the variable representation of duplicated sequences among recently completed canine assemblies.
Affiliation(s)
- Anthony K Nguyen
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Matthew S Blacksmith
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jeffrey M Kidd
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
3
Yan Z, Ge F, Liu Y, Zhang Y, Li F, Song J, Yu DJ. TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion. J Chem Inf Model 2024; 64:1407-1418. [PMID: 38334115 DOI: 10.1021/acs.jcim.3c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage is designed to fuse different feature embeddings through a transformer encoder. In the second stage, a support vector machine model is employed to quantify the pathogenicity of SAVs after dimensionality reduction. The prediction performance of TransEFVP on blind test data achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, higher than the existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can be explored as an accurate and effective SAV pathogenicity prediction method. The data and codes for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
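For readers unfamiliar with the metrics quoted in this abstract, the following is a minimal sketch of how a Matthews correlation coefficient and an F1-score are computed from binary confusion-matrix counts. The function names and the toy counts are illustrative assumptions; they are not taken from the TransEFVP code or its test data.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1-score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical counts for a pathogenic-vs-benign classifier.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
```

Unlike the F1-score, the Matthews correlation coefficient also uses the true-negative count, which is one reason the two are often reported together for imbalanced variant datasets.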
Affiliation(s)
- Zihao Yan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Fang Ge
  - State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, PR China
- Yan Liu
  - Department of Computer Science, Yangzhou University, Yangzhou 225100, PR China
- Yumeng Zhang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Fuyi Li
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
4
Beichman AC, Robinson J, Lin M, Moreno-Estrada A, Nigenda-Morales S, Harris K. Evolution of the Mutation Spectrum Across a Mammalian Phylogeny. Mol Biol Evol 2023; 40:msad213. [PMID: 37770035 PMCID: PMC10566577 DOI: 10.1093/molbev/msad213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 08/21/2023] [Accepted: 09/19/2023] [Indexed: 10/03/2023]
Abstract
Although evolutionary biologists have long theorized that variation in DNA repair efficacy might explain some of the diversity of lifespan and cancer incidence across species, we have little data on the variability of normal germline mutagenesis outside of humans. Here, we shed light on the spectrum and etiology of mutagenesis across mammals by quantifying mutational sequence context biases using polymorphism data from thirteen species of mice, apes, bears, wolves, and cetaceans. After normalizing the mutation spectrum for reference genome accessibility and k-mer content, we use the Mantel test to deduce that mutation spectrum divergence is highly correlated with genetic divergence between species, whereas life history traits like reproductive age are weaker predictors of mutation spectrum divergence. Potential bioinformatic confounders are only weakly related to a small set of mutation spectrum features. We find that clock-like mutational signatures previously inferred from human cancers cannot explain the phylogenetic signal exhibited by the mammalian mutation spectrum, despite the ability of these signatures to fit each species' 3-mer spectrum with high cosine similarity. In contrast, parental aging signatures inferred from human de novo mutation data appear to explain much of the 1-mer spectrum's phylogenetic signal in combination with a novel mutational signature. We posit that future models purporting to explain the etiology of mammalian mutagenesis need to capture the fact that more closely related species have more similar mutation spectra; a model that fits each marginal spectrum with high cosine similarity is not guaranteed to capture this hierarchy of mutation spectrum variation among species.
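The closing point of this abstract, that high cosine similarity alone need not reflect how spectra relate to one another, can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration: the three six-category spectra are invented numbers, and the species labels are hypothetical, not data from the study.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of categories")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented 1-mer spectra (proportions of six mutation types); illustration only.
species_a = [0.10, 0.08, 0.40, 0.08, 0.07, 0.27]
species_b = [0.11, 0.08, 0.39, 0.08, 0.07, 0.27]  # imagined close relative of A
species_c = [0.14, 0.09, 0.36, 0.09, 0.08, 0.24]  # imagined distant relative

if __name__ == "__main__":
    print(f"A vs B: {cosine_similarity(species_a, species_b):.4f}")
    print(f"A vs C: {cosine_similarity(species_a, species_c):.4f}")
```

In this toy case every pair scores above 0.99, so a threshold on cosine similarity would treat all three spectra as equally well matched; only the ordering of the pairwise values separates the closer pair from the more distant one.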
Affiliation(s)
- Annabel C Beichman
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jacqueline Robinson
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
- Meixi Lin
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
- Andrés Moreno-Estrada
  - National Laboratory of Genomics for Biodiversity, Advanced Genomics Unit (UGA-LANGEBIO), CINVESTAV, Irapuato, Mexico
- Sergio Nigenda-Morales
  - Department of Biological Sciences, California State University, San Marcos, San Marcos, CA, USA
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Herbold Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
5
Lee YL, Bouwman AC, Harland C, Bosse M, Costa Monteiro Moreira G, Veerkamp RF, Mullaart E, Cambisano N, Groenen MAM, Karim L, Coppieters W, Georges M, Charlier C. The rate of de novo structural variation is increased in in vitro-produced offspring and preferentially affects the paternal genome. Genome Res 2023; 33:1455-1464. [PMID: 37793781 PMCID: PMC10620045 DOI: 10.1101/gr.277884.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 08/08/2023] [Indexed: 10/06/2023]
Abstract
Assisted reproductive technologies (ARTs), including in vitro maturation and fertilization (IVF), are increasingly used in human and animal reproduction. Whether these technologies directly affect the rate of de novo mutation (DNM), and to what extent, has been a matter of debate. Here we take advantage of domestic cattle, characterized by complex pedigrees that are ideally suited to detect DNMs and by the systematic use of ART, to study the rate of de novo structural variation (dnSV) in this species and how it is impacted by IVF. By exploiting features of associated de novo point mutations (dnPMs) and dnSVs in clustered DNMs, we provide strong evidence that (1) IVF increases the rate of dnSV approximately fivefold, and (2) the corresponding mutations occur during the very early stages of embryonic development (one- and two-cell stage), yet primarily affect the paternal genome.
Affiliation(s)
- Young-Lim Lee
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Aniek C Bouwman
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Chad Harland
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
  - Livestock Improvement Corporation, Hamilton 3240, New Zealand
- Mirte Bosse
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Roel F Veerkamp
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Nadine Cambisano
  - GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Martien A M Groenen
  - Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
- Latifa Karim
  - GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Wouter Coppieters
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
  - GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Michel Georges
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- Carole Charlier
  - Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
6
Beichman AC, Robinson J, Lin M, Moreno-Estrada A, Nigenda-Morales S, Harris K. Evolution of the mutation spectrum across a mammalian phylogeny. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.31.543114. [PMID: 37398383 PMCID: PMC10312511 DOI: 10.1101/2023.05.31.543114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Little is known about how the spectrum and etiology of germline mutagenesis might vary among mammalian species. To shed light on this mystery, we quantify variation in mutational sequence context biases using polymorphism data from thirteen species of mice, apes, bears, wolves, and cetaceans. After normalizing the mutation spectrum for reference genome accessibility and k-mer content, we use the Mantel test to deduce that mutation spectrum divergence is highly correlated with genetic divergence between species, whereas life history traits like reproductive age are weaker predictors of mutation spectrum divergence. Potential bioinformatic confounders are only weakly related to a small set of mutation spectrum features. We find that clocklike mutational signatures previously inferred from human cancers cannot explain the phylogenetic signal exhibited by the mammalian mutation spectrum, despite the ability of these clocklike signatures to fit each species' 3-mer spectrum with high cosine similarity. In contrast, parental aging signatures inferred from human de novo mutation data appear to explain much of the mutation spectrum's phylogenetic signal when fit to non-context-dependent mutation spectrum data in combination with a novel mutational signature. We posit that future models purporting to explain the etiology of mammalian mutagenesis need to capture the fact that more closely related species have more similar mutation spectra; a model that fits each marginal spectrum with high cosine similarity is not guaranteed to capture this hierarchy of mutation spectrum variation among species.
Affiliation(s)
- Jacqueline Robinson
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA
- Meixi Lin
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA
- Andrés Moreno-Estrada
  - National Laboratory of Genomics for Biodiversity, Advanced Genomics Unit (UGA-LANGEBIO), CINVESTAV, Irapuato, Mexico
- Sergio Nigenda-Morales
  - Department of Biological Sciences, California State University, San Marcos, San Marcos CA
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle WA
7
Vollger MR, Dishuck PC, Harvey WT, DeWitt WS, Guitart X, Goldberg ME, Rozanski AN, Lucas J, Asri M, Munson KM, Lewis AP, Hoekzema K, Logsdon GA, Porubsky D, Paten B, Harris K, Hsieh P, Eichler EE. Increased mutation and gene conversion within human segmental duplications. Nature 2023; 617:325-334. [PMID: 37165237 PMCID: PMC10172114 DOI: 10.1038/s41586-023-05895-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 07/06/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
Affiliation(s)
- Mitchell R Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Philip C Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William S DeWitt
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Xavi Guitart
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Michael E Goldberg
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Allison N Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Julian Lucas
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mobin Asri
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Kelley Harris
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
8
Komluski J, Habig M, Stukenbrock EH. Repeat-Induced Point Mutation and Gene Conversion Coinciding with Heterochromatin Shape the Genome of a Plant-Pathogenic Fungus. mBio 2023:e0329022. [PMID: 37093087 DOI: 10.1128/mbio.03290-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Meiosis is associated with genetic changes in the genome via recombination, gene conversion, and mutations. The occurrence of gene conversion and mutations during meiosis may further be influenced by the chromatin conformation, similar to the effect of the chromatin conformation on the mitotic mutation rate. To date, however, the exact distribution and type of meiosis-associated changes and the role of the chromatin conformation in this context are largely unexplored. Here, we determine recombination, gene conversion, and de novo mutations using whole-genome sequencing of all meiotic products of 23 individual meioses in Zymoseptoria tritici, an important pathogen of wheat. We confirm a high genome-wide recombination rate of 65 centimorgan (cM)/Mb and see higher recombination rates on the accessory compared to core chromosomes. A substantial fraction of 0.16% of all polymorphic markers was affected by gene conversions, showing a weak GC-bias and occurring at higher frequency in regions of constitutive heterochromatin, indicated by the histone modification H3K9me3. The de novo mutation rate associated with meiosis was approximately three orders of magnitude higher than the corresponding mitotic mutation rate. Importantly, repeat-induced point mutation (RIP), a fungal defense mechanism against duplicated sequences, is active in Z. tritici and responsible for the majority of these de novo meiotic mutations. Our results indicate that the genetic changes associated with meiosis are a major source of variability in the genome of an important plant pathogen and shape its evolutionary trajectory.
IMPORTANCE The impact of meiosis on the genome composition via gene conversion and mutations is mostly poorly understood, in particular, for non-model species. Here, we sequenced all four meiotic products for 23 individual meioses and determined the genetic changes caused by meiosis for the important fungal wheat pathogen Zymoseptoria tritici. We found a high rate of gene conversions and an effect of the chromatin conformation on gene conversion rates. Higher conversion rates were found in regions enriched with H3K9me3, a mark for constitutive heterochromatin. Most importantly, meiosis was associated with a much higher frequency of de novo mutations than mitosis; 78% of the meiotic mutations were caused by repeat-induced point mutations, a fungal defense mechanism against duplicated sequences. In conclusion, the genetic changes associated with meiosis are therefore a major factor shaping the genome of this fungal pathogen.
Affiliation(s)
- Jovan Komluski
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
- Michael Habig
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
- Eva H Stukenbrock
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
9
Ghafoor S, Santos J, Versoza CJ, Jensen JD, Pfeifer SP. The Impact of Sample Size and Population History on Observed Mutational Spectra: A Case Study in Human and Chimpanzee Populations. Genome Biol Evol 2023; 15:7039701. [PMID: 36790107 PMCID: PMC9989333 DOI: 10.1093/gbe/evad019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2021] [Revised: 01/20/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023]
Abstract
Recent studies have highlighted variation in the mutational spectra among human populations as well as closely related hominoids, yet little remains known about the genetic and nongenetic factors driving these rate changes across the genome. Pinpointing the root causes of these differences is an important endeavor that requires careful comparative analyses of population-specific mutational landscapes at both broad and fine genomic scales. However, several factors can confound such analyses. Although previous studies have shown that technical artifacts, such as sequencing errors and batch effects, can contribute to observed mutational shifts, other potentially confounding parameters have received less attention thus far. Using population genetic simulations of human and chimpanzee populations as an illustrative example, we here show that the sample size required for robust inference of mutational spectra depends on the population-specific demographic history. As a consequence, the power to detect rate changes is high in certain hominoid populations while, for others, currently available sample sizes preclude analyses at fine genomic scales.
Affiliation(s)
- Suhail Ghafoor
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- João Santos
  - School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Cyril J Versoza
  - School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Susanne P Pfeifer
  - School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
10
Gao Z, Zhang Y, Cramer N, Przeworski M, Moorjani P. Limited role of generation time changes in driving the evolution of the mutation spectrum in humans. eLife 2023; 12:e81188. [PMID: 36779395 PMCID: PMC10014080 DOI: 10.7554/elife.81188] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/18/2022] [Accepted: 02/02/2023] [Indexed: 02/14/2023]
Abstract
Recent studies have suggested that the human germline mutation rate and spectrum evolve rapidly. Variation in generation time has been linked to these changes, though its contribution remains unclear. We develop a framework to characterize temporal changes in polymorphisms within and between populations, while controlling for the effects of natural selection and biased gene conversion. Application to the 1000 Genomes Project dataset reveals multiple independent changes that arose after the split of continental groups, including a previously reported, transient elevation in TCC>TTC mutations in Europeans and novel signals of divergence in C>G and T>A mutation rates among population samples. We also find a significant difference between groups sampled in and outside of Africa in old T>C polymorphisms that predate the out-of-Africa migration. This surprising signal is driven by TpG>CpG mutations and stems in part from mis-polarized CpG transitions, which are more likely to undergo recurrent mutations. Finally, by relating the mutation spectrum of polymorphisms to parental age effects on de novo mutations, we show that plausible changes in the generation time cannot explain the patterns observed for different mutation types jointly. Thus, other factors - genetic modifiers or environmental exposures - must have had a non-negligible impact on the human mutation landscape.
Affiliation(s)
- Ziyue Gao
  - Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, United States
- Yulin Zhang
  - Center for Computational Biology, University of California, Berkeley, Berkeley, United States
- Nathan Cramer
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
- Molly Przeworski
  - Department of Biological Sciences, Columbia University, New York, United States
  - Department of Systems Biology, Columbia University, New York, United States
- Priya Moorjani
  - Center for Computational Biology, University of California, Berkeley, Berkeley, United States
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States

11.
Souilmi Y, Tobler R, Johar A, Williams M, Grey ST, Schmidt J, Teixeira JC, Rohrlach A, Tuke J, Johnson O, Gower G, Turney C, Cox M, Cooper A, Huber CD. Admixture has obscured signals of historical hard sweeps in humans. Nat Ecol Evol 2022; 6:2003-2015. [PMID: 36316412 PMCID: PMC9715430 DOI: 10.1038/s41559-022-01914-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/21/2021] [Accepted: 09/16/2022] [Indexed: 11/06/2022]
Abstract
The role of natural selection in shaping biological diversity is an area of intense interest in modern biology. To date, studies of positive selection have primarily relied on genomic datasets from contemporary populations, which are susceptible to confounding factors associated with complex and often unknown aspects of population history. In particular, admixture between diverged populations can distort or hide prior selection events in modern genomes, though this process is not explicitly accounted for in most selection studies despite its apparent ubiquity in humans and other species. Through analyses of ancient and modern human genomes, we show that previously reported Holocene-era admixture has masked more than 50 historic hard sweeps in modern European genomes. Our results imply that this canonical mode of selection has probably been underappreciated in the evolutionary history of humans and suggest that our current understanding of the tempo and mode of selection in natural populations may be inaccurate.
Affiliation(s)
- Yassine Souilmi
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Raymond Tobler
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Evolution of Cultural Diversity Initiative, Australian National University, Canberra, Australian Capital Territory, Australia
- Angad Johar
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Matthew Williams
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Shane T Grey
  - Transplantation Immunology Group, Immunology Division, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - St Vincent's Clinical School, Faculty of Medicine, UNSW, Darlinghurst, New South Wales, Australia
- Joshua Schmidt
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- João C Teixeira
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Adam Rohrlach
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
- Jonathan Tuke
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, South Australia, Australia
  - School of Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Olivia Johnson
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Graham Gower
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Chris Turney
  - Chronos 14Carbon-Cycle Facility and Earth and Sustainability Science Research Centre, University of New South Wales, Sydney, New South Wales, Australia
- Murray Cox
  - Statistics and Bioinformatics Group, School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Alan Cooper
  - South Australian Museum, Adelaide, South Australia, Australia
  - BlueSky Genetics, Ashton, South Australia, Australia
- Christian D Huber
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Biology, Penn State University, University Park, PA, USA

12.
Estimating the genome-wide mutation rate from thousands of unrelated individuals. Am J Hum Genet 2022; 109:2178-2184. [PMID: 36370709 PMCID: PMC9748258 DOI: 10.1016/j.ajhg.2022.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 10/15/2022] [Indexed: 11/13/2022]
Abstract
We provide a method for estimating the genome-wide mutation rate from sequence data on unrelated individuals by using segments of identity by descent (IBD). The length of an IBD segment indicates the time to shared ancestor of the segment, and mutations that have occurred since the shared ancestor result in discordances between the two IBD haplotypes. Previous methods for IBD-based estimation of mutation rate have required the use of family data for accurate phasing of the genotypes. This has limited the scope of application of IBD-based mutation rate estimation. Here, we develop an IBD-based method for mutation rate estimation from population data, and we apply it to whole-genome sequence data on 4,166 European American individuals from the TOPMed Framingham Heart Study, 2,996 European American individuals from the TOPMed My Life, Our Future study, and 1,586 African American individuals from the TOPMed Hypertension Genetic Epidemiology Network study. Although mutation rates may differ between populations as a result of genetic factors, demographic factors such as average parental age, and environmental exposures, our results are consistent with equal genome-wide average mutation rates across these three populations. Our overall estimate of the average genome-wide mutation rate per 10^8 base pairs per generation for single-nucleotide variants is 1.24 (95% CI 1.18-1.33).

13.
Schlichta F, Moinet A, Peischl S, Excoffier L. The Impact of Genetic Surfing on Neutral Genomic Diversity. Mol Biol Evol 2022; 39:msac249. [PMID: 36403964 PMCID: PMC9703594 DOI: 10.1093/molbev/msac249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Range expansions have been common in the history of most species. Serial founder effects and subsequent population growth at expansion fronts typically lead to a loss of genomic diversity along the expansion axis. A frequent consequence is the phenomenon of "gene surfing," where variants located near the expanding front can reach high frequencies or even fix in newly colonized territories. Although gene surfing events have been characterized thoroughly for a specific locus, their effects on linked genomic regions and the overall patterns of genomic diversity have been little investigated. In this study, we simulated the evolution of whole genomes during several types of 1D and 2D range expansions differing by the extent of migration, founder events, and recombination rates. We focused on the characterization of local dips of diversity, or "troughs," taken as a proxy for surfing events. We find that, for a given recombination rate, once we consider the amount of diversity lost since the beginning of the expansion, it is possible to predict the initial evolution of trough density and their average width irrespective of the expansion condition. Furthermore, when recombination rates vary across the genome, we find that troughs are over-represented in regions of low recombination. Therefore, range expansions can leave local and global genomic signatures often interpreted as evidence of past selective events. Given the generality of our results, they could be used as a null model for species having gone through recent expansions, and thus be helpful to correctly interpret many evolutionary biology studies.
Affiliation(s)
- Flávia Schlichta
  - Computational and Molecular Population Genetics lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Antoine Moinet
  - Computational and Molecular Population Genetics lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Stephan Peischl
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Laurent Excoffier
  - Computational and Molecular Population Genetics lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland

14.
Melka AB, Louzoun Y. High fraction of silent recombination in a finite-population two-locus neutral birth-death-mutation model. Phys Rev E 2022; 106:024409. [PMID: 36109958 DOI: 10.1103/physreve.106.024409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2021] [Accepted: 07/25/2022] [Indexed: 06/15/2023]
Abstract
A precise estimate of allele and haplotype polymorphism is of great interest in theoretical population genetics, but also has practical applications, such as bone marrow registries management. Allele polymorphism is driven mainly by point mutations, while haplotype polymorphism is also affected by recombination. Current estimates treat recombination as mutations in an infinite site model. We here show that even in the simple case of two loci in a haploid individual, for a finite population, most recombination events produce existing haplotypes, and as such are silent. Silent recombination considerably reduces the total number of haplotypes expected from the infinite site model for populations that are not much larger than one over the mutation rate. Moreover, in contrast with mutations, the number of haplotypes does not grow linearly with the population size. We hence propose a more accurate estimate of the total number of haplotypes that takes into account silent recombination. We study large-scale human leukocyte antigen (HLA) haplotype frequencies from human populations to show that the current estimated recombination rate in the HLA region is underestimated.
Affiliation(s)
- A B Melka
  - Department of Mathematics, Bar-Ilan University, Ramat Gan 52900, Israel
- Y Louzoun
  - Department of Mathematics, Bar-Ilan University, Ramat Gan 52900, Israel
  - Gonda Brain Research Center, Bar-Ilan University, Ramat Gan 52900, Israel

15.
Rashed WM, Marcotte EL, Spector LG. Germline De Novo Mutations as a Cause of Childhood Cancer. JCO Precis Oncol 2022; 6:e2100505. [PMID: 35820085 DOI: 10.1200/po.21.00505] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
Germline de novo mutations (DNMs) represent one of the important topics that need extensive attention from epidemiologists, geneticists, and other relevant stakeholders. Advances in next-generation sequencing technologies allowed examination of parent-offspring trios to ascertain the frequency of germline DNMs. Many epidemiological risk factors for childhood cancer are indicative of DNMs as a mechanism. The aim of this review was to give an overview of germline DNMs, their causes in general, and to discuss their relation to childhood cancer risk. In addition, we highlighted existing gaps in knowledge in many topics of germline DNMs in childhood cancer that need exploration and collaborative efforts.
Affiliation(s)
- Wafaa M Rashed
  - Research Department, Children's Cancer Hospital-Egypt 57357 (CCHE-57357), Cairo, Egypt
- Erin L Marcotte
  - Division of Epidemiology/Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN
- Logan G Spector
  - Division of Epidemiology/Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN

16.
Wall JD, Robinson JA, Cox LA. High-Resolution Estimates of Crossover and Noncrossover Recombination from a Captive Baboon Colony. Genome Biol Evol 2022; 14:evac040. [PMID: 35325119 PMCID: PMC9048888 DOI: 10.1093/gbe/evac040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 03/02/2022] [Indexed: 11/17/2022]
Abstract
Homologous recombination has been extensively studied in humans and a handful of model organisms. Much less is known about recombination in other species, including nonhuman primates. Here, we present a study of crossovers (COs) and noncrossover (NCO) recombination in olive baboons (Papio anubis) from two pedigrees containing a total of 20 paternal and 17 maternal meioses, and compare these results to linkage disequilibrium (LD) based recombination estimates from 36 unrelated olive baboons. We demonstrate how COs, combined with LD-based recombination estimates, can be used to identify genome assembly errors. We also quantify sex-specific differences in recombination rates, including elevated male CO and reduced female CO rates near telomeres. Finally, we add to the increasing body of evidence suggesting that while most NCO recombination tracts in mammals are short (e.g., <500 bp), there is a non-negligible fraction of longer (e.g., >1 kb) NCO tracts. For NCO tracts shorter than 10 kb, we fit a mixture of two (truncated) geometric distributions model to the NCO tract length distribution and estimate that >99% of all NCO tracts are very short (mean 24 bp), but the remaining tracts can be quite long (mean 4.3 kb). A single geometric distribution model for NCO tract lengths is incompatible with the data, suggesting that LD-based methods for estimating NCO recombination rates that make this assumption may need to be modified.
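The two-component tract length model summarized in this abstract can be illustrated with a short sketch. It assumes a plain (untruncated) mixture of two geometric distributions, whereas the abstract describes truncated ones, and it fixes the short-tract weight at exactly 0.99, although the abstract only states ">99%". The function names and the 1 kb threshold are illustrative choices, not taken from the paper.

```python
def geometric_tail(threshold, mean):
    """P(L > threshold) for a geometric tract length L = 1, 2, ... with the given mean."""
    p = 1.0 / mean  # per-base probability that the tract ends
    return (1.0 - p) ** threshold


def mixture_mean(w_short, mean_short, mean_long):
    """Mean tract length under a two-component geometric mixture."""
    return w_short * mean_short + (1.0 - w_short) * mean_long


def mixture_tail(threshold, w_short, mean_short, mean_long):
    """P(L > threshold) under the two-component mixture."""
    return (w_short * geometric_tail(threshold, mean_short)
            + (1.0 - w_short) * geometric_tail(threshold, mean_long))


# Component means are the values quoted in the abstract (24 bp and 4.3 kb).
# W_SHORT = 0.99 is an assumption; the abstract only says ">99%".
W_SHORT = 0.99
MEAN_SHORT = 24.0
MEAN_LONG = 4300.0

if __name__ == "__main__":
    overall_mean = mixture_mean(W_SHORT, MEAN_SHORT, MEAN_LONG)
    print("mixture mean (bp):", overall_mean)
    print("P(L > 1 kb), mixture:", mixture_tail(1000, W_SHORT, MEAN_SHORT, MEAN_LONG))
    # A single geometric with the same overall mean puts far less mass above 1 kb.
    print("P(L > 1 kb), single geometric:", geometric_tail(1000, overall_mean))
```

Under these assumed weights the overall mean is about 67 bp, yet roughly 0.8% of tracts exceed 1 kb. A single geometric distribution with the same mean puts orders of magnitude less mass there, which is one way to see why a one-component model fits such data poorly.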
Affiliation(s)
- Jeffrey D. Wall
  - Institute for Human Genetics, University of California San Francisco, USA
- Laura A. Cox
  - Center for Precision Medicine, Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, USA

17.
Mularo AJ, Bernal XE, DeWoody JA. Dominance can increase genetic variance after a population bottleneck: a synthesis of the theoretical and empirical evidence. J Hered 2022; 113:257-271. [PMID: 35143665 DOI: 10.1093/jhered/esac007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2021] [Accepted: 02/07/2022] [Indexed: 11/13/2022]
Abstract
Drastic reductions in population size, or population bottlenecks, can lead to a reduction in additive genetic variance and adaptive potential. Genetic variance for some quantitative genetic traits, however, can increase after a population reduction. Empirical evaluations of quantitative traits following experimental bottlenecks indicate that non-additive genetic effects, including both allelic dominance at a given locus and epistatic interactions among loci, may impact the additive variance contributed by alleles that ultimately influences phenotypic expression and fitness. The dramatic effects of bottlenecks on overall genetic diversity have been well studied, but relatively little is known about how dominance and demographic events like bottlenecks can impact additive genetic variance. Herein, we critically examine how the degree of dominance among alleles affects additive genetic variance after a bottleneck. We first review and synthesize studies that document the impact of empirical bottlenecks on dominance variance. We then extend earlier work by elaborating on two theoretical models that illustrate the relationship between dominance and the potential increase in additive genetic variance immediately following a bottleneck. Furthermore, we investigate the parameters that influence the maximum level of genetic variation (associated with adaptive potential) after a bottleneck, including the number of founding individuals. Finally, we validated our methods using forward-time population genetic simulations of loci with varying dominance and selection levels. The fate of non-additive genetic variation following bottlenecks could have important implications for conservation and management efforts in a wide variety of taxa, and our work should help contextualize future studies (e.g., epistatic variance) in population genomics.
Affiliation(s)
- Andrew J Mularo
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Ximena E Bernal
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Smithsonian Tropical Research Institute, Balboa, Republic of Panamá
- J Andrew DeWoody
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN

18.
Li W, Almirantis Y, Provata A. Revisiting the neutral dynamics derived limiting guanine-cytosine content using human de novo point mutation data. Meta Gene 2022. [DOI: 10.1016/j.mgene.2021.100994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

19.
Rashid I, Campos M, Collier T, Crepeau M, Weakley A, Gripkey H, Lee Y, Schmidt H, Lanzaro GC. Spontaneous mutation rate estimates for the principal malaria vectors Anopheles coluzzii and Anopheles stephensi. Sci Rep 2022; 12:226. [PMID: 34996998 PMCID: PMC8742016 DOI: 10.1038/s41598-021-03943-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Accepted: 12/07/2021] [Indexed: 11/17/2022]
Abstract
Using high-depth whole genome sequencing of F0 mating pairs and multiple individual F1 offspring, we estimated the nuclear mutation rate per generation in the malaria vectors Anopheles coluzzii and Anopheles stephensi by detecting de novo genetic mutations. A purpose-built computer program was employed to filter actual mutations from a deep background of superficially similar artifacts resulting from read misalignment. Performance of filtering parameters was determined using software-simulated mutations, and the resulting estimate of false negative rate was used to correct final mutation rate estimates. Spontaneous mutation rates by base substitution were estimated at 1.00 × 10^-9 (95% confidence interval, 2.06 × 10^-10 to 2.91 × 10^-9) and 1.36 × 10^-9 (95% confidence interval, 4.42 × 10^-10 to 3.18 × 10^-9) per site per generation in A. coluzzii and A. stephensi respectively. Although similar studies have been performed on other insect species including dipterans, this is the first study to empirically measure mutation rates in the important genus Anopheles, and thus provides an estimate of µ that will be of utility for comparative evolutionary genomics, as well as for population genetic analysis of malaria vector mosquito species.
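As a rough illustration of the false-negative correction mentioned in this abstract, the sketch below uses a commonly used estimator for a per-site, per-generation rate from parent-offspring data. It is not the authors' program: the formula, the function name, and every input value are assumptions chosen for illustration, and none of the numbers come from the study.

```python
def mutation_rate_per_site(n_mutations, callable_sites, n_offspring, false_negative_rate):
    """Per-site, per-generation mutation rate with a false-negative correction.

    Assumed estimator (not taken from the paper):
        mu = m / (2 * n_offspring * callable_sites * (1 - FNR))
    The factor 2 counts the two parental genome copies in each diploid offspring.
    """
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")
    haploid_genomes = 2 * n_offspring
    # Effective number of site-transmissions in which a mutation could be detected.
    detectable = callable_sites * haploid_genomes * (1.0 - false_negative_rate)
    return n_mutations / detectable


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    mu = mutation_rate_per_site(
        n_mutations=4,
        callable_sites=200_000_000,
        n_offspring=10,
        false_negative_rate=0.2,
    )
    print(f"estimated mu = {mu:.3e} per site per generation")
```

With these hypothetical inputs the denominator is 3.2 × 10^9 detectable site-transmissions, so four observed mutations give 1.25 × 10^-9 per site per generation. Leaving out the false-negative correction would lower the estimate to 1.0 × 10^-9.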
Affiliation(s)
- Iliyas Rashid
  - Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
  - Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, USA
  - Tata Institute for Genetics and Society, Center at inStem, Bangalore, Karnataka, 560065, India
- Melina Campos
  - Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Travis Collier
  - Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Marc Crepeau
  - Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Allison Weakley
  - Department of ChEM-H Operations, Stanford University, 450 Serra Mall, Stanford, CA, 94305, USA
- Hans Gripkey
  - Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Yoosook Lee
  - Florida Medical Entomology Laboratory, University of Florida, 200 9th St SE, Vero Beach, FL, 32962, USA
- Hanno Schmidt
  - Anthropology, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University of Mainz, Saarstraße 21, 55122, Mainz, Germany
- Gregory C Lanzaro
  - Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA

20.
Arciero E, Dogra SA, Malawsky DS, Mezzavilla M, Tsismentzoglou T, Huang QQ, Hunt KA, Mason D, Sharif SM, van Heel DA, Sheridan E, Wright J, Small N, Carmi S, Iles MM, Martin HC. Fine-scale population structure and demographic history of British Pakistanis. Nat Commun 2021; 12:7189. [PMID: 34893604 PMCID: PMC8664933 DOI: 10.1038/s41467-021-27394-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/05/2021] [Accepted: 11/09/2021] [Indexed: 02/08/2023]
Abstract
Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15‒20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.
Affiliation(s)
- Elena Arciero
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Sufyan A. Dogra
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Daniel S. Malawsky
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Massimo Mezzavilla
  - Department of Medical Sciences, University of Trieste, Trieste, Italy
- Theofanis Tsismentzoglou
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Qin Qin Huang
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Karen A. Hunt
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Dan Mason
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Saghira Malik Sharif
  - Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- David A. van Heel
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Eamonn Sheridan
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- John Wright
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Neil Small
  - Faculty of Health Studies, University of Bradford, Richmond Road, Bradford, UK
- Shai Carmi
  - Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Mark M. Iles
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Hilary C. Martin
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK

21.
Schaefer NK, Shapiro B, Green RE. An ancestral recombination graph of human, Neanderthal, and Denisovan genomes. Sci Adv 2021; 7:eabc0776. [PMID: 34272242 PMCID: PMC8284891 DOI: 10.1126/sciadv.abc0776] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 04/04/2020] [Accepted: 06/03/2021] [Indexed: 05/02/2023]
Abstract
Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.
Affiliation(s)
- Nathan K Schaefer
  - Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Beth Shapiro
  - Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Richard E Green
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA

22.
SeyedAlinaghi S, Mirzapour P, Dadras O, Pashaei Z, Karimi A, MohsseniPour M, Soleymanzadeh M, Barzegary A, Afsahi AM, Vahedi F, Shamsabadi A, Behnezhad F, Saeidi S, Mehraeen E, Jahanfar S. Characterization of SARS-CoV-2 different variants and related morbidity and mortality: a systematic review. Eur J Med Res 2021; 26:51. [PMID: 34103090 PMCID: PMC8185313 DOI: 10.1186/s40001-021-00524-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 23.0] [Received: 02/09/2021] [Accepted: 05/28/2021] [Indexed: 02/08/2023]
Abstract
INTRODUCTION Coronavirus Disease-2019 (SARS-CoV-2) started its devastating trajectory into a global pandemic in Wuhan, China, in December 2019. Ever since, several variants of SARS-CoV-2 have been identified. In the present review, we aimed to characterize the different variants of SARS-CoV-2 and explore the related morbidity and mortality. METHODS A systematic review including the current evidence related to different variants of SARS-CoV-2 and the related morbidity and mortality was conducted through a systematic search utilizing the keywords in the online databases including Scopus, PubMed, Web of Science, and Science Direct; we retrieved all related papers and reports published in English from December 2019 to September 2020. RESULTS A review of identified articles has shown three main genomic variants, including type A, type B, and type C. We also identified three clades including S, V, and G. Studies have demonstrated that the C14408T and A23403G alterations in the Nsp12 and S proteins are the most prominent alterations in the world, leading to life-threatening mutations. The spike D614G amino acid change has become the most common variant since December 2019. Among missense mutations found in Gujarat SARS-CoV-2 genomes, C28854T, a deleterious mutation in the nucleocapsid (N) gene, was significantly associated with patients' mortality. The other significant deleterious variant (G25563T), located in Orf3a, is found in patients and has a potential role in viral pathogenesis. CONCLUSION Overall, researchers identified several SARS-CoV-2 variants changing clinical manifestations and increasing the transmissibility, morbidity, and mortality of COVID-19. This should be considered in current practice and interventions to combat the pandemic and prevent related morbidity and mortality.
Affiliation(s)
- SeyedAhmad SeyedAlinaghi
  - Iranian Research Center for HIV/AIDS, Iranian Institute for Reduction of High Risk Behaviors, Tehran University of Medical Sciences, Tehran, Iran
- Pegah Mirzapour
  - Iranian Research Center for HIV/AIDS, Iranian Institute for Reduction of High Risk Behaviors, Tehran University of Medical Sciences, Tehran, Iran
- Omid Dadras
  - Department of Global Health and Socioepidemiology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Zahra Pashaei
  - Chronic Respiratory Disease Research Center, Masih Daneshvari Hospital, Tehran, Iran
- Amirali Karimi
  - School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Mehrzad MohsseniPour
  - Iranian Research Center for HIV/AIDS, Iranian Institute for Reduction of High Risk Behaviors, Tehran University of Medical Sciences, Tehran, Iran
- Mahdi Soleymanzadeh
  - Ophthalmology Resident at Farabi Hospital, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Amir Masoud Afsahi
  - Department of Radiology, School of Medicine, University of California, San Diego, CA, USA
- Farzin Vahedi
  - School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Ahmadreza Shamsabadi
  - Department of Health Information Technology, Esfarayen Faculty of Medical Sciences, Esfarayen, Iran
- Farzane Behnezhad
  - Department of Virology, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Solmaz Saeidi
  - Department of Nursing, Khalkhal University of Medical Sciences, Khalkhal, Iran
- Esmaeil Mehraeen
  - Department of Health Information Technology, Khalkhal University of Medical Sciences, 1419733141, Khalkhal, Iran
- Shayesteh Jahanfar
  - Department of Public Health and Community Medicine, Tufts University School of Medicine, Boston, MA, USA
|
23
|
Bergero R, Ellis P, Haerty W, Larcombe L, Macaulay I, Mehta T, Mogensen M, Murray D, Nash W, Neale MJ, O'Connor R, Ottolini C, Peel N, Ramsey L, Skinner B, Suh A, Summers M, Sun Y, Tidy A, Rahbari R, Rathje C, Immler S. Meiosis and beyond - understanding the mechanistic and evolutionary processes shaping the germline genome. Biol Rev Camb Philos Soc 2021; 96:822-841. [PMID: 33615674 PMCID: PMC8246768 DOI: 10.1111/brv.12680] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/26/2019] [Revised: 12/15/2020] [Accepted: 12/15/2020] [Indexed: 12/11/2022]
Abstract
The separation of germ cell populations from the soma is part of the evolutionary transition to multicellularity. Only genetic information present in the germ cells will be inherited by future generations, and any molecular processes affecting the germline genome are therefore likely to be passed on. Despite its prevalence across taxonomic kingdoms, we are only starting to understand details of the underlying micro-evolutionary processes occurring at the germline genome level. These include segregation, recombination, mutation and selection and can occur at any stage during germline differentiation and mitotic germline proliferation to meiosis and post-meiotic gamete maturation. Selection acting on germ cells at any stage from the diploid germ cell to the haploid gametes may cause significant deviations from Mendelian inheritance and may be more widespread than previously assumed. The mechanisms that affect and potentially alter the genomic sequence and allele frequencies in the germline are pivotal to our understanding of heritability. With the rise of new sequencing technologies, we are now able to address some of these unanswered questions. In this review, we comment on the most recent developments in this field and identify current gaps in our knowledge.
Affiliation(s)
- Roberta Bergero
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, U.K.
- Peter Ellis
- School of Biosciences, University of Kent, Canterbury CT2 7NJ, U.K.
- Lee Larcombe
- Applied Exomics Ltd, Stevenage Bioscience Catalyst, Stevenage SG1 2FX, U.K.
- Iain Macaulay
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, U.K.
- Tarang Mehta
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, U.K.
- Mette Mogensen
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, U.K.
- David Murray
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, U.K.
- Will Nash
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, U.K.
- Matthew J. Neale
- Genome Damage and Stability Centre, School of Life Sciences, University of Sussex, Brighton BN1 9RH, U.K.
- Ned Peel
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, U.K.
- Luke Ramsey
- The James Hutton Institute, Invergowrie, Dundee DD2 5DA, U.K.
- Ben Skinner
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, U.K.
- Alexander Suh
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, U.K.
- Department of Organismal Biology, Uppsala University, Norbyvägen 18D, Uppsala 752 36, Sweden
- Michael Summers
- School of Biosciences, University of Kent, Canterbury CT2 7NJ, U.K.
- The Bridge Centre, 1 St Thomas Street, London Bridge, London SE1 9RY, U.K.
- Yu Sun
- Norwich Medical School, University of East Anglia, Norwich Research Park, Colney Ln, Norwich NR4 7UG, U.K.
- Alison Tidy
- School of Biosciences, University of Nottingham, Plant Science, Sutton Bonington Campus, Sutton Bonington LE12 5RD, U.K.
- Claudia Rathje
- School of Biosciences, University of Kent, Canterbury CT2 7NJ, U.K.
- Simone Immler
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, U.K.
24
Harris AM, DeGiorgio M. A Likelihood Approach for Uncovering Selective Sweep Signatures from Haplotype Data. Mol Biol Evol 2021; 37:3023-3046. [PMID: 32392293 PMCID: PMC7530616 DOI: 10.1093/molbev/msaa115] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.
Affiliation(s)
- Alexandre M Harris
- Department of Biology, Pennsylvania State University, University Park, PA; Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
25
Ho EKH, Macrae F, Latta LC, McIlroy P, Ebert D, Fields PD, Benner MJ, Schaack S. High and Highly Variable Spontaneous Mutation Rates in Daphnia. Mol Biol Evol 2021; 37:3258-3266. [PMID: 32520985 PMCID: PMC7820357 DOI: 10.1093/molbev/msaa142] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 12/16/2022]
Abstract
The rate and spectrum of spontaneous mutations are critical parameters in basic and applied biology because they dictate the pace and character of genetic variation introduced into populations, which is a prerequisite for evolution. We use a mutation–accumulation approach to estimate mutation parameters from whole-genome sequence data from multiple genotypes from multiple populations of Daphnia magna, an ecological and evolutionary model system. We report extremely high base substitution mutation rates (µ_{n,bs} = 8.96 × 10⁻⁹/bp/generation [95% CI: 6.66–11.97 × 10⁻⁹/bp/generation] in the nuclear genome and µ_{m,bs} = 8.7 × 10⁻⁷/bp/generation [95% CI: 4.40–15.12 × 10⁻⁷/bp/generation] in the mtDNA), the highest of any eukaryote examined using this approach. Levels of intraspecific variation based on the range of estimates from the nine genotypes collected from three populations (Finland, Germany, and Israel) span 1 and 3 orders of magnitude, respectively, resulting in up to a ∼300-fold difference in rates among genomic partitions within the same lineage. In contrast, mutation spectra exhibit very consistent patterns across genotypes and populations, suggesting the mechanisms underlying the mutational process may be similar, even when the rates at which they occur differ. We discuss the implications of high levels of intraspecific variation in rates, the importance of estimating gene conversion rates using a mutation–accumulation approach, and the interacting factors influencing the evolution of mutation parameters. Our findings deepen our knowledge about mutation and provide both challenges to and support for current theories aimed at explaining the evolution of the mutation rate, as a trait, across taxa.
Affiliation(s)
- Eddie K H Ho
- Department of Biology, Reed College, Portland, OR
- Leigh C Latta
- Department of Biology, Reed College, Portland, OR; Division of Natural Sciences and Mathematics, Lewis-Clark State College, Lewiston, ID
- Dieter Ebert
- Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland
- Peter D Fields
- Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland
26
Browning SR, Browning BL. Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection. Am J Hum Genet 2020; 107:895-910. [PMID: 33053335 PMCID: PMC7553009 DOI: 10.1016/j.ajhg.2020.09.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2020] [Accepted: 09/25/2020] [Indexed: 12/18/2022]
Abstract
Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
27
Wu FL, Strand AI, Cox LA, Ober C, Wall JD, Moorjani P, Przeworski M. A comparison of humans and baboons suggests germline mutation rates do not track cell divisions. PLoS Biol 2020; 18:e3000838. [PMID: 32804933 PMCID: PMC7467331 DOI: 10.1371/journal.pbio.3000838] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 11/15/2019] [Revised: 09/02/2020] [Accepted: 07/28/2020] [Indexed: 12/19/2022]
Abstract
In humans, most germline mutations are inherited from the father. This observation has been widely interpreted as reflecting the replication errors that accrue during spermatogenesis. If so, the male bias in mutation should be substantially lower in a closely related species with similar rates of spermatogonial stem cell divisions but a shorter mean age of reproduction. To test this hypothesis, we resequenced two 3-4 generation nuclear families (totaling 29 individuals) of olive baboons (Papio anubis), who reproduce at approximately 10 years of age on average, and analyzed the data in parallel with three 3-generation human pedigrees (26 individuals). We estimated a mutation rate per generation in baboons of 0.57 × 10⁻⁸ per base pair, approximately half that of humans. Strikingly, however, the degree of male bias in germline mutations is approximately 4:1, similar to that of humans; indeed, a similar male bias is seen across mammals that reproduce months, years, or decades after birth. These results mirror the finding in humans that the male mutation bias is stable with parental ages and cast further doubt on the assumption that germline mutations track cell divisions. Our mutation rate estimates for baboons raise a further puzzle, suggesting a divergence time between apes and Old World monkeys of 65 million years, too old to be consistent with the fossil record; reconciling them now requires not only a slowdown of the mutation rate per generation in humans but also in baboons.
Affiliation(s)
- Felix L. Wu
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, New York, United States of America
- Alva I. Strand
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Laura A. Cox
- Center for Precision Medicine, Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, Texas, United States of America
- Carole Ober
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Jeffrey D. Wall
- Institute for Human Genetics, Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, California, United States of America
- Priya Moorjani
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Molly Przeworski
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
28
Ralph P, Thornton K, Kelleher J. Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes. Genetics 2020; 215:779-797. [PMID: 32357960 PMCID: PMC7337078 DOI: 10.1534/genetics.120.303253] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 09/23/2019] [Accepted: 04/28/2020] [Indexed: 12/11/2022]
Abstract
As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates sample weights within the genealogical tree at each position on the genome, which are then combined using a summary function; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite sites model of mutation, and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently defined statistics of genome sequence (making the statistics' relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding branch statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project data set, and discuss ways in which deviations may encode interesting biological signals.
Affiliation(s)
- Peter Ralph
- Institute of Evolution and Ecology, Departments of Mathematics and Biology, University of Oregon, Eugene, Oregon 97405
- Kevin Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, United Kingdom OX3 7LF
29
Ovsyannikova IG, Haralambieva IH, Crooke SN, Poland GA, Kennedy RB. The role of host genetics in the immune response to SARS-CoV-2 and COVID-19 susceptibility and severity. Immunol Rev 2020; 296:205-219. [PMID: 32658335 PMCID: PMC7404857 DOI: 10.1111/imr.12897] [Citation(s) in RCA: 141] [Impact Index Per Article: 35.3] [Received: 06/09/2020] [Accepted: 06/14/2020] [Indexed: 01/08/2023]
Abstract
This article provides a review of studies evaluating the role of host (and viral) genetics (including variation in HLA genes) in the immune response to coronaviruses, as well as the clinical outcome of coronavirus-mediated disease. The initial sections focus on seasonal coronaviruses, SARS-CoV, and MERS-CoV. We then examine the state of the knowledge regarding genetic polymorphisms and SARS-CoV-2 and COVID-19. The article concludes by discussing research areas with current knowledge gaps and proposes several avenues for future scientific exploration in order to develop new insights into the immunology of SARS-CoV-2.
30
Carlson J, DeWitt WS, Harris K. Inferring evolutionary dynamics of mutation rates through the lens of mutation spectrum variation. Curr Opin Genet Dev 2020; 62:50-57. [PMID: 32619789 DOI: 10.1016/j.gde.2020.05.024] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/04/2020] [Revised: 05/13/2020] [Accepted: 05/22/2020] [Indexed: 01/04/2023]
Abstract
There are many possible failure points in the transmission of genetic information that can produce heritable germline mutations. Once a mutation has been passed from parents to offspring for several generations, it can be difficult or impossible to identify its root cause; however, sometimes the nature of the ancestral and derived DNA sequences can provide mechanistic clues about a genetic change that happened hundreds or thousands of generations ago. Here, we review evidence that the sequence context 'spectrum' of germline mutagenesis has been evolving surprisingly rapidly over the history of humans and other species. We go on to discuss possible causal factors that might underlie rapid mutation spectrum evolution.
Affiliation(s)
- Jedidiah Carlson
- Department of Genome Sciences, Foege Hall, University of Washington, Seattle, WA 98105, United States
- William S DeWitt
- Department of Genome Sciences, Foege Hall, University of Washington, Seattle, WA 98105, United States; Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Eastlake Ave E, Seattle, WA 98109, United States
- Kelley Harris
- Department of Genome Sciences, Foege Hall, University of Washington, Seattle, WA 98105, United States; Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Eastlake Ave E, Seattle, WA 98109, United States.
31
Extreme differences between human germline and tumor mutation densities are driven by ancestral human-specific deviations. Nat Commun 2020; 11:2512. [PMID: 32427823 PMCID: PMC7237693 DOI: 10.1038/s41467-020-16296-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/09/2019] [Accepted: 04/22/2020] [Indexed: 12/29/2022]
Abstract
Mutations do not accumulate uniformly across the genome. Human germline and tumor mutation density correlate poorly, and each is associated with different genomic features. Here, we use non-human great ape (NHGA) germlines to determine human germline- and tumor-specific deviations from an ancestral-like great ape genome-wide mutational landscape. Strikingly, we find that the distribution of mutation densities in tumors presents a stronger correlation with NHGA than with human germlines. This effect is driven by human-specific differences in the distribution of mutations at non-CpG sites. We propose that ancestral human demographic events, together with the human-specific mutation slowdown, disrupted the human genome-wide distribution of mutation densities. Tumors partially recover this distribution by accumulating preneoplastic-like somatic mutations. Our results highlight the potential utility of using NHGA population data, rather than human controls, to establish the expected mutational background of healthy somatic cells.
32
Harris AM, DeGiorgio M. Identifying and Classifying Shared Selective Sweeps from Multilocus Data. Genetics 2020; 215:143-171. [PMID: 32152048 PMCID: PMC7198270 DOI: 10.1534/genetics.120.303137] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/22/2019] [Accepted: 02/29/2020] [Indexed: 11/18/2022]
Abstract
Positive selection causes beneficial alleles to rise to high frequency, resulting in a selective sweep of the diversity surrounding the selected sites. Accordingly, the signature of a selective sweep in an ancestral population may still remain in its descendants. Identifying signatures of selection in the ancestor that are shared among its descendants is important to contextualize the timing of a sweep, but few methods exist for this purpose. We introduce the statistic SS-H12, which can identify genomic regions under shared positive selection across populations and is based on the theory of the expected haplotype homozygosity statistic H12, which detects recent hard and soft sweeps from the presence of high-frequency haplotypes. SS-H12 is distinct from comparable statistics because it requires a minimum of only two populations, and properly identifies and differentiates between independent convergent sweeps and true ancestral sweeps, with high power and robustness to a variety of demographic models. Furthermore, we can apply SS-H12 in conjunction with the ratio of statistics we term [Formula: see text] and [Formula: see text] to further classify identified shared sweeps as hard or soft. Finally, we identified both previously reported and novel shared sweep candidates from human whole-genome sequences. Previously reported candidates include the well-characterized ancestral sweeps at LCT and SLC24A5 in Indo-Europeans, as well as GPHN worldwide. Novel candidates include an ancestral sweep at RGS18 in sub-Saharan Africans involved in regulating the platelet response and implicated in sudden cardiac death, and a convergent sweep at C2CD5 between European and East Asian populations that may explain their different insulin responses.
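As a rough illustration of the single-population H12 statistic that the abstract builds on (not the authors' SS-H12, and not their software), the sketch below computes expected haplotype homozygosity in one genomic window. It assumes the usual definition, in which the frequencies of the two most common haplotypes are pooled before squaring; the function names and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return haplotype frequencies in one window, sorted from most to least common."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sorted((c / n for c in counts.values()), reverse=True)


def h1(freqs):
    """Expected haplotype homozygosity: sum of squared haplotype frequencies."""
    return sum(p * p for p in freqs)


def h12(freqs):
    """H12: pool the two most frequent haplotypes, then sum squared frequencies."""
    freqs = sorted(freqs, reverse=True)
    if len(freqs) < 2:
        return h1(freqs)
    pooled = freqs[0] + freqs[1]
    return pooled * pooled + sum(p * p for p in freqs[2:])


if __name__ == "__main__":
    # Toy window: ten sampled haplotypes, labelled by identity (assumed data).
    window = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
    freqs = haplotype_frequencies(window)
    print("H1  =", round(h1(freqs), 4))
    print("H12 =", round(h12(freqs), 4))
```

Pooling the top two haplotypes is what lets the statistic stay high when two haplotypes are both at high frequency (a soft sweep), a case where plain homozygosity would drop.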
Affiliation(s)
- Alexandre M Harris
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802
- Molecular, Cellular, and Integrative Biosciences at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida 33431
33
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022]
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.
34
Kessler MD, Loesch DP, Perry JA, Heard-Costa NL, Taliun D, Cade BE, Wang H, Daya M, Ziniti J, Datta S, Celedón JC, Soto-Quiros ME, Avila L, Weiss ST, Barnes K, Redline SS, Vasan RS, Johnson AD, Mathias RA, Hernandez R, Wilson JG, Nickerson DA, Abecasis G, Browning SR, Zöllner S, O'Connell JR, Mitchell BD, O'Connor TD. De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc Natl Acad Sci U S A 2020; 117:2560-2569. [PMID: 31964835 PMCID: PMC7007577 DOI: 10.1073/pnas.1902766117] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 12/19/2022]
Abstract
De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.
Collapse
Affiliation(s)
- Michael D Kessler
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201
- Douglas P Loesch
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- James A Perry
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Nancy L Heard-Costa
  - Department of Neurology, Boston University School of Medicine, Boston, MA 02118
  - Framingham Heart Study, Framingham, MA 01702
- Daniel Taliun
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Brian E Cade
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142
- Heming Wang
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142
- Michelle Daya
  - Department of Medicine, University of Colorado Denver, Aurora, CO 80045
- John Ziniti
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115
- Soma Datta
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115
- Juan C Celedón
  - Division of Pediatric Pulmonary Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
- Manuel E Soto-Quiros
  - Department of Pediatrics, Hospital Nacional de Niños, 10103 San José, Costa Rica
- Lydiana Avila
  - Department of Pediatrics, Hospital Nacional de Niños, 10103 San José, Costa Rica
- Scott T Weiss
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115
  - Department of Medicine, Harvard Medical School, Boston, MA 02115
- Kathleen Barnes
  - Department of Medicine, University of Colorado Denver, Aurora, CO 80045
- Susan S Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215
- Andrew D Johnson
  - Framingham Heart Study, Framingham, MA 01702
  - Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702
- Rasika A Mathias
  - Division of Allergy and Clinical Immunology, The Johns Hopkins School of Medicine, Baltimore, MD 21224
  - Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD 21218
- Ryan Hernandez
  - Quantitative Life Sciences, McGill University, Montreal, QC H3A 0G4, Canada
- James G Wilson
  - Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS 39216
- Goncalo Abecasis
  - School of Public Health, University of Michigan, Ann Arbor, MI 48109
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109
- Jeffrey R O'Connell
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
- Braxton D Mitchell
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - Geriatrics Research and Education Clinical Center, Baltimore Veterans Administration Medical Center, Baltimore, MD 21201
- Timothy D O'Connor
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201
  - University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201
35
Finer S, Martin HC, Khan A, Hunt KA, MacLaughlin B, Ahmed Z, Ashcroft R, Durham C, MacArthur DG, McCarthy MI, Robson J, Trivedi B, Griffiths C, Wright J, Trembath RC, van Heel DA. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. Int J Epidemiol 2020; 49:20-21i. [PMID: 31504546 PMCID: PMC7124496 DOI: 10.1093/ije/dyz174] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Accepted: 08/05/2019] [Indexed: 11/12/2022]
Affiliation(s)
- Sarah Finer
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Hilary C Martin
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Ahsan Khan
  - London Borough of Waltham Forest, Waltham Forest Town Hall, Walthamstow, UK
- Karen A Hunt
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Beverley MacLaughlin
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Zaheer Ahmed
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Daniel G MacArthur
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark I McCarthy
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford, UK
  - Oxford NIHR Biomedical Research Centre, Churchill Hospital, Oxford, UK
- John Robson
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Bhavi Trivedi
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Chris Griffiths
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- John Wright
  - Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford, UK
- Richard C Trembath
  - School of Basic and Medical Biosciences, Faculty of Life Sciences and Medicine, King’s College London, London, UK
- David A van Heel
  - Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
36
Abstract
This chapter describes the usage of the program ARGweaver, which estimates the ancestral recombination graph for as many as about 100 genome sequences. The ancestral recombination graph is a detailed description of the coalescence and recombination events that define the relationships among the sampled sequences. This rich description is useful for a wide variety of population genetic analyses. We describe the preparation of data and major considerations for running ARGweaver, as well as the interpretation of results. We then demonstrate an analysis using the DARC (Duffy) gene as an example, and show how ARGweaver can be used to detect signatures of natural selection and Neandertal introgression, as well as to estimate the dates of mutation events. This chapter provides sufficient detail to get a new user up and running with this complex but powerful analysis tool.
Affiliation(s)
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
37
Tian X, Browning BL, Browning SR. Estimating the Genome-wide Mutation Rate with Three-Way Identity by Descent. Am J Hum Genet 2019; 105:883-893. [PMID: 31587867 DOI: 10.1016/j.ajhg.2019.09.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/17/2019] [Accepted: 09/09/2019] [Indexed: 12/20/2022]
Abstract
The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].
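The reported interval is symmetric about the point estimate, which can be checked with a few lines of arithmetic. The sketch below also backs out an approximate standard error under the assumption that the interval is a normal-approximation (Wald) interval. That assumption is ours: the abstract does not say how the interval was constructed, so the standard error shown is only indicative.

```python
from statistics import NormalDist

# Values quoted in the abstract (mutations per base pair per meiosis).
ESTIMATE = 1.29e-8
CI_LOW, CI_HIGH = 1.02e-8, 1.56e-8

# Midpoint and half-width of the reported 95% interval.
midpoint = (CI_LOW + CI_HIGH) / 2
half_width = (CI_HIGH - CI_LOW) / 2

# ASSUMPTION: the interval is estimate +/- z * SE with z the 97.5% normal
# quantile. The abstract does not state this; it is an illustration only.
z = NormalDist().inv_cdf(0.975)
approx_se = half_width / z

print(f"midpoint:   {midpoint:.2e}")
print(f"half-width: {half_width:.2e}")
print(f"approx. SE: {approx_se:.2e} (normal approximation)")
```

If the interval had instead come from a bootstrap or a likelihood profile, the midpoint check would still hold here, but the implied standard error would not have the same interpretation.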
38
Fuselli S. Beyond drugs: the evolution of genes involved in human response to medications. Proc Biol Sci 2019; 286:20191716. [PMID: 31640517 DOI: 10.1098/rspb.2019.1716] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/29/2022]
Abstract
The genetic variation of our species reflects human demographic history and adaptation to diverse local environments. Part of this genetic variation affects individual responses to exogenous substances, such as food, pollutants and drugs, and plays an important role in drug efficacy and safety. This review provides a synthesis of the evolution of loci implicated in human pharmacological response and metabolism, interpreted within the theoretical framework of population genetics and molecular evolution. In particular, I review and discuss key evolutionary aspects of different pharmacogenes in humans and other species, such as the relationship between the type of substrates and rate of evolution; the selective pressure exerted by landscape variables or dietary habits; expected and observed patterns of rare genetic variation. Finally, I discuss how this knowledge can be translated directly or after the implementation of specific studies, into practical guidelines.
Affiliation(s)
- Silvia Fuselli
  - Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
39
Signatures of replication timing, recombination, and sex in the spectrum of rare variants on the human X chromosome and autosomes. Proc Natl Acad Sci U S A 2019; 116:17916-17924. [PMID: 31427530 PMCID: PMC6731651 DOI: 10.1073/pnas.1900714116] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/03/2023]
Abstract
The sources of human germline mutations are poorly understood. Part of the difficulty is that mutations occur very rarely, and so direct pedigree-based approaches remain limited in the numbers that they can examine. To address this problem, we consider the spectrum of low-frequency variants in a dataset (Genome Aggregation Database, gnomAD) of 13,860 human X chromosomes and autosomes. X-autosome differences are reflective of germline sex differences and have been used extensively to learn about male versus female mutational processes; what is less appreciated is that they also reflect chromosome-level biochemical features that differ between the X and autosomes. We tease these components apart by comparing the mutation spectrum in multiple genomic compartments on the autosomes and between the X and autosomes. In so doing, we are able to ascribe specific mutation patterns to replication timing and recombination and to identify differences in the types of mutations that accrue in males and females. In particular, we identify C > G as a mutagenic signature of male meiotic double-strand breaks on the X, which may result from late repair. Our results show how biochemical processes of damage and repair in the germline interact with sex-specific life history traits to shape mutation patterns on both the X chromosome and autosomes.
40
Uspenskaya NY, Akopov SB, Snezhkov EV, Sverdlov ED. The Rate of Human Germline Mutations—Variable Factor of Evolution and Diseases. RUSS J GENET+ 2019. [DOI: 10.1134/s1022795419050144] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
41
Jacobs GS, Hudjashov G, Saag L, Kusuma P, Darusallam CC, Lawson DJ, Mondal M, Pagani L, Ricaut FX, Stoneking M, Metspalu M, Sudoyo H, Lansing JS, Cox MP. Multiple Deeply Divergent Denisovan Ancestries in Papuans. Cell 2019; 177:1010-1021.e32. [DOI: 10.1016/j.cell.2019.02.035] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Received: 09/05/2018] [Revised: 01/07/2019] [Accepted: 02/21/2019] [Indexed: 12/29/2022]
42
Abstract
Mutation provides the ultimate source of all new alleles in populations, including variants that cause disease and fuel adaptation. Recent whole genome sequencing studies have uncovered variation in the mutation rate among individuals and differences in the relative frequency of specific nucleotide changes (the mutation spectrum) between populations. Although parental age is a major driver of differences in overall mutation rate among individuals, the causes of variation in the mutation spectrum remain less well understood. Here, I use high-quality whole genome sequences from 29 inbred laboratory mouse strains to explore the root causes of strain variation in the mutation spectrum. My analysis leverages the unique, mosaic patterns of genetic relatedness among inbred mouse strains to identify strain private variants residing on haplotypes shared between multiple strains due to their recent descent from a common ancestor. I show that these strain-private alleles are strongly enriched for recent de novo mutations and lack signals of widespread purifying selection, suggesting their faithful recapitulation of the spontaneous mutation landscape in single strains. The spectrum of strain-private variants varies significantly among inbred mouse strains reared under standardized laboratory conditions. This variation is not solely explained by strain differences in age at reproduction, raising the possibility that segregating genetic differences affect the constellation of new mutations that arise in a given strain. Collectively, these findings imply the action of remarkably precise nucleotide-specific genetic mechanisms for tuning the de novo mutation landscape in mammals and underscore the genetic complexity of mutation rate control.
43
Abstract
Hominin evolution is characterized by progressive regional differentiation, as well as migration waves, leading to anatomically modern humans that are assumed to have emerged in Africa and spread over the whole world. Why or whether Africa was the source region of modern humans and what caused their spread remains subject of ongoing debate. We present a spatially explicit, stochastic numerical model that includes ongoing mutations, demic diffusion, assortative mating and migration waves. Diffusion and assortative mating alone result in a structured population with relatively homogeneous regions bound by sharp clines. The addition of migration waves results in a power-law distribution of wave areas: for every large wave, many more small waves are expected to occur. This suggests that one or more out-of-Africa migrations would probably have been accompanied by numerous smaller migration waves across the world. The migration waves are considered "spontaneous", as the current model excludes environmental or other extrinsic factors. Large waves preferentially emanate from the central areas of large, compact inhabited areas. During the Pleistocene, Africa was the largest such area most of the time, making Africa the statistically most likely origin of anatomically modern humans, without a need to invoke additional environmental or ecological drivers.
44
Speevak M, DeMarco M, Wiebe N, Chapman K. An unusual case of alpha-1-antitrypsin deficiency: SZ/Z. Clin Biochem 2019; 64:49-52. [DOI: 10.1016/j.clinbiochem.2018.12.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/05/2018] [Revised: 11/19/2018] [Accepted: 12/19/2018] [Indexed: 10/27/2022]
45
Henn BM, Steele TE, Weaver TD. Clarifying distinct models of modern human origins in Africa. Curr Opin Genet Dev 2018; 53:148-156. [PMID: 30423527 DOI: 10.1016/j.gde.2018.10.003] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 09/25/2018] [Revised: 10/09/2018] [Accepted: 10/15/2018] [Indexed: 11/29/2022]
Abstract
Accumulating genomic, fossil and archaeological data from Africa have led to a renewed interest in models of modern human origins. However, such discussions are often discipline-specific, with limited integration of evidence across the different fields. Further, geneticists typically require explicit specification of parameters to test competing demographic models, but these have been poorly outlined for some scenarios. Here, we describe four possible models for the origins of Homo sapiens in Africa based on published literature from paleoanthropology and human genetics. We briefly outline expectations for data patterns under each model, with a special focus on genetic data. Additionally, we present schematics for each model, doing our best to qualitatively describe demographic histories for which genetic parameters can be specifically attached. Finally, it is our hope that this perspective provides context for discussions of human origins in other manuscripts presented in this special issue.
Affiliation(s)
- Brenna M Henn
  - Department of Anthropology, University of California, Davis, CA, 95616, United States
  - UC Davis Genome Center, University of California, Davis, CA, 95616, United States
- Teresa E Steele
  - Department of Anthropology, University of California, Davis, CA, 95616, United States
- Timothy D Weaver
  - Department of Anthropology, University of California, Davis, CA, 95616, United States
46
Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity. Genetics 2018; 210:1429-1452. [PMID: 30315068 DOI: 10.1534/genetics.118.301502] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 03/03/2018] [Accepted: 10/08/2018] [Indexed: 11/18/2022]
Abstract
Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
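The statistics named in this abstract are simple functions of sorted frequencies. The sketch below assumes the usual definitions from Garud et al.: H1 is the sum of squared haplotype frequencies, H12 pools the two most frequent haplotypes before squaring, and H2 is H1 minus the squared frequency of the most frequent haplotype. G12, G123 and G2/G1 apply the same formulas to multilocus genotype frequencies. The function names and the toy sample are hypothetical and only illustrate the arithmetic; this is not the authors' implementation.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def sorted_frequencies(items: Sequence[Hashable]) -> List[float]:
    """Relative frequencies of the distinct items, most frequent first."""
    counts = Counter(items)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def pooled_homozygosity(freqs: Sequence[float], k: int) -> float:
    """Expected homozygosity after pooling the k most frequent classes.

    k=1 gives H1 (or G1), k=2 gives H12 (or G12), k=3 gives G123.
    """
    pooled = sum(freqs[:k])
    return pooled ** 2 + sum(f * f for f in freqs[k:])


def ratio_2_over_1(freqs: Sequence[float]) -> float:
    """H2/H1 (or G2/G1): homozygosity without the top class, over the total."""
    h1 = pooled_homozygosity(freqs, 1)
    return (h1 - freqs[0] ** 2) / h1


# Hypothetical toy window: each string is one individual's unphased
# multilocus genotype, coded as alternate-allele counts (0, 1 or 2) per site.
toy_genotypes = ["0120"] * 5 + ["0021"] * 3 + ["1100"] * 2

q = sorted_frequencies(toy_genotypes)
g12 = pooled_homozygosity(q, 2)
g123 = pooled_homozygosity(q, 3)
g2_g1 = ratio_2_over_1(q)
print(f"G12={g12:.3f}  G123={g123:.3f}  G2/G1={g2_g1:.3f}")
```

Passing phased haplotype strings instead of genotype strings to the same functions yields H12 and H2/H1, which is why the two families of statistics can share one implementation.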
47
Thornlow BP, Hough J, Roger JM, Gong H, Lowe TM, Corbett-Detig RB. Transfer RNA genes experience exceptionally elevated mutation rates. Proc Natl Acad Sci U S A 2018; 115:8996-9001. [PMID: 30127029 PMCID: PMC6130373 DOI: 10.1073/pnas.1801240115] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 02/03/2023]
Abstract
Transfer RNAs (tRNAs) are a central component for the biological synthesis of proteins, and they are among the most highly conserved and frequently transcribed genes in all living things. Despite their clear significance for fundamental cellular processes, the forces governing tRNA evolution are poorly understood. We present evidence that transcription-associated mutagenesis and strong purifying selection are key determinants of patterns of sequence variation within and surrounding tRNA genes in humans and diverse model organisms. Remarkably, the mutation rate at broadly expressed cytosolic tRNA loci is likely between 7 and 10 times greater than the nuclear genome average. Furthermore, evolutionary analyses provide strong evidence that tRNA genes, but not their flanking sequences, experience strong purifying selection acting against this elevated mutation rate. We also find a strong correlation between tRNA expression levels and the mutation rates in their immediate flanking regions, suggesting a simple method for estimating individual tRNA gene activity. Collectively, this study illuminates the extreme competing forces in tRNA gene evolution and indicates that mutations at tRNA loci contribute disproportionately to mutational load and have unexplored fitness consequences in human populations.
Affiliation(s)
- Bryan P Thornlow
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064
- Josh Hough
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064
- Jacquelyn M Roger
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064
- Henry Gong
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064
- Todd M Lowe
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064
  - Genomics Institute, University of California, Santa Cruz, CA 95064
- Russell B Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064
  - Genomics Institute, University of California, Santa Cruz, CA 95064
48
Complex Haplotypes of GSTM1 Gene Deletions Harbor Signatures of a Selective Sweep in East Asian Populations. G3-GENES GENOMES GENETICS 2018; 8:2953-2966. [PMID: 30061374 PMCID: PMC6118300 DOI: 10.1534/g3.118.200462] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]
Abstract
The deletion of the metabolizing Glutathione S-transferase Mu 1 (GSTM1) gene has been associated with multiple cancers, metabolic and autoimmune disorders, as well as drug response. It is unusually common, with allele frequency reaching up to 75% in some human populations. Such high allele frequency of a derived allele with apparent impact on an otherwise conserved gene is a rare phenomenon. To investigate the evolutionary history of this locus, we analyzed 310 genomes using population genetics tools. Our analysis revealed a surprising lack of linkage disequilibrium between the deletion and the flanking single nucleotide variants in this locus. Tests that measure extended homozygosity and rapid change in allele frequency revealed signatures of an incomplete sweep in the locus. Using empirical approaches, we identified the Tanuki haplogroup, which carries the GSTM1 deletion and is found in approximately 70% of East Asian chromosomes. This haplogroup has rapidly increased in frequency in East Asian populations, contributing to a high population differentiation among continental human groups. We showed that extended homozygosity and population differentiation for this haplogroup is incompatible with simulated neutral expectations in East Asian populations. In parallel, we found that the Tanuki haplogroup is significantly associated with the expression levels of other GSTM genes. Collectively, our results suggest that standing variation in this locus has likely undergone an incomplete sweep in East Asia with regulatory impact on multiple GSTM genes. Our study provides the necessary framework for further studies to elucidate the evolutionary reasons that maintain disease-susceptibility variants in the GSTM1 locus.
49
Amorim A, Pinto N. Big data in forensic genetics. Forensic Sci Int Genet 2018; 37:102-105. [PMID: 30142461 DOI: 10.1016/j.fsigen.2018.08.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/28/2018] [Revised: 07/23/2018] [Accepted: 08/01/2018] [Indexed: 12/16/2022]
Abstract
The potential and difficulties of the application of genome wide data in forensics are analyzed. We argue that, besides statistical, computational, ethical, economic and technical validation problems, the state of the art of population genetics theory is insufficient to deal with the forensic use of this type of data. In order to keep the current standards of quantifying and reporting genetic evidence, namely in kinship analyses and identification, substantial improvement in the theoretical framework should be reached, since to obtain genome-wide results is to provide the experts with data that they cannot quantify the corresponding evidentiary value. Therefore, while a satisfactory, generalized theoretical and biostatistical modelling is not achieved, it may well be wiser to improve the already established approaches to a limited, pre-defined number of validated genetic markers, amenable to a consensual handling and reporting. Whole genome population analyses will prove extremely useful in selecting the best suited and most efficient of those markers.
Affiliation(s)
- António Amorim
  - Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde (i3s), Universidade do Porto, Porto, Portugal
  - Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Nadia Pinto
  - Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde (i3s), Universidade do Porto, Porto, Portugal
  - CMUP, Centro de Matemática da Universidade do Porto, Porto, Portugal
50
Ramstetter MD, Shenoy SA, Dyer TD, Lehman DM, Curran JE, Duggirala R, Blangero J, Mezey JG, Williams AL. Inferring Identical-by-Descent Sharing of Sample Ancestors Promotes High-Resolution Relative Detection. Am J Hum Genet 2018; 103:30-44. [PMID: 29937093 PMCID: PMC6035284 DOI: 10.1016/j.ajhg.2018.05.008] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 12/23/2017] [Accepted: 05/17/2018] [Indexed: 12/22/2022]
Abstract
As genetic datasets increase in size, the fraction of samples with one or more close relatives grows rapidly, resulting in sets of mutually related individuals. We present DRUID (deep relatedness utilizing identity by descent), a method that works by inferring the identical-by-descent (IBD) sharing profile of an ungenotyped ancestor of a set of close relatives. Using this IBD profile, DRUID infers relatedness between unobserved ancestors and more distant relatives, thereby combining information from multiple samples to remove one or more generations between the deep relationships to be identified. DRUID constructs sets of close relatives by detecting full siblings and also uses an approach to identify the aunts/uncles of two or more siblings, recovering 92.2% of real aunts/uncles with zero false positives. In real and simulated data, DRUID correctly infers up to 10.5% more relatives than PADRE when using data from two sets of distantly related siblings, and 10.7%-31.3% more relatives given two sets of siblings and their aunts/uncles. DRUID frequently infers relationships either correctly or within one degree of the truth, with PADRE classifying 43.3%-58.3% of tenth degree relatives in this way compared to 79.6%-96.7% using DRUID.
Affiliation(s)
- Monica D Ramstetter
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Sushila A Shenoy
  - Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Thomas D Dyer
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA and Edinburg, TX 78539, USA
- Donna M Lehman
  - Department of Medicine, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Joanne E Curran
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA and Edinburg, TX 78539, USA
- Ravindranath Duggirala
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA and Edinburg, TX 78539, USA
- John Blangero
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA and Edinburg, TX 78539, USA
- Jason G Mezey
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Amy L Williams
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA