1
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. G3 (Bethesda) 2024; 14:jkae031. [PMID: 38365205; PMCID: PMC11090462; DOI: 10.1093/g3journal/jkae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 10/10/2023] [Accepted: 01/29/2024] [Indexed: 02/18/2024]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in nonmodel species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to nonmodel genomes. We apply ABC-MK to the human proteome and a set of known virus interacting proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
Affiliation(s)
- Jesús Murga-Moreno
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
- Sònia Casillas
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Antonio Barbadilla
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- David Enard
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
2
Rodrigues MF, Kern AD, Ralph PL. Shared evolutionary processes shape landscapes of genomic variation in the great apes. Genetics 2024; 226:iyae006. [PMID: 38242701; PMCID: PMC10990428; DOI: 10.1093/genetics/iyae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 10/26/2023] [Accepted: 01/03/2024] [Indexed: 01/21/2024]
Abstract
For at least the past 5 decades, population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modeling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well-sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations, we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions is driven by positive selection. This study provides a framework for modeling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.
Affiliation(s)
- Murillo F Rodrigues
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
  - Department of Mathematics, University of Oregon, Eugene, OR 97403, USA
3
Matheson J, Masel J. Background Selection From Unlinked Sites Causes Nonindependent Evolution of Deleterious Mutations. Genome Biol Evol 2024; 16:evae050. [PMID: 38482769; PMCID: PMC10972689; DOI: 10.1093/gbe/evae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/11/2024] [Indexed: 04/01/2024]
Abstract
Background selection describes the reduction in neutral diversity caused by selection against deleterious alleles at other loci. It is typically assumed that the purging of deleterious alleles affects linked neutral variants, and indeed simulations typically only treat a genomic window. However, background selection at unlinked loci also depresses neutral diversity. In agreement with previous analytical approximations, in our simulations of a human-like genome with a realistically high genome-wide deleterious mutation rate, the effects of unlinked background selection exceed those of linked background selection. Background selection reduces neutral genetic diversity by a factor that is independent of census population size. Outside of genic regions, the strength of background selection increases with the mean selection coefficient, contradicting the linked theory but in agreement with the unlinked theory. Neutral diversity within genic regions is fairly independent of the strength of selection. Deleterious genetic load among haploid individuals is underdispersed, indicating nonindependent evolution of deleterious mutations. Empirical evidence for underdispersion was previously interpreted as evidence for global epistasis, but we recover it from a non-epistatic model.
Affiliation(s)
- Joseph Matheson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Department of Ecology, Behavior, and Evolution, University of California San Diego, San Diego, CA 92093, USA
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
4
Buffalo V, Kern AD. A quantitative genetic model of background selection in humans. PLoS Genet 2024; 20:e1011144. [PMID: 38507461; PMCID: PMC10984650; DOI: 10.1371/journal.pgen.1011144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 04/01/2024] [Accepted: 01/19/2024] [Indexed: 03/22/2024]
Abstract
Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This "linked selection signal" reflects the impact of selection according to the physical placement of functional regions and recombination rates along chromosomes. Previous work has shown that purifying selection acting against the steady influx of new deleterious mutations at functional portions of the genome shapes patterns of genomic variation. To date, statistical efforts to estimate purifying selection parameters from linked selection models have relied on classic Background Selection theory, which is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of linked selection, that models how polygenic additive fitness variance distributed along the genome increases the rate of stochastic allele frequency change. By jointly predicting the equilibrium fitness variance and substitution rate due to both strong and weakly deleterious mutations, we estimate the distribution of fitness effects (DFE) and mutation rate across three geographically distinct human samples. While our model can accommodate weaker selection, we find evidence of strong selection operating similarly across all human samples. Although our quantitative genetic model of linked selection fits better than previous models, substitution rates of the most constrained sites disagree with observed divergence levels. We find that a model incorporating selective interference better predicts observed divergence in conserved regions, but overall our results suggest uncertainty remains about the processes generating fitness variation in humans.
Affiliation(s)
- Vince Buffalo
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
  - Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
- Andrew D. Kern
  - Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
5
Cousins T, Tabin D, Patterson N, Reich D, Durvasula A. Accurate inference of population history in the presence of background selection. bioRxiv 2024:2024.01.18.576291. [PMID: 38313273; PMCID: PMC10838404; DOI: 10.1101/2024.01.18.576291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
All published methods for learning about demographic history make the simplifying assumption that the genome evolves neutrally, and do not seek to account for the effects of natural selection on patterns of variation. This is a major concern, as ample work has demonstrated the pervasive effects of natural selection and in particular background selection (BGS) on patterns of genetic variation in diverse species. Simulations and theoretical work have shown that methods to infer changes in effective population size over time (Ne(t)) become increasingly inaccurate as the strength of linked selection increases. Here, we introduce an extension to the Pairwise Sequentially Markovian Coalescent (PSMC) algorithm, PSMC+, which explicitly co-models demographic history and natural selection. We benchmark our method using forward-in-time simulations with BGS and find that our approach improves the accuracy of effective population size inference. Leveraging a high resolution map of BGS in humans, we infer considerable changes in the magnitude of inferred effective population size relative to previous reports. Finally, we separately infer Ne(t) on the X chromosome and on the autosomes in diverse great apes without making a correction for selection, and find that the inferred ratio fluctuates substantially through time in a way that differs across species, showing that uncorrected selection may be an important driver of signals of genetic difference on the X chromosome and autosomes.
Affiliation(s)
- Trevor Cousins
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Daniel Tabin
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Howard Hughes Medical Institute, Boston, MA, USA
- Arun Durvasula
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
6
Vahedi SM, Salek Ardestani S, Banabazi MH, Clark KF. Strong selection signatures for Aleutian disease tolerance acting on novel candidate genes linked to immune and cellular responses in American mink (Neogale vison). Sci Rep 2024; 14:1035. [PMID: 38200094; PMCID: PMC10781757; DOI: 10.1038/s41598-023-51039-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/29/2023] [Indexed: 01/12/2024]
Abstract
Aleutian disease (AD) is a multi-systemic infectious disease in American mink (Neogale vison) caused by Aleutian mink disease virus (AMDV). This study aimed to identify candidate regions and genes underlying selection for response against AMDV using whole-genome sequence (WGS) data. Three case-control selection signatures studies were conducted between animals (N = 85) producing high versus low antibody levels against AMDV, grouped by counter immunoelectrophoresis (CIEP) test and two enzyme-linked immunosorbent assays (ELISA). Within each study, selection signals were detected using fixation index (FST) and nucleotide diversity (θπ ratios), and validated by cross-population extended haplotype homozygosity (XP-EHH) test. Within- and between-studies overlapping results were then evaluated. Within-studies overlapping results indicated novel candidate genes related to immune and cellular responses (e.g., TAP2, RAB32), respiratory system function (e.g., SPEF2, R3HCC1L), and reproduction system function (e.g., HSF2, CFAP206) in other species. Between-studies overlapping results identified three large segments under strong selection pressure, including two on chromosome 1 (chr1:88,770-98,281 kb and chr1:114,133-120,473) and one on chromosome 6 (chr6:37,953-44,279 kb). Within regions with strong signals, we found novel candidate genes involved in immune and cellular responses (e.g., homologous MHC class II genes, ITPR3, VPS52) in other species. Our study brings new insights into candidate regions and genes controlling AD response.
Affiliation(s)
- Seyed Milad Vahedi
  - Department of Animal Science and Aquaculture, Dalhousie University, Bible Hill, NS, B2N5E3, Canada
- Mohammad Hossein Banabazi
  - Department of Animal Breeding and Genetics (HGEN), Centre for Veterinary Medicine and Animal Science (VHC), Swedish University of Agricultural Sciences (SLU), 75007, Uppsala, Sweden
  - Department of Biotechnology, Animal Science Research Institute of IRAN (ASRI), Agricultural Research, Education & Extension Organization (AREEO), Karaj, 3146618361, Iran
- K Fraser Clark
  - Department of Animal Science and Aquaculture, Dalhousie University, Bible Hill, NS, B2N5E3, Canada
7
Rodrigues MF, Kern AD, Ralph PL. Shared evolutionary processes shape landscapes of genomic variation in the great apes. bioRxiv 2023:2023.02.07.527547. [PMID: 36798346; PMCID: PMC9934647; DOI: 10.1101/2023.02.07.527547] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/10/2023]
Abstract
For at least the past five decades population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modelling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions is driven by positive selection. This study provides a framework for modelling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.
Affiliation(s)
- Murillo F. Rodrigues
  - Institute of Ecology and Evolution, University of Oregon
  - Department of Biology, University of Oregon
- Andrew D. Kern
  - Institute of Ecology and Evolution, University of Oregon
  - Department of Biology, University of Oregon
- Peter L. Ralph
  - Institute of Ecology and Evolution, University of Oregon
  - Department of Biology, University of Oregon
  - Department of Mathematics, University of Oregon
8
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol 2023; 40:msad216. [PMID: 37772983; PMCID: PMC10581699; DOI: 10.1093/molbev/msad216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/10/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
Affiliation(s)
- Md Ruhul Amin
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mahmudul Hasan
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Sandipan Paul Arnab
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
9
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. bioRxiv 2023:2023.08.29.555322. [PMID: 37693550; PMCID: PMC10491248; DOI: 10.1101/2023.08.29.555322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in non-model species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to non-model genomes. We apply ABC-MK to the human proteome and a set of known Virus Interacting Proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
Affiliation(s)
- Jesús Murga-Moreno
  - University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
- Sònia Casillas
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Antonio Barbadilla
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- David Enard
  - University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
10
Qiao R, Li X, Madsen O, Groenen MAM, Xu P, Wang K, Han X, Li G, Li X, Li K. Potential selection for lipid kinase activity and spermatogenesis in Henan native pig breeds and growth shaping by introgression of European genes. Genet Sel Evol 2023; 55:64. [PMID: 37723431; PMCID: PMC10506266; DOI: 10.1186/s12711-023-00841-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 09/12/2023] [Indexed: 09/20/2023]
Abstract
BACKGROUND: China has one third of the worldwide indigenous pig breeds. The Henan province is one of the earliest pig domestication centers of China (about 8000 years ago). However, the precise genetic characteristics of the Henan local pig breeds are still obscure. To understand the origin and the effects of selection on these breeds, we performed various analyses on lineage composition, genetic structure, and detection of selection sweeps and introgression in three of these breeds (Queshan, Nanyang and Huainan) using genotyping data on 125 Queshan, 75 Nanyang, 16 Huainan pigs and 878 individuals from 43 Eurasian pig breeds.
RESULTS: We found no clear evidence of ancestral domestic pig DNA lineage in the Henan local breeds, which have an extremely complicated genetic background. Not only do they share genes with some northern Chinese pig breeds, such as Erhualian, Hetaodaer, and Laiwu, but they also have a high admixture of genes from foreign pig breeds (33-40%). Two striking selection sweeps in small regions of chromosomes 2 and 14 common to the Queshan and Nanyang breeds were identified. The most significant enrichment was for lipid kinase activity (GO:0043550) with the genes FII, AMBRA1, and PIK3IP1. Another interesting 636.35-kb region on chromosome 14 contained a cluster of spermatogenesis genes (OSBP2, GAL3ST1, PLA2G3, LIMK2, and PATZ1), a bisexual sterility gene MORC2, and a fat deposition gene SELENOM. Reproduction and growth genes LRP4, FII, and ARHGAP1 were present in a 238.05-kb region on SSC2 under selection. We also identified five loci associated with body length (P = 0.004) on chromosomes 1 and 12 that were introgressed from foreign pig breeds into the Henan breeds. In addition, the Chinese indigenous pig breeds fell into four main types instead of the previously reported six, among which the Eastern type could be divided into two subgroups.
CONCLUSIONS: Admixture of North China, East China and foreign pigs contributed to high genetic diversity of Henan local pigs. Ontology terms associated with lipid kinase activity and spermatogenesis and growth shaping by introgression of European genes in Henan pigs were identified through selective sweep analyses.
Affiliation(s)
- Ruimin Qiao
  - College of Animal Science, Henan Agricultural University, Zhengzhou, 450046, China
- Xinjian Li
  - College of Animal Science, Henan Agricultural University, Zhengzhou, 450046, China
- Ole Madsen
  - Animal Breeding and Genomics Centre, Department of Animal Sciences, Wageningen University & Research, 6700 HB, Wageningen, The Netherlands
- Martien A M Groenen
  - Animal Breeding and Genomics Centre, Department of Animal Sciences, Wageningen University & Research, 6700 HB, Wageningen, The Netherlands
- Pan Xu
  - Jiangsu Agri-Animal Husbandry and Veterinary College, Taizhou, 225300, China
- Kejun Wang
  - College of Animal Science, Henan Agricultural University, Zhengzhou, 450046, China
- Xuelei Han
  - College of Animal Science, Henan Agricultural University, Zhengzhou, 450046, China
- Gaiying Li
  - College of Animal Science, Henan Agricultural University, Zhengzhou, 450046, China
- Xiuling Li
  - College of Animal Science, Henan Agricultural University, Zhengzhou, 450046, China
- Kui Li
  - State Key Laboratory of Animal Nutrition and Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs of China, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
11
Abondio P, Cilli E, Luiselli D. Human Pangenomics: Promises and Challenges of a Distributed Genomic Reference. Life (Basel) 2023; 13:1360. [PMID: 37374141; DOI: 10.3390/life13061360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023]
Abstract
A pangenome is a collection of the common and unique genomes that are present in a given species. It combines the genetic information of all the genomes sampled, resulting in a large and diverse range of genetic material. Pangenomic analysis offers several advantages compared to traditional genomic research. For example, a pangenome is not bound by the physical constraints of a single genome, so it can capture more genetic variability. Thanks to the introduction of the concept of pangenome, it is possible to use exceedingly detailed sequence data to study the evolutionary history of two different species, or how populations within a species differ genetically. In the wake of the Human Pangenome Project, this review aims at discussing the advantages of the pangenome around human genetic variation, which are then framed around how pangenomic data can inform population genetics, phylogenetics, and public health policy by providing insights into the genetic basis of diseases or determining personalized treatments, targeting the specific genetic profile of an individual. Moreover, technical limitations, ethical concerns, and legal considerations are discussed.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Elisabetta Cilli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
12
Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023; 19:e1011175. [PMID: 37235578; PMCID: PMC10249903; DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]
Abstract
Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. 
The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.
Affiliation(s)
- K. D. Ahlquist
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, United States of America
| | - Lauren A. Sugden
- Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
| | - Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
- Data Science Initiative, Brown University, Providence, Rhode Island, United States of America
|
13
|
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor decomposition based feature extraction and classification to detect natural selection from genomic data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.27.527731. [PMID: 37034767 PMCID: PMC10081272 DOI: 10.1101/2023.03.27.527731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under non-convex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
|
14
|
Abraham A, LaBella AL, Capra JA, Rokas A. Mosaic patterns of selection in genomic regions associated with diverse human traits. PLoS Genet 2022; 18:e1010494. [PMID: 36342969 PMCID: PMC9671423 DOI: 10.1371/journal.pgen.1010494] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/08/2022] [Revised: 11/17/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022]
Abstract
Natural selection shapes the genetic architecture of many human traits. However, the prevalence of different modes of selection on genomic regions associated with variation in traits remains poorly understood. To address this, we developed an efficient computational framework to calculate positive and negative enrichment of different evolutionary measures among regions associated with complex traits. We applied the framework to summary statistics from >900 genome-wide association studies (GWASs) and 11 evolutionary measures of sequence constraint, population differentiation, and allele age while accounting for linkage disequilibrium, allele frequency, and other potential confounders. We demonstrate that this framework yields consistent results across GWASs with variable sample sizes, numbers of trait-associated SNPs, and analytical approaches. The resulting evolutionary atlas maps diverse signatures of selection on genomic regions associated with complex human traits on an unprecedented scale. We detected positive enrichment for sequence conservation among trait-associated regions for the majority of traits (>77% of 290 high power GWASs), which included reproductive traits. Many traits also exhibited substantial positive enrichment for population differentiation, especially among hair, skin, and pigmentation traits. In contrast, we detected widespread negative enrichment for signatures of balancing selection (51% of GWASs) and absence of enrichment for evolutionary signals in regions associated with late-onset Alzheimer's disease. These results support a pervasive role for negative selection on regions of the human genome that contribute to variation in complex traits, but also demonstrate that diverse modes of evolution are likely to have shaped trait-associated loci. 
This atlas of evolutionary signatures across the diversity of available GWASs will enable exploration of the relationship between the genetic architecture and evolutionary processes in the human genome.
Affiliation(s)
- Abin Abraham
- Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Abigail L. LaBella
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina, United States of America
- North Carolina Research Center, Kannapolis, North Carolina, United States of America
| | - John A. Capra
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, United States of America
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
|
15
|
Genetic Diversity and Selection Signatures in Jianchang Black Goats Revealed by Whole-Genome Sequencing Data. Animals (Basel) 2022; 12:ani12182365. [PMID: 36139225 PMCID: PMC9495118 DOI: 10.3390/ani12182365] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/03/2022] [Revised: 08/28/2022] [Accepted: 09/07/2022] [Indexed: 11/17/2022]
Abstract
Understanding the genetic composition of indigenous goats is essential to promote the scientific conservation and sustainable utilization of these breeds. The Jianchang Black (JC) goat, a Chinese native breed, is solid black and exhibits crude feed tolerance, but is characterized by a low growth rate and small body size. Based on the whole-genome sequencing data for 30 JC, 41 Jintang Black (JT), and 40 Yunshang Black (YS) goats, and 21 Bezoar ibexes, here, we investigated the genetic composition of JC goats by conducting analyses of the population structure, runs of homozygosity (ROH), genomic inbreeding, and selection signature. Our results revealed that JT and YS showed a close genetic relationship with a non-negligible amount of gene flow but were genetically distant from JC, apart from Bezoars. An average of 2039 ROHs were present in the autosomal genome per individual. The ROH-based inbreeding estimates in JC goats generally showed moderate values ranging from 0.134 to 0.264, mainly due to rapid declines in the effective population size during recent generations. The annotated genes (e.g., IL2, IL7, and KIT) overlapping with ROH islands were significantly enriched in immune-related biological processes. Further, we found 61 genes (e.g., STIM1, MYO9A, and KHDRBS2) under positive selection in JC goats via three complementary approaches, which may underlie genetic adaptations to local environmental conditions. Our findings provide a reference for the conservation and sustainable utilization of JC goats.
|
16
|
Salek Ardestani S, Zandi MB, Vahedi SM, Janssens S. Population structure and genomic footprints of selection in five major Iranian horse breeds. Anim Genet 2022; 53:627-639. [PMID: 35919961 DOI: 10.1111/age.13243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 06/08/2022] [Accepted: 07/04/2022] [Indexed: 11/28/2022]
Abstract
The genetic structure and characteristics of Iranian native breeds have yet to be comprehensively investigated. Therefore, we employed genomic information of 364 Iranian native horses representing the Asil (n = 109), Caspian (n = 40), Dareshuri (n = 44), Kurdish (n = 95), and Turkoman (n = 76) breeds to reveal the genetic structure and characteristics. For these and 19 other horse breeds, principal component analysis, Bayesian model-based, Neighbor-Net, and bootstrap-based TreeMix approaches were applied to investigate and compare their genetic structure. Additionally, three haplotype-based methods including haplotype homozygosity pooled, integrated haplotype score, and number of segregating sites by length were applied to trace genomic footprints of selection of Asil, Caspian, Dareshuri, Kurdish, and Turkoman groups. Then, the Mahalanobis distance based on the negative-log10 rank-based P-values was estimated based on the haplotype homozygosity pooled, integrated haplotype score, and number of segregating sites by length values. Asil, Caspian, Dareshuri, Kurdish, and Turkoman can be categorized into five different genetic clusters. Based on the top 1% of Mahalanobis distance based on the negative-log10 rank-based P-values of SNPs, we identified 24 SNPs formerly reported to be associated with different traits and >100 genes undergoing selection pressures in Asil, Caspian, Dareshuri, Kurdish, and Turkoman. The detected QTL undergoing selection pressures were associated with withers height, equine metabolic syndrome, overall body size, insect bite hypersensitivity, guttural pouch tympany, white markings, Rhodococcus equi infection, jumping test score, alternate gaits, and body weight traits. Our findings offer a better perspective on the genetic characteristics and population structure of Asil, Caspian, Dareshuri, Kurdish, and Turkoman horses as Iranian native horse breeds.
Affiliation(s)
| | | | - Seyed Milad Vahedi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia, Canada
| | - Steven Janssens
- Department Biosystems, Center Animal Breeding and Genetics, KU Leuven, Leuven, Belgium
|
17
|
Johri P, Eyre-Walker A, Gutenkunst RN, Lohmueller KE, Jensen JD. On the prospect of achieving accurate joint estimation of selection with population history. Genome Biol Evol 2022; 14:6604401. [PMID: 35675379 PMCID: PMC9254643 DOI: 10.1093/gbe/evac088] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Accepted: 06/02/2022] [Indexed: 11/15/2022]
Abstract
As both natural selection and population history can affect genome-wide patterns of variation, disentangling the contributions of each has remained a major challenge in population genetics. We here discuss historical and recent progress towards this goal, highlighting theoretical and computational challenges that remain to be addressed as well as inherent difficulties in dealing with model complexity and model violations, and offer thoughts on potentially fruitful next steps.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | | | - Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
| | - Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
18
|
Vahedi SM, Salek Ardestani S, Pahlevan Afshari K, Ghoreishifar SM, Moghaddaszadeh-Ahrabi S, Banabazi MH, Brito LF. Genome-Wide Selection Signatures and Human-Mediated Introgression Events in Bos taurus indicus-influenced Composite Beef Cattle. Front Genet 2022; 13:844653. [PMID: 35719394 PMCID: PMC9201998 DOI: 10.3389/fgene.2022.844653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Accepted: 02/09/2022] [Indexed: 11/13/2022]
Abstract
Genetic introgression through interbreeding of European Bos taurus taurus (EBT) and Indian Bos taurus indicus (IBI) cattle breeds has been widely used to combine the climatic resilience of the IBI cattle and the higher productivity of EBT when forming new composite beef cattle (CB) populations. The subsequent breeding strategies have shifted their initial genomic compositions. To uncover population structure, signatures of selection, and potential introgression events in CB populations, high-density genotypes [containing 492,954 single nucleotide polymorphisms (SNPs) after the quality control] of 486 individuals from 15 cattle breeds, including EBT, IBI, and CB populations, along with two Bos grunniens genotypes as outgroup were used in this study. Then, in-depth population genetics analyses were performed for three CB breeds of Beefmaster, Brangus, and Santa Gertrudis. Neighbor-joining, principal components, and admixture analyses confirmed the historical introgression of EBT and IBI haplotypes into CB breeds. The fdM statistics revealed that only 12.9% of CB populations' genetic components are of IBI origin. The results of signatures of selection analysis indicated different patterns of selection signals in the three CB breeds with primary pressure on pathways involved in protein processing and stress response in Beefmaster, cell proliferation regulation and immune response in Brangus, and amino acid and glucose metabolism in Santa Gertrudis. An average of >90% of genomic regions underlying selection signatures were of EBT origin in the studied CB populations. Investigating the CB breeds' genome allows the estimation of EBT and IBI ancestral proportions and the locations within the genome where either taurine or indicine origin alleles are under selective pressure. Such findings highlight various opportunities to control the selection process more efficiently and explore complementarity at the genomic level in CB populations.
Affiliation(s)
- Seyed Milad Vahedi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Siavash Salek Ardestani
- Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran
| | - Kian Pahlevan Afshari
- Department of Animal Sciences, Islamic Azad University, Varamin-Pishva Branch, Varamin, Iran
| | - Seyed Mohammad Ghoreishifar
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
| | - Sima Moghaddaszadeh-Ahrabi
- Department of Animal Science, Faculty of Agriculture and Natural Resources, Islamic Azad University, Tabriz Branch, Tabriz, Iran
| | - Mohammad Hossein Banabazi
- Department of Animal Breeding and Genetics (HGEN), Centre for Veterinary Medicine and Animal Science (VHC), Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
| | - Luiz Fernando Brito
- Department of Animal Sciences, Purdue University, West Lafayette, IN, United States
|
19
|
Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, Pfeifer SP, Stephan W, Jensen JD. Recommendations for improving statistical inference in population genomics. PLoS Biol 2022; 20:e3001669. [PMID: 35639797 PMCID: PMC9154105 DOI: 10.1371/journal.pbio.3001669] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Indexed: 01/29/2023]
Abstract
The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Charles F. Aquadro
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
| | - Mark Beaumont
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
| | - Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
| | - Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | - Peter D. Keightley
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Michael Lynch
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
| | - Bret A. Payseur
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Susanne P. Pfeifer
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | | | - Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
|
20
|
Rech GE, Radío S, Guirao-Rico S, Aguilera L, Horvath V, Green L, Lindstadt H, Jamilloux V, Quesneville H, González J. Population-scale long-read sequencing uncovers transposable elements associated with gene expression variation and adaptive signatures in Drosophila. Nat Commun 2022; 13:1948. [PMID: 35413957 PMCID: PMC9005704 DOI: 10.1038/s41467-022-29518-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 03/22/2021] [Accepted: 03/15/2022] [Indexed: 12/16/2022]
Abstract
High quality reference genomes are crucial to understanding genome function, structure and evolution. The availability of reference genomes has allowed us to start inferring the role of genetic variation in biology, disease, and biodiversity conservation. However, analyses across organisms demonstrate that a single reference genome is not enough to capture the global genetic diversity present in populations. In this work, we generate 32 high-quality reference genomes for the well-known model species D. melanogaster and focus on the identification and analysis of transposable element variation as they are the most common type of structural variant. We show that integrating the genetic variation across natural populations from five climatic regions increases the number of detected insertions by 58%. Moreover, 26% to 57% of the insertions identified using long reads were missed by short-read methods. We also identify hundreds of transposable elements associated with gene expression variation and new TE variants likely to contribute to adaptive evolution in this species. Our results highlight the importance of incorporating the genetic variation present in natural populations to genomic studies, which is essential if we are to understand how genomes function and evolve. Even in well-studied species, there is still substantial natural genetic variation that has not been characterized. Here, the authors use long read sequencing to discover transposable elements in the Drosophila genome not detected by short read sequencing, and link them to gene expression.
Affiliation(s)
- Gabriel E Rech
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
| | - Santiago Radío
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
| | - Sara Guirao-Rico
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
| | - Laura Aguilera
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
| | - Vivien Horvath
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
| | - Llewellyn Green
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
| | - Hannah Lindstadt
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
| | | | | | - Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain.
|
21
|
DeGiorgio M, Szpiech ZA. A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet 2022; 18:e1010134. [PMID: 35404934 PMCID: PMC9022890 DOI: 10.1371/journal.pgen.1010134] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/18/2021] [Revised: 04/21/2022] [Accepted: 03/04/2022] [Indexed: 01/13/2023]
Abstract
The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the “width” of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software. Identifying regions of the genome that contain adaptive variation is of fundamental interest in evolutionary biology, providing insight into an organism’s history and biology. When positive selection is recent or ongoing, we expect to find genomic patterns such as high frequency haplotypes and low genetic diversity in the vicinity of the adaptive locus. Here we develop a statistic to identify these regions based on distortions of the haplotype frequency spectrum from a background distribution. 
We evaluate the performance of this statistic under numerous realistic settings of interest to empiricists and demonstrate its superior performance relative to other haplotype-based selection statistics. We also apply this statistic to real population-genetic data. As a positive control, we explore two well-studied loci, LCT and MHC, in a European and an African human population that show strong evidence for selection. We also apply this statistic to the genomes of an urban brown rat population, where we uncover evidence for adaptation in olfactory perception genes. We release user-friendly software implementing this statistic.
Affiliation(s)
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, United States of America
| | - Zachary A. Szpiech
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
|
22
|
Vahedi SM, Salek Ardestani S, Karimi K, Banabazi MH. Weighted single-step GWAS for body mass index and scans for recent signatures of selection in Yorkshire pigs. J Hered 2022; 113:325-335. [DOI: 10.1093/jhered/esac004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Accepted: 01/24/2022] [Indexed: 11/14/2022]
Abstract
Controlling extra fat deposition is economically favorable in the modern swine industry. Understanding the genetic architecture of fat deposition traits such as body mass index (BMI) can help in improving genomic selection for such traits. We utilized a weighted single-step genome-wide association study (WssGWAS) to detect genetic regions and candidate genes associated with BMI in a Yorkshire pig population. Three extended haplotype homozygosity (EHH)-related statistics were also incorporated within a de-correlated composite of multiple signals (DCMS) framework to detect signatures of recent selection. Overall, the full pedigree consisted of 7,016 pigs, of which 5,561 had BMI records and 598 pigs were genotyped with an 80 K single nucleotide polymorphism (SNP) array. Results showed that the most significant windows (top 15) explained 9.35% of BMI genetic variance. Several genes were detected in regions previously associated with pig fat deposition traits and treated as potential candidate genes for BMI in Yorkshire pigs: FTMT, SRFBP1, KHDRBS3, FOXG1, SOD3, LRRC32, TSKU, ACER3, B3GNT6, CCDC201, ADCY1, RAMP3, TBRG4, CCM2. Signature of selection analysis revealed multiple candidate genes previously associated with various economic traits. However, BMI genetic variance explained by regions under selection pressure was minimal (1.31%). In conclusion, candidate genes associated with Yorkshire pigs’ BMI trait were identified using WssGWAS. Gene enrichment analysis indicated that the identified candidate genes were enriched in the insulin secretion pathway. We anticipate that these results further advance our understanding of the genetic architecture of BMI in Yorkshire pigs and provide information for genomic selection for fat deposition in this breed.
Affiliation(s)
- Seyed Milad Vahedi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | | | - Karim Karimi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Mohammad Hossein Banabazi
- Department of Biotechnology, Animal Science Research Institute of Iran, Agricultural Research, Education & Extension Organization, Karaj, Iran
- Department of animal breeding and genetics (HGEN), Centre for Veterinary Medicine and Animal Science (VHC), Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
|
23
|
Zee MJ, Whiting JR, Paris JR, Bassar RD, Travis J, Weigel D, Reznick DN, Fraser BA. Rapid genomic convergent evolution in experimental populations of Trinidadian guppies (Poecilia reticulata). Evol Lett 2022; 6:149-161. [PMID: 35386829 PMCID: PMC8966473 DOI: 10.1002/evl3.272] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/10/2021] [Revised: 12/08/2021] [Accepted: 12/14/2021] [Indexed: 01/14/2023]
Affiliation(s)
- Mijke J. Zee
- Biosciences, University of Exeter, Exeter EX4 4QD, United Kingdom
| | | | | | - Ron D. Bassar
- Department of Biology, Williams College, Williamstown, Massachusetts 01267
| | - Joseph Travis
- Department of Biological Science, Florida State University, Tallahassee, Florida 32306
| | - Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen 72076, Germany
| | - David N. Reznick
- Department of Biology, University of California, Riverside, Riverside, California 92521
|
24
|
Abstract
The nearly neutral theory is a common framework to describe natural selection at the molecular level. This theory emphasizes the importance of slightly deleterious mutations by recognizing their ability to segregate and eventually get fixed due to genetic drift in spite of the presence of purifying selection. As genetic drift is stronger in smaller than in larger populations, a correlation between population size and molecular measures of natural selection is expected within the nearly neutral theory. However, this hypothesis was originally formulated under equilibrium conditions. As most natural populations are not in equilibrium, testing the relationship empirically may lead to confounded outcomes. Demographic nonequilibria, for instance following a change in population size, are common scenarios that are expected to push the selection–drift relationship off equilibrium. By explicitly modeling the effects of a change in population size on allele frequency trajectories in the Poisson random field framework, we obtain analytical solutions of the nonstationary allele frequency spectrum. This enables us to derive exact results of measures of natural selection and effective population size in a demographic nonequilibrium. The study of their time-dependent relationship reveals a substantial deviation from the equilibrium selection–drift balance after a change in population size. Moreover, we show that the deviation is sensitive to the combination of different measures. These results therefore constitute relevant tools for empirical studies to choose suitable measures for investigating the selection–drift relationship in natural populations. Additionally, our new modeling approach extends existing population genetics theory and can serve as foundation for methodological developments.
Affiliation(s)
- Rebekka Müller
- Department of Mathematics, Uppsala University, 752 37 Uppsala, Sweden
| | - Ingemar Kaj
- Department of Mathematics, Uppsala University, 752 37 Uppsala, Sweden
| | - Carina F. Mugal
- Department of Ecology and Genetics, Uppsala University, 752 36 Uppsala, Sweden
- Corresponding author
|
25
|
Sohail M, Izarraras-Gomez A, Ortega-Del Vecchyo D. Populations, Traits, and Their Spatial Structure in Humans. Genome Biol Evol 2021; 13:evab272. [PMID: 34894236 PMCID: PMC8715524 DOI: 10.1093/gbe/evab272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2021] [Indexed: 11/16/2022] Open
Abstract
The spatial distribution of genetic variants is jointly determined by geography, past demographic processes, natural selection, and its interplay with environmental variation. A fraction of these genetic variants are "causal alleles" that affect the manifestation of a complex trait. The effect exerted by these causal alleles on complex traits can be independent of or dependent on the environment. Understanding the evolutionary processes that shape the spatial structure of causal alleles is key to comprehending the spatial distribution of complex traits. Natural selection, past population size changes, range expansions, consanguinity, assortative mating, archaic introgression, admixture, and the environment can alter the frequencies, effect sizes, and heterozygosities of causal alleles. This provides a genetic axis along which complex traits can vary. However, complex traits also vary along biogeographical and sociocultural axes which are often correlated with genetic axes in complex ways. The purpose of this review is to consider these genetic and environmental axes in concert and examine the ways they can help us decipher the variation in complex traits that is visible in humans today. This initiative necessarily implies a discussion of populations, traits, the ability to infer and interpret "genetic" components of complex traits, and how these have been impacted by adaptive events. In this review, we provide a history-aware discussion on these topics using both the recent and more distant past of our academic discipline and its relevant contexts.
Affiliation(s)
- Mashaal Sohail
- Department of Human Genetics, University of Chicago, USA
- Centro de Ciencias Genómicas (CCG), Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Morelos, México
| | - Alan Izarraras-Gomez
- Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
| | - Diego Ortega-Del Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
|
26
|
Villegas-Mirón P, Acosta S, Nye J, Bertranpetit J, Laayouni H. Chromosome X-wide Analysis of Positive Selection in Human Populations: Common and Private Signals of Selection and its Impact on Inactivated Genes and Enhancers. Front Genet 2021; 12:714491. [PMID: 34646300 PMCID: PMC8502928 DOI: 10.3389/fgene.2021.714491] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Accepted: 09/08/2021] [Indexed: 01/22/2023] Open
Abstract
The ability to detect adaptive (positive) selection in the genome has opened the possibility of understanding the genetic basis of population-specific adaptations genome-wide. Here, we present the analysis of recent selective sweeps, specifically in the X chromosome, in human populations from the third phase of the 1000 Genomes Project using three different haplotype-based statistics. We describe instances of recent positive selection that fit the criteria of hard or soft sweeps, and detect a higher number of events among sub-Saharan Africans than non-Africans (Europe and East Asia). A global enrichment of neural-related processes is observed and numerous genes related to fertility appear among the top candidates, reflecting the importance of reproduction in human evolution. Commonalities with previously reported genes under positive selection are found, while particularly strong new signals are reported in specific populations or shared across different continental groups. We report an enrichment of signals in genes that escape X chromosome inactivation, which may contribute to the differentiation between sexes. We also provide evidence of a widespread presence of soft-sweep-like signatures across the chromosome and a global enrichment of highly scoring regions that overlap potential regulatory elements. Among these, enhancer-like signatures seem to present putative signals of positive selection, which might be in concordance with selection in their target genes. Also, particularly strong signals appear in regulatory regions that show differential activities, which might point to population-specific regulatory adaptations.
Affiliation(s)
- Pablo Villegas-Mirón
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Spain
| | - Sandra Acosta
- Department of Pathology and Experimental Therapeutics, Medical School, University of Barcelona, Barcelona, Spain
| | - Jessica Nye
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Spain
| | - Jaume Bertranpetit
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Spain
| | - Hafid Laayouni
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Spain
- Bioinformatics Studies, ESCI-UPF, Barcelona, Spain
|
27
|
Garcia JA, Lohmueller KE. Negative linkage disequilibrium between amino acid changing variants reveals interference among deleterious mutations in the human genome. PLoS Genet 2021; 17:e1009676. [PMID: 34319975 PMCID: PMC8351996 DOI: 10.1371/journal.pgen.1009676] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/02/2020] [Revised: 08/09/2021] [Accepted: 06/22/2021] [Indexed: 11/18/2022] Open
Abstract
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data of modern humans from the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower compared to pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles tend to occur on different haplotypes more than synonymous variants. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic HR(j) which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and consequently influences genome-wide patterns of LD.
Affiliation(s)
- Jesse A. Garcia
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
| | - Kirk E. Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
|
28
|
Long X, Xue H. Genetic-variant hotspots and hotspot clusters in the human genome facilitating adaptation while increasing instability. Hum Genomics 2021; 15:19. [PMID: 33741065 PMCID: PMC7976700 DOI: 10.1186/s40246-021-00318-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/03/2020] [Accepted: 03/04/2021] [Indexed: 12/25/2022] Open
Abstract
Background: Genetic variants, underlying phenotypic diversity, are known to distribute unevenly in the human genome. A comprehensive understanding of the distributions of different genetic variants is important for insights into genetic functions and disorders. Methods: Herein, a sliding-window scan of regional densities of eight kinds of germline genetic variants, including single-nucleotide-polymorphisms (SNPs) and four size-classes of copy-number-variations (CNVs) in the human genome has been performed. Results: The study has identified 44,379 hotspots with high genetic-variant densities, and 1135 hotspot clusters comprising more than one type of hotspots, accounting for 3.1% and 0.2% of the genome respectively. The hotspots and clusters are found to co-localize with different functional genomic features, as exemplified by the associations of hotspots of middle-size CNVs with histone-modification sites, work with balancing and positive selections to meet the need for diversity in immune proteins, and facilitate the development of sensory-perception and neuroactive ligand-receptor interaction pathways in the function-sparse late-replicating genomic sequences. Genetic variants of different lengths co-localize with retrotransposons of different ages on a “long-with-young” and “short-with-all” basis. Hotspots and clusters are highly associated with tumor suppressor genes and oncogenes (p < 10⁻¹⁰), and enriched with somatic tumor CNVs and the trait- and disease-associated SNPs identified by genome-wide association studies, exceeding tenfold enrichment in clusters comprising SNPs and extra-long CNVs. Conclusions: The genetic-variant hotspots and clusters represent two-edged swords that spearhead both positive and negative genomic changes. Their strong associations with complex traits and diseases also open up a potential “Common Disease-Hotspot Variant” approach to the missing heritability problem.
Supplementary Information The online version contains supplementary material available at 10.1186/s40246-021-00318-3.
Affiliation(s)
- Xi Long
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- HKUST Shenzhen Research Institute, 9 Yuexing First Road, Nanshan, Shenzhen, China
| | - Hong Xue
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- HKUST Shenzhen Research Institute, 9 Yuexing First Road, Nanshan, Shenzhen, China
- Centre for Cancer Genomics, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, Jiangsu, China
|
29
|
Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD. The Impact of Purifying and Background Selection on the Inference of Population History: Problems and Prospects. Mol Biol Evol 2021; 38:2986-3003. [PMID: 33591322 PMCID: PMC8233493 DOI: 10.1093/molbev/msab050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 12/17/2022] Open
Abstract
Current procedures for inferring population history generally assume complete neutrality—that is, they neglect both direct selection and the effects of selection on linked sites. We here examine how the presence of direct purifying selection and background selection may bias demographic inference by evaluating two commonly-used methods (MSMC and fastsimcoal2), specifically studying how the underlying shape of the distribution of fitness effects and the fraction of directly selected sites interact with demographic parameter estimation. The results show that, even after masking functional genomic regions, background selection may cause the mis-inference of population growth under models of both constant population size and decline. This effect is amplified as the strength of purifying selection and the density of directly selected sites increase, as indicated by the distortion of the site frequency spectrum and levels of nucleotide diversity at linked neutral sites. We also show how simulated changes in background selection effects caused by population size changes can be predicted analytically. We propose a potential method for correcting for the mis-inference of population growth caused by selection. By treating the distribution of fitness effects as a nuisance parameter and averaging across all potential realizations, we demonstrate that even directly selected sites can be used to infer demographic histories with reasonable accuracy.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Kellen Riall
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Hannes Becher
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
30
|
Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, LeFaive J, Lee SB, Tian X, Browning BL, Das S, Emde AK, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella JF, Chalazan B, Chasman DI, Chen YDI, Cho MH, Choi SH, Chung MK, Clish CB, Correa A, Curran JE, Custer B, Darbar D, Daya M, de Andrade M, DeMeo DL, Dutcher SK, Ellinor PT, Emery LS, Eng C, Fatkin D, Fingerlin T, Forer L, Fornage M, Franceschini N, Fuchsberger C, Fullerton SM, Germer S, Gladwin MT, Gottlieb DJ, Guo X, Hall ME, He J, Heard-Costa NL, Heckbert SR, Irvin MR, Johnsen JM, Johnson AD, Kaplan R, Kardia SLR, Kelly T, Kelly S, Kenny EE, Kiel DP, Klemmer R, Konkle BA, Kooperberg C, Köttgen A, Lange LA, Lasky-Su J, Levy D, Lin X, Lin KH, Liu C, Loos RJF, Garman L, Gerszten R, Lubitz SA, Lunetta KL, Mak ACY, Manichaikul A, Manning AK, Mathias RA, McManus DD, McGarvey ST, Meigs JB, Meyers DA, Mikulla JL, Minear MA, Mitchell BD, Mohanty S, Montasser ME, Montgomery C, Morrison AC, Murabito JM, Natale A, Natarajan P, Nelson SC, North KE, O'Connell JR, Palmer ND, Pankratz N, Peloso GM, Peyser PA, Pleiness J, Post WS, Psaty BM, Rao DC, Redline S, Reiner AP, Roden D, Rotter JI, Ruczinski I, Sarnowski C, Schoenherr S, Schwartz DA, Seo JS, Seshadri S, Sheehan VA, Sheu WH, Shoemaker MB, Smith NL, Smith JA, Sotoodehnia N, Stilp AM, Tang W, Taylor KD, Telen M, Thornton TA, Tracy RP, Van Den Berg DJ, Vasan RS, Viaud-Martinez KA, Vrieze S, Weeks DE, Weir BS, Weiss ST, Weng LC, Willer CJ, Zhang Y, Zhao X, Arnett DK, Ashley-Koch AE, Barnes KC, Boerwinkle E, Gabriel S, Gibbs R, Rice KM, Rich SS, Silverman EK, Qasba P, Gan W, Papanicolaou GJ, Nickerson DA, Browning SR, Zody MC, Zöllner S, 
Wilson JG, Cupples LA, Laurie CC, Jaquish CE, Hernandez RD, O'Connor TD, Abecasis GR. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021; 590:290-299. [PMID: 33568819 PMCID: PMC7875770 DOI: 10.1038/s41586-021-03205-y] [Citation(s) in RCA: 868] [Impact Index Per Article: 289.3] [Received: 03/06/2019] [Accepted: 01/07/2021] [Indexed: 02/08/2023]
Abstract
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Affiliation(s)
- Daniel Taliun
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Daniel N Harris
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Michael D Kessler
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Jedidiah Carlson
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Zachary A Szpiech
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA, USA
| | - Raul Torres
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Sarah A Gagliano Taliun
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | | | | | - Hyun Min Kang
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | | | - Jonathon LeFaive
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Seung-Been Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Xiaowen Tian
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Brian L Browning
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Sayantan Das
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | | | | | - Douglas P Loesch
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Amol C Shetty
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Thomas W Blackwell
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Albert V Smith
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Quenna Wong
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, USA
| | - Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Dean M Bobo
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - François Aguet
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Alvaro Alonso
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | | | - Dan E Arking
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | | | - Paul L Auer
- Zilber School of Public Health, University of Wisconsin Milwaukee, Milwaukee, WI, USA
| | | | - R Graham Barr
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
- Department of Epidemiology, Columbia University Medical Center, New York, NY, USA
| | | | | | - Rebecca L Beer
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Emelia J Benjamin
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
| | - Lawrence F Bielak
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - John Blangero
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Michael Boehnke
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Donald W Bowden
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jennifer A Brody
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
| | - Esteban G Burchard
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Brian E Cade
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - James F Casella
- Department of Pediatrics, Johns Hopkins University, Baltimore, MD, USA
- Division of Pediatric Hematology, Johns Hopkins University, Baltimore, MD, USA
| | - Brandon Chalazan
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | | | - Mina K Chung
- Department of Cardiovascular Medicine, Heart & Vascular Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Clary B Clish
- Metabolomics Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Adolfo Correa
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Department of Pediatrics, University of Mississippi Medical Center, Jackson, MS, USA
- Department of Population Health Science, University of Mississippi Medical Center, Jackson, MS, USA
| | - Joanne E Curran
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Brian Custer
- Vitalant Research Institute, San Francisco, CA, USA
- Department of Laboratory Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Dawood Darbar
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, USA
| | - Michelle Daya
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | | - Dawn L DeMeo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Susan K Dutcher
- McDonnell Genome Institute, Washington University, St Louis, MO, USA
- Department of Genetics, Washington University, St Louis, MO, USA
| | - Patrick T Ellinor
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Leslie S Emery
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Celeste Eng
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Diane Fatkin
- Molecular Cardiology Division, Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia
- Cardiology Department, St Vincent's Hospital, Darlinghurst, New South Wales, Australia
| | - Tasha Fingerlin
- National Jewish Health, Center for Genes, Environment and Health, Denver, CO, USA
| | - Lukas Forer
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
| | - Myriam Fornage
- Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
| | - Christian Fuchsberger
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Institute for Biomedicine, Eurac Research, Bolzano, Italy
| | - Stephanie M Fullerton
- Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Mark T Gladwin
- Pittsburgh Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Daniel J Gottlieb
- VA Boston Healthcare System, Boston, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Michael E Hall
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Jiang He
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- Tulane University Translational Science Institute, Tulane University, New Orleans, LA, USA
| | - Nancy L Heard-Costa
- Framingham Heart Study, Framingham, MA, USA
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| | - Susan R Heckbert
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jill M Johnsen
- Department of Medicine, University of Washington, Seattle, WA, USA
- Bloodworks Northwest Research Institute, Seattle, WA, USA
| | - Andrew D Johnson
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Framingham, MA, USA
| | - Robert Kaplan
- Albert Einstein College of Medicine, New York, NY, USA
| | - Sharon L R Kardia
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Tanika Kelly
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
| | - Shannon Kelly
- Department of Epidemiology, Vitalant Research Institute, San Francisco, CA, USA
- Department of Pediatrics, UCSF Benioff Children's Hospital, Oakland, CA, USA
- Division of Pediatric Hematology, UCSF Benioff Children's Hospital, Oakland, CA, USA
| | - Eimear E Kenny
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Douglas P Kiel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, MA, USA
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Robert Klemmer
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Barbara A Konkle
- Department of Medicine, University of Washington, Seattle, WA, USA
- Bloodworks Northwest Research Institute, Seattle, WA, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Anna Köttgen
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD, USA
- Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Leslie A Lange
- Department of Medicine, University of Colorado at Denver, Aurora, CO, USA
| | - Jessica Lasky-Su
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brigham and Women's Hospital, Boston, MA, USA
| | - Daniel Levy
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Framingham, MA, USA
| | - Xihong Lin
- Biostatistics and Statistics, Harvard University, Boston, MA, USA
| | - Keng-Han Lin
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lori Garman
- Department of Genes and Human Disease, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
| | | | | | - Kathryn L Lunetta
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Angel C Y Mak
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Alisa K Manning
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA, USA
- Metabolism Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Rasika A Mathias
- Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - David D McManus
- Cardiovascular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
| | - Stephen T McGarvey
- International Health Institute, Brown University, Providence, RI, USA
- Department of Epidemiology, Brown University, Providence, RI, USA
- Department of Anthropology, Brown University, Providence, RI, USA
| | - James B Meigs
- Division of General Internal Medicine, Massachusetts General Hospital, Harvard Medical School, The Broad Institute of MIT and Harvard, Boston, MA, USA
| | | | - Julie L Mikulla
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mollie A Minear
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Braxton D Mitchell
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore Veterans Administration Medical Center, Baltimore, MD, USA
| | - Sanghamitra Mohanty
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX, USA
- Department of Internal Medicine, Dell Medical School, Austin, TX, USA
| | - May E Montasser
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Courtney Montgomery
- Department of Genes and Human Disease, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Joanne M Murabito
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Andrea Natale
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX, USA
| | - Pradeep Natarajan
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Sarah C Nelson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Kari E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
| | - Jeffrey R O'Connell
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Nicholette D Palmer
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Nathan Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Patricia A Peyser
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Jacob Pleiness
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Wendy S Post
- Division of Cardiology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Bruce M Psaty
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - D C Rao
- Division of Biostatistics, Washington University in St Louis, St Louis, MO, USA
| | - Susan Redline
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Dan Roden
- Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Chloé Sarnowski
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Sebastian Schoenherr
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
| | | | - Jeong-Sun Seo
- Precision Medicine Center, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Macrogen Inc, Seoul, Republic of Korea
- Gong Wu Genomic Medicine Institute, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Sudha Seshadri
- Framingham Heart Study, Framingham, MA, USA
- Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Sciences Center at San Antonio, San Antonio, TX, USA
| | - Vivien A Sheehan
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, USA
- Aflac Cancer and Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, GA, USA
| | - Wayne H Sheu
- Taichung Veterans General Hospital Taiwan, Taichung City, Taiwan
| | | | - Nicholas L Smith
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Seattle Epidemiologic Research and Information Center, Department of Veterans Affairs Office of Research and Development, Seattle, WA, USA
| | - Jennifer A Smith
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Nona Sotoodehnia
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
| | - Adrienne M Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Weihong Tang
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
| | | | | | - Russell P Tracy
- Department of Pathology & Laboratory Medicine, University of Vermont Larner College of Medicine, Burlington, VT, USA
| | - David J Van Den Berg
- Center for Genetic Epidemiology, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Ramachandran S Vasan
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
| | | | - Scott Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Daniel E Weeks
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Scott T Weiss
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brigham and Women's Hospital, Boston, MA, USA
| | | | - Cristen J Willer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Internal Medicine-Cardiology, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Yingze Zhang
- Pittsburgh Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Xutong Zhao
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Donna K Arnett
- Department of Epidemiology, University of Kentucky, Lexington, KY, USA
| | - Allison E Ashley-Koch
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
| | - Kathleen C Barnes
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Eric Boerwinkle
- University of Texas Health Science Center at Houston, Houston, TX, USA
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Stacey Gabriel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Richard Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Pankaj Qasba
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Weiniu Gan
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - George J Papanicolaou
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Deborah A Nickerson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Northwest Genomics Center, Seattle, WA, USA
- Brotman Baty Institute, Seattle, WA, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | | | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
| | - James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
- Framingham Heart Study, Framingham, MA, USA.
| | - Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
| | - Cashell E Jaquish
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada.
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.
| | - Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA.
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA.
| | - Gonçalo R Abecasis
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
| |
|
31
|
Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD. The impact of purifying and background selection on the inference of population history: problems and prospects. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021. [PMID: 33501439 PMCID: PMC7836109 DOI: 10.1101/2020.04.28.066365] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
Current procedures for inferring population history generally assume complete neutrality - that is, they neglect both direct selection and the effects of selection on linked sites. We here examine how the presence of direct purifying selection and background selection may bias demographic inference by evaluating two commonly-used methods (MSMC and fastsimcoal2), specifically studying how the underlying shape of the distribution of fitness effects (DFE) and the fraction of directly selected sites interact with demographic parameter estimation. The results show that, even after masking functional genomic regions, background selection may cause the mis-inference of population growth under models of both constant population size and decline. This effect is amplified as the strength of purifying selection and the density of directly selected sites increase, as indicated by the distortion of the site frequency spectrum and levels of nucleotide diversity at linked neutral sites. We also show how simulated changes in background selection effects caused by population size changes can be predicted analytically. We propose a potential method for correcting for the mis-inference of population growth caused by selection. By treating the DFE as a nuisance parameter and averaging across all potential realizations, we demonstrate that even directly selected sites can be used to infer demographic histories with reasonable accuracy.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| | - Kellen Riall
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| | - Hannes Becher
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
| | - Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne 3012, Switzerland.,Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
| | - Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| |
|
32
|
Genome-Wide Detection of Selection Signatures in Duroc Revealed Candidate Genes Relating to Growth and Meat Quality. G3-GENES GENOMES GENETICS 2020; 10:3765-3773. [PMID: 32859686 PMCID: PMC7534417 DOI: 10.1534/g3.120.401628] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
With the development of high-throughput genotyping techniques, selection signatures in the genome of domestic pigs have been extensively interrogated in the last decade. The Duroc, a major commercial pig breed famous for its fast growth rate and high lean ratio, has not been extensively studied for footprints of intensive artificial selection in its genome using large-scale re-sequencing data. The goal of this study was to investigate genomic regions under artificial selection and their contribution to the unique phenotypic traits of the Duroc using whole-genome resequencing data from 97 pigs. Three complementary methods (di, CLR, and iHH12) were implemented for selection signature detection. In total, 464 significant candidate regions were identified, which covered 46.4 Mb of the pig genome. Within the identified regions, 709 genes were annotated, including 600 candidate protein-coding genes (486 functionally annotated genes) and 109 lncRNA genes. Genes undergoing selective pressure were significantly enriched in the insulin resistance signaling pathway, which may partly explain the difference between the Duroc and other breeds in terms of growth rate. The selection signatures identified in the Duroc population demonstrated positive pressures on a set of important genes with potential functions that are involved in many biological processes. The results provide new insights into the genetic mechanisms of fast growth rate and high lean mass, and further facilitate follow-up studies on functional genes that contribute to the Duroc's excellent phenotypic traits.
|
33
|
Schrider DR. Background Selection Does Not Mimic the Patterns of Genetic Diversity Produced by Selective Sweeps. Genetics 2020; 216:499-519. [PMID: 32847814 PMCID: PMC7536861 DOI: 10.1534/genetics.120.303469] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/13/2019] [Accepted: 08/04/2020] [Indexed: 12/28/2022]
Abstract
It is increasingly evident that natural selection plays a prominent role in shaping patterns of diversity across the genome. The most commonly studied modes of natural selection are positive selection and negative selection, which refer to directional selection for and against derived mutations, respectively. Positive selection can result in hitchhiking events, in which a beneficial allele rapidly replaces all others in the population, creating a valley of diversity around the selected site along with characteristic skews in allele frequencies and linkage disequilibrium among linked neutral polymorphisms. Similarly, negative selection reduces variation not only at selected sites but also at linked sites, a phenomenon called background selection (BGS). Thus, discriminating between these two forces may be difficult, and one might expect efforts to detect hitchhiking to produce an excess of false positives in regions affected by BGS. Here, we examine the similarity between BGS and hitchhiking models via simulation. First, we show that BGS may somewhat resemble hitchhiking in simplistic scenarios in which a region constrained by negative selection is flanked by large stretches of unconstrained sites, echoing previous results. However, this scenario does not mirror the actual spatial arrangement of selected sites across the genome. By performing forward simulations under more realistic scenarios of BGS, modeling the locations of protein-coding and conserved noncoding DNA in real genomes, we show that the spatial patterns of variation produced by BGS rarely mimic those of hitchhiking events. Indeed, BGS is not substantially more likely than neutrality to produce false signatures of hitchhiking. This holds for simulations modeled after both humans and Drosophila, and for several different demographic histories. These results demonstrate that appropriately designed scans for hitchhiking need not consider BGS's impact on false-positive rates. However, we do find evidence that BGS increases the false-negative rate for hitchhiking, an observation that demands further investigation.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
| |
|
34
|
Kartje ME, Jing P, Payseur BA. Weak Correlation between Nucleotide Variation and Recombination Rate across the House Mouse Genome. Genome Biol Evol 2020; 12:293-299. [PMID: 32108880 PMCID: PMC7186785 DOI: 10.1093/gbe/evaa045] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/25/2020] [Indexed: 01/01/2023]
Abstract
Positive selection and purifying selection reduce levels of variation at linked neutral loci. One consequence of these processes is that the amount of neutral diversity and the meiotic recombination rate are predicted to be positively correlated across the genome, a prediction met in some species but not others. To better document the prevalence of selection at linked sites, we used new and published whole-genome sequences to survey nucleotide variation in population samples of the western European house mouse (Mus musculus domesticus) from Germany, France, and Gough Island, a remote volcanic island in the south Atlantic. Correlations between sequence variation and recombination rates estimated independently from dense linkage maps were consistently very weak (ρ ≤ 0.06), though they exceeded conventional significance thresholds. This pattern persisted in comparisons between genomic regions with the highest and lowest recombination rates, as well as in models incorporating the density of transcribed sites, the density of CpG dinucleotides, and divergence between mouse and rat as covariates. We conclude that natural selection affects linked neutral variation in a restricted manner in the western European house mouse.
Affiliation(s)
- Michael E Kartje
- Laboratory of Genetics, University of Wisconsin – Madison, Madison
| | - Peicheng Jing
- Laboratory of Genetics, University of Wisconsin – Madison, Madison
| | - Bret A Payseur
- Laboratory of Genetics, University of Wisconsin – Madison, Madison
| |
|
35
|
Vicuña L, Fernandez MI, Vial C, Valdebenito P, Chaparro E, Espinoza K, Ziegler A, Bustamante A, Eyheramendy S. Adaptation to Extreme Environments in an Admixed Human Population from the Atacama Desert. Genome Biol Evol 2020; 11:2468-2479. [PMID: 31384924 PMCID: PMC6733355 DOI: 10.1093/gbe/evz172] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 07/29/2019] [Indexed: 12/11/2022]
Abstract
Inorganic arsenic (As) is a toxic xenobiotic and carcinogen associated with severe health conditions. The urban population from the Atacama Desert in northern Chile was exposed to extremely high As levels (up to 600 µg/l) in drinking water between 1958 and 1971, leading to increased incidence of urinary bladder cancer (BC), skin cancer, kidney cancer, and coronary thrombosis decades later. In addition, the Andean Native-American ancestors of the Atacama population were exposed to elevated As levels in water (∼120 µg/l) for at least 5,000 years, suggesting adaptation to this selective pressure. Here, we performed two genome-wide selection tests (PBSn1 and an ancestry-enrichment test) in an admixed population from Atacama, to identify adaptation signatures to As exposure acquired before and after admixture with Europeans, respectively. The second-ranked variant selected by PBSn1 was associated with LCE4A-C1orf68, a gene that may be involved in the immune barrier of the epithelium during BC. We performed association tests between the top PBSn1 hits and BC occurrence in our population. The strongest association (P = 0.012) was achieved by the LCE4A-C1orf68 variant. The ancestry-enrichment test detected highly significant signals (P = 1.3 × 10⁻⁹) mapping to MAK16, a gene with important roles in ribosome biogenesis during the G1 phase of the cell cycle. Our results contribute to a better understanding of the genetic factors involved in adaptation to the pathophysiological consequences of As exposure.
Affiliation(s)
- Lucas Vicuña
- Department of Statistics, Faculty of Mathematics, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Mario I Fernandez
- Department of Urology, Clínica Alemana, Santiago, Chile.,Center for Genetics and Genomics, Faculty of Medicine, Clínica Alemana Universidad del Desarrollo, Santiago, Chile
| | - Cecilia Vial
- Center for Genetics and Genomics, Faculty of Medicine, Clínica Alemana Universidad del Desarrollo, Santiago, Chile
| | | | | | | | - Annemarie Ziegler
- Center for Genetics and Genomics, Faculty of Medicine, Clínica Alemana Universidad del Desarrollo, Santiago, Chile
| | | | - Susana Eyheramendy
- Department of Statistics, Faculty of Mathematics, Pontificia Universidad Católica de Chile, Santiago, Chile.,Faculty of Engineering and Sciences, Universidad Adolfo Ibañez, Peñalolén, Santiago, Chile
| |
|
36
|
The Temporal Dynamics of Background Selection in Nonequilibrium Populations. Genetics 2020; 214:1019-1030. [PMID: 32071195 DOI: 10.1534/genetics.119.302892] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/09/2019] [Accepted: 01/30/2020] [Indexed: 01/06/2023]
Abstract
Neutral genetic diversity across the genome is determined by the complex interplay of mutation, demographic history, and natural selection. While the direct action of natural selection is limited to functional loci across the genome, its impact can have effects on nearby neutral loci due to genetic linkage. These effects of selection at linked sites, referred to as genetic hitchhiking and background selection (BGS), are pervasive across natural populations. However, only recently has there been a focus on the joint consequences of demography and selection at linked sites, and some empirical studies have come to apparently contradictory conclusions as to their combined effects. To understand the relationship between demography and selection at linked sites, we conducted an extensive forward simulation study of BGS under a range of demographic models. We found that the relative levels of diversity in BGS and neutral regions vary over time and that the initial dynamics after a population size change are often in the opposite direction of the long-term expected trajectory. Our detailed observations of the temporal dynamics of neutral diversity in the context of selection at linked sites in nonequilibrium populations provide new intuition about why patterns of diversity under BGS vary through time in natural populations and help reconcile previously contradictory observations. Most notably, our results highlight that classical models of BGS are poorly suited for predicting diversity in nonequilibrium populations.
|
37
|
Whiting JR, Fraser BA. Contingent Convergence: The Ability To Detect Convergent Genomic Evolution Is Dependent on Population Size and Migration. G3 (BETHESDA, MD.) 2020; 10:677-693. [PMID: 31871215 PMCID: PMC7003088 DOI: 10.1534/g3.119.400970] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/18/2019] [Accepted: 12/19/2019] [Indexed: 12/02/2022]
Abstract
Outlier scans, in which the genome is scanned for signatures of selection, have become a prominent tool in studies of local adaptation, and more recently studies of genetic convergence in natural populations. However, such methods have the potential to be confounded by features of demographic history, such as population size and migration, which are considerably varied across natural populations. In this study, we use forward-simulations to investigate and illustrate how several measures of genetic differentiation commonly used in outlier scans (FST, DXY and Δπ) are influenced by demographic variation across multiple sampling generations. In a factorial design with 16 treatments, we manipulate the presence/absence of founding bottlenecks (N of founding individuals), prolonged bottlenecks (proportional size of diverging population) and migration rate between two populations with ancestral and diverged phenotypic optima. Our results illustrate known constraints of individual measures associated with reduced population size and a lack of migration; but notably we demonstrate how relationships between measures are similarly dependent on these features of demography. We find that false-positive signals of convergent evolution (the same simulated outliers detected in independent treatments) are attainable as a product of similar population size and migration treatments (particularly for DXY), and that outliers across different measures (e.g., FST and DXY) can occur with little influence of selection. Taken together, we show how underappreciated, yet quantifiable measures of demographic history can influence commonly employed methods for detecting selection.
Affiliation(s)
- James R Whiting
- Department of Biosciences, University of Exeter, Geoffrey Pope Building, Exeter, EX4 4QD
| | - Bonnie A Fraser
- Department of Biosciences, University of Exeter, Geoffrey Pope Building, Exeter, EX4 4QD
| |
|
38
|
Moest M, Van Belleghem SM, James JE, Salazar C, Martin SH, Barker SL, Moreira GRP, Mérot C, Joron M, Nadeau NJ, Steiner FM, Jiggins CD. Selective sweeps on novel and introgressed variation shape mimicry loci in a butterfly adaptive radiation. PLoS Biol 2020; 18:e3000597. [PMID: 32027643 PMCID: PMC7029882 DOI: 10.1371/journal.pbio.3000597] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/03/2019] [Revised: 02/19/2020] [Accepted: 01/15/2020] [Indexed: 11/21/2022] Open
Abstract
Natural selection leaves distinct signatures in the genome that can reveal the targets and history of adaptive evolution. By analysing high-coverage genome sequence data from 4 major colour pattern loci sampled from nearly 600 individuals in 53 populations, we show pervasive selection on wing patterns in the Heliconius adaptive radiation. The strongest signatures correspond to loci with the greatest phenotypic effects, consistent with visual selection by predators, and are found in colour patterns with geographically restricted distributions. These recent sweeps are similar between co-mimics and indicate colour pattern turn-over events despite strong stabilising selection. Using simulations, we compare sweep signatures expected under classic hard sweeps with those resulting from adaptive introgression, an important aspect of mimicry evolution in Heliconius butterflies. Simulated recipient populations show a distinct 'volcano' pattern with peaks of increased genetic diversity around the selected target, characteristic of sweeps of introgressed variation and consistent with diversity patterns found in some populations. Our genomic data reveal a surprisingly dynamic history of colour pattern selection and co-evolution in this adaptive radiation.
Affiliation(s)
- Markus Moest
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Department of Ecology, University of Innsbruck, Innsbruck, Austria
| | - Steven M. Van Belleghem
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Department of Biology, University of Puerto Rico, Rio Piedras, Puerto Rico
| | - Jennifer E. James
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, United States of America
| | - Camilo Salazar
- Biology Program, Faculty of Natural Sciences and Mathematics, Universidad del Rosario, Bogota D.C., Colombia
| | - Simon H. Martin
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
| | - Sarah L. Barker
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
| | - Gilson R. P. Moreira
- Departamento de Zoologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
| | - Claire Mérot
- IBIS, Department of Biology, Université Laval, Québec, Canada
| | - Mathieu Joron
- Centre d'Ecologie Fonctionnelle et Evolutive, UMR 5175 CNRS—Université de Montpellier—Université Paul Valéry Montpellier—EPHE, Montpellier, France
| | - Nicola J. Nadeau
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
| | | | - Chris D. Jiggins
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
| |
|
39
|
Uricchio LH. Evolutionary perspectives on polygenic selection, missing heritability, and GWAS. Hum Genet 2020; 139:5-21. [PMID: 31201529 PMCID: PMC8059781 DOI: 10.1007/s00439-019-02040-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/29/2018] [Accepted: 06/06/2019] [Indexed: 12/26/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified many trait-associated variants, but there is still much we do not know about the genetic basis of complex traits. Here, we review recent theoretical and empirical literature regarding selection on complex traits to argue that "missing heritability" is as much an evolutionary problem as it is a statistical problem. We discuss empirical findings that suggest a role for selection in shaping the effect sizes and allele frequencies of causal variation underlying complex traits, and the limitations of these studies. We then use simulations of selection, realistic genome structure, and complex human demography to illustrate the results of recent theoretical work on polygenic selection, and show that statistical inference of causal loci is sharply affected by evolutionary processes. In particular, when selection acts on causal alleles, it hampers the ability to detect causal loci and constrains the transferability of GWAS results across populations. Last, we discuss the implications of these findings for future association studies, and suggest that future statistical methods to infer causal loci for genetic traits will benefit from explicit modeling of the joint distribution of effect sizes and allele frequencies under plausible evolutionary models.
Affiliation(s)
- Lawrence H Uricchio
- Department of Biology, Stanford University, Stanford, CA, USA.
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.
| |
|
40
|
Guo J, Zhong J, Li L, Zhong T, Wang L, Song T, Zhang H. Comparative genome analyses reveal the unique genetic composition and selection signals underlying the phenotypic characteristics of three Chinese domestic goat breeds. Genet Sel Evol 2019; 51:70. [PMID: 31771503 PMCID: PMC6880376 DOI: 10.1186/s12711-019-0512-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/18/2019] [Accepted: 11/15/2019] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: As one of the important livestock species around the world, goats provide abundant meat, milk, and fiber to fulfill basic human needs. However, the genetic loci that underlie phenotypic variations in domestic goats are largely unknown, particularly for economically important traits. In this study, we sequenced the whole genome of 38 goats from three Chinese breeds (Chengdu Brown, Jintang Black, and Tibetan Cashmere) and downloaded the genome sequence data of 30 goats from five other breeds (four non-Chinese and one Chinese breed) and 21 Bezoar ibexes to investigate the genetic composition and selection signatures of the Chinese goat breeds after domestication.
RESULTS: Based on population structure analysis and FST values (average FST = 0.22), the genetic composition of Chengdu Brown goats differs considerably from that of Bezoar ibexes as a result of geographic isolation. Strikingly, the genes under selection that we identified in Tibetan Cashmere goats were significantly enriched in the categories hair growth and bone and nervous system development, possibly because they are involved in adaptation to high-altitude. In particular, we found a large difference in allele frequency of one novel SNP (c.-253G>A) in the 5'-UTR of FGF5 between Cashmere goats and goat breeds with short hair. The mutation at this site introduces a start codon that results in the occurrence of a premature FGF5 protein and is likely a natural causal variant that is involved in the long hair phenotype of cashmere goats. The haplotype tagged with the AGG-allele in exon 12 of DSG3, which encodes a cell adhesion molecule that is expressed mainly in the skin, was almost fixed in Tibetan Cashmere goats, whereas this locus still segregates in the lowland goat breeds. The pigmentation gene KITLG showed a strong signature of selection in Tibetan Cashmere goats. The genes ASIP and LCORL were identified as being under positive selection in Jintang Black goats.
CONCLUSIONS: After domestication, geographic isolation of some goat breeds has resulted in distinct genetic structures. Furthermore, our work highlights several positively selected genes that likely contributed to breed-related traits in domestic goats.
Affiliation(s)
- Jiazhong Guo
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130 China
| | - Jie Zhong
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130 China
| | - Li Li
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130 China
| | - Tao Zhong
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130 China
| | - Linjie Wang
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130 China
| | - Tianzeng Song
- Institute of Animal Science, Tibet Academy of Agricultural and Animal Husbandry Sciences, Lhasa, 850009 China
| | - Hongping Zhang
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130 China
| |
|
41
|
Smith CCR, Flaxman SM. Leveraging whole genome sequencing data for demographic inference with approximate Bayesian computation. Mol Ecol Resour 2019; 20:125-139. [DOI: 10.1111/1755-0998.13092] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/13/2019] [Revised: 08/30/2019] [Accepted: 09/06/2019] [Indexed: 01/16/2023]
Affiliation(s)
- Chris C. R. Smith
- Department of Ecology and Evolutionary Biology University of Colorado Boulder CO USA
| | - Samuel M. Flaxman
- Department of Ecology and Evolutionary Biology University of Colorado Boulder CO USA
| |
|
42
|
Matthey‐Doret R, Whitlock MC. Background selection and FST: Consequences for detecting local adaptation. Mol Ecol 2019; 28:3902-3914. [DOI: 10.1111/mec.15197] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 05/20/2018] [Revised: 06/19/2019] [Accepted: 07/03/2019] [Indexed: 01/03/2023]
Affiliation(s)
- Remi Matthey‐Doret
- Department of Zoology and Biodiversity Research Centre University of British Columbia Vancouver BC Canada
| | - Michael C. Whitlock
- Department of Zoology and Biodiversity Research Centre University of British Columbia Vancouver BC Canada
| |
|
43
|
Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet 2019; 15:e1008384. [PMID: 31518343 PMCID: PMC6760815 DOI: 10.1371/journal.pgen.1008384] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/29/2019] [Revised: 09/25/2019] [Accepted: 08/26/2019] [Indexed: 12/24/2022] Open
Abstract
Most current methods for detecting natural selection from DNA sequence data are limited in that they are either based on summary statistics or a composite likelihood, and as a consequence, do not make full use of the information available in DNA sequence data. We here present a new importance sampling approach for approximating the full likelihood function for the selection coefficient. Our method CLUES treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov Chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and for inferring the allele frequency trajectory of a selected or neutral allele. We perform extensive simulations to evaluate the method and show that it uniformly improves power to detect selection compared to current popular methods such as nSL and SDS, and can provide reliable inferences of allele frequency trajectories under many conditions. We also explore the potential of our method to detect extremely recent changes in the strength of selection. We use the method to infer the past allele frequency trajectory for a lactase persistence SNP (MCM6) in Europeans. We also infer the trajectory of a SNP (EDAR) in Han Chinese, finding evidence that this allele's age is much older than previously claimed. We also study a set of 11 pigmentation-associated variants. Several genes show evidence of strong selection particularly within the last 5,000 years, including ASIP, KITLG, and TYR. However, selection on OCA2/HERC2 seems to be much older and, in contrast to previous claims, we find no evidence of selection on TYRP1.
Affiliation(s)
- Aaron J. Stern
- Graduate Group in Computational Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Peter R. Wilton
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
| |
|
44
|
Fraser BA, Whiting JR. What can be learned by scanning the genome for molecular convergence in wild populations? Ann N Y Acad Sci 2019; 1476:23-42. [PMID: 31241191 PMCID: PMC7586825 DOI: 10.1111/nyas.14177] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/29/2019] [Revised: 05/24/2019] [Accepted: 06/04/2019] [Indexed: 12/11/2022]
Abstract
Convergent evolution, where independent lineages evolve similar phenotypes in response to similar challenges, can provide valuable insight into how selection operates and the limitations it encounters. However, it has only recently become possible to explore how convergent evolution is reflected at the genomic level. The overlapping outlier approach (OOA), where genome scans of multiple independent lineages are used to find outliers that overlap and therefore identify convergently evolving loci, is becoming popular. Here, we present a quantitative analysis of 34 studies that used this approach across many sampling designs, taxa, and sampling intensities. We found that OOA studies with increased biological sampling power within replicates have increased likelihood of finding overlapping, "convergent" signals of adaptation between them. When identifying convergent loci as overlapping outliers, it is tempting to assume that any false-positive outliers derived from individual scans will fail to overlap across replicates, but this cannot be guaranteed. We highlight how population demographics and genomic context can contribute toward both true convergence and false positives in OOA studies. We finish with an exploration of emerging methods that couple genome scans with phenotype and environmental measures, leveraging added information from genome data to more directly test hypotheses of the likelihood of convergent evolution.
Affiliation(s)
- Bonnie A Fraser
- Department of Biosciences, University of Exeter, Exeter, United Kingdom
| | - James R Whiting
- Department of Biosciences, University of Exeter, Exeter, United Kingdom
| |
|
45
|
Exploiting selection at linked sites to infer the rate and strength of adaptation. Nat Ecol Evol 2019; 3:977-984. [PMID: 31061475 PMCID: PMC6693860 DOI: 10.1038/s41559-019-0890-6] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 09/25/2018] [Accepted: 03/28/2019] [Indexed: 12/18/2022]
Abstract
Genomic data encodes past evolutionary events and has the potential to reveal the strength, rate, and biological drivers of adaptation. However, jointly estimating adaptation rate (α) and adaptation strength remains challenging because evolutionary processes such as demography, linkage, and non-neutral polymorphism can confound inference. Here, we exploit the influence of background selection to reduce the fixation rate of weakly-beneficial alleles to jointly infer the strength and rate of adaptation. We develop an MK-based method (ABC-MK) to infer adaptation rate and strength, and estimate α = 0.135 in human protein-coding sequences, 72% of which is contributed by weakly-adaptive variants. We show that in this adaptation regime α is reduced ≈ 25% by linkage genome-wide. Moreover, we show that virus-interacting proteins (VIPs) undergo adaptation that is both stronger and nearly twice as frequent as the genome average (α = 0.224, 56% due to strongly-beneficial alleles). Our results suggest that while most adaptation in human proteins is weakly-beneficial, adaptation to viruses is often strongly-beneficial. Our method provides a robust framework for estimating adaptation rate and strength across species.
|
46
|
Torres R, Szpiech ZA, Hernandez RD. Correction: Human demographic history has amplified the effects of background selection across the genome. PLoS Genet 2019; 15:e1007898. [PMID: 30601801 PMCID: PMC6314599 DOI: 10.1371/journal.pgen.1007898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022] Open
|
47
|
High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 2018; 50:1311-1317. [PMID: 30104759 PMCID: PMC6145075 DOI: 10.1038/s41588-018-0177-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/21/2017] [Accepted: 06/21/2018] [Indexed: 12/19/2022]
Abstract
Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequence data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified LD score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.
|