51
Fujito NT, Satta Y, Hayakawa T, Takahata N. A new inference method for detecting an ongoing selective sweep. Genes Genet Syst 2018; 93:149-161. [DOI: 10.1266/ggs.18-00008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/23/2022] Open
Affiliation(s)
- Naoko T. Fujito
  - School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
- Yoko Satta
  - School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
- Toshiyuki Hayakawa
  - Graduate School of Systems Life Sciences, Kyushu University
  - Faculty of Arts and Science, Kyushu University
- Naoyuki Takahata
  - School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
52
Stobdan T, Akbari A, Azad P, Zhou D, Poulsen O, Appenzeller O, Gonzales GF, Telenti A, Wong EHM, Saini S, Kirkness EF, Venter JC, Bafna V, Haddad GG. New Insights into the Genetic Basis of Monge's Disease and Adaptation to High-Altitude. Mol Biol Evol 2018; 34:3154-3168. [PMID: 29029226 DOI: 10.1093/molbev/msx239] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 12/23/2022] Open
Abstract
Human high-altitude (HA) adaptation or mal-adaptation is explored to understand the physiology, pathophysiology, and molecular mechanisms that underlie long-term exposure to hypoxia. Here, we report the results of an analysis of the largest whole-genome-sequencing of Chronic Mountain Sickness (CMS) and nonCMS individuals, identified candidate genes and functionally validated these candidates in a genetic model system (Drosophila). We used PreCIOSS algorithm that uses Haplotype Allele Frequency score to separate haplotypes carrying the favored allele from the noncarriers and accordingly, prioritize genes associated with the CMS or nonCMS phenotype. Haplotypes in eleven candidate regions, with SNPs mostly in nonexonic regions, were significantly different between CMS and nonCMS subjects. Closer examination of individual genes in these regions revealed the involvement of previously identified candidates (e.g., SENP1) and also unreported ones SGK3, COPS5, PRDM1, and IFT122 in CMS. Remarkably, in addition to genes like SENP1, SGK3, and COPS5 which are HIF-dependent, our study reveals for the first time HIF-independent gene PRDM1, indicating an involvement of wider, nonHIF pathways in HA adaptation. Finally, we observed that down-regulating orthologs of these genes in Drosophila significantly enhanced their hypoxia tolerance. Taken together, the PreCIOSS algorithm, applied on a large number of genomes, identifies the involvement of both new and previously reported genes in selection sweeps, highlighting the involvement of multiple hypoxia response systems. Since the overwhelming majority of SNPs are in nonexonic (and possibly regulatory) regions, we speculate that adaptation to HA necessitates greater genetic flexibility allowing for transcript variability in response to graded levels of hypoxia.
Affiliation(s)
- Tsering Stobdan
  - Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, La Jolla, CA
- Ali Akbari
  - Department of Electrical & Computer Engineering, University of California, San Diego, La Jolla, CA
- Priti Azad
  - Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, La Jolla, CA
- Dan Zhou
  - Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, La Jolla, CA
- Orit Poulsen
  - Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, La Jolla, CA
- Otto Appenzeller
  - Department of Neurology, New Mexico Health Enhancement and Marathon Clinics Research Foundation, Albuquerque, NM
- Gustavo F Gonzales
  - High Altitude Research Institute and Department of Biological and Physiological Sciences, Faculty of Sciences and Philosophy, Universidad Peruana Cayetano Heredia, Lima, Peru
- Amalio Telenti
  - Human Longevity Inc., San Diego, CA
  - J. Craig Venter Institute, La Jolla, CA
- Shubham Saini
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA
- J Craig Venter
  - Human Longevity Inc., San Diego, CA
  - J. Craig Venter Institute, La Jolla, CA
- Vineet Bafna
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA
- Gabriel G Haddad
  - Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, La Jolla, CA
  - Department of Neurosciences, University of California, San Diego, La Jolla, CA
  - Rady Children's Hospital, San Diego, CA
53
Gory JJ, Herbei R, Kubatko LS. Bayesian inference of selection in the Wright-Fisher diffusion model. Stat Appl Genet Mol Biol 2018; 17:sagmb-2017-0046. [PMID: 29874197 DOI: 10.1515/sagmb-2017-0046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
The increasing availability of population-level allele frequency data across one or more related populations necessitates the development of methods that can efficiently estimate population genetics parameters, such as the strength of selection acting on the population(s), from such data. Existing methods for this problem in the setting of the Wright-Fisher diffusion model are primarily likelihood-based, and rely on numerical approximation for likelihood computation and on bootstrapping for assessment of variability in the resulting estimates, requiring extensive computation. Recent work has provided a method for obtaining exact samples from general Wright-Fisher diffusion processes, enabling the development of methods for Bayesian estimation in this setting. We develop and implement a Bayesian method for estimating the strength of selection based on the Wright-Fisher diffusion for data sampled at a single time point. The method utilizes the latest algorithms for exact sampling to devise a Markov chain Monte Carlo procedure to draw samples from the joint posterior distribution of the selection coefficient and the allele frequencies. We demonstrate that when assumptions about the initial allele frequencies are accurate the method performs well for both simulated data and for an empirical data set on hypoxia in flies, where we find evidence for strong positive selection in a region of chromosome 2L previously identified. We discuss possible extensions of our method to the more general settings commonly encountered in practice, highlighting the advantages of Bayesian approaches to inference in this setting.
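As a rough illustration of the kind of process this abstract refers to, the sketch below simulates allele-frequency change under a discrete haploid Wright-Fisher model with genic selection. It is not the authors' method: the paper works with the Wright-Fisher diffusion and exact sampling inside an MCMC procedure, whereas this is only a forward simulation. The function name `wright_fisher_trajectory` and its parameters (`p0`, `s`, `pop_size`, `generations`, `seed`) are hypothetical names chosen for the example.

```python
import random


def wright_fisher_trajectory(p0, s, pop_size, generations, seed=None):
    """Simulate allele frequencies in a haploid Wright-Fisher population.

    Illustrative assumptions (not taken from the paper):
    - the favored allele has relative fitness 1 + s (requires s > -1),
    - the population size is constant at pop_size gene copies,
    - generations are discrete and non-overlapping.
    """
    rng = random.Random(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Selection step: deterministic shift in the expected frequency.
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift step: binomial resampling of pop_size gene copies.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        p = count / pop_size
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    traj = wright_fisher_trajectory(p0=0.1, s=0.05, pop_size=200, generations=50, seed=1)
    print(traj[0], traj[-1])
```

Running the simulation repeatedly with different seeds gives a feel for how much a single sampled frequency can vary for a fixed selection coefficient, which is the variability that an inference method for selection strength has to account for.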
Affiliation(s)
- Jeffrey J Gory
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Radu Herbei
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Laura S Kubatko
  - Departments of Statistics and Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
54
Kern AD, Schrider DR. diploS/HIC: An Updated Approach to Classifying Selective Sweeps. G3 (Bethesda, Md.) 2018; 8:1959-1970. [PMID: 29626082 PMCID: PMC5982824 DOI: 10.1534/g3.118.200262] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 02/26/2018] [Accepted: 04/04/2018] [Indexed: 11/18/2022]
Abstract
Identifying selective sweeps in populations that have complex demographic histories remains a difficult problem in population genetics. We previously introduced a supervised machine learning approach, S/HIC, for finding both hard and soft selective sweeps in genomes on the basis of patterns of genetic variation surrounding a window of the genome. While S/HIC was shown to be both powerful and precise, the utility of S/HIC was limited by the use of phased genomic data as input. In this report we describe a deep learning variant of our method, diploS/HIC, that uses unphased genotypes to accurately classify genomic windows. diploS/HIC is shown to be quite powerful even at moderate to small sample sizes.
Affiliation(s)
- Andrew D Kern
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854
55
Weigand H, Leese F. Detecting signatures of positive selection in non-model species using genomic data. Zool J Linn Soc 2018. [DOI: 10.1093/zoolinnean/zly007] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 11/14/2022]
Affiliation(s)
- Hannah Weigand
  - Aquatic Ecosystem Research, University of Duisburg-Essen, Universitätsstraße, Essen, Germany
- Florian Leese
  - Aquatic Ecosystem Research, University of Duisburg-Essen, Universitätsstraße, Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstraße, Essen, Germany
56
Schrider DR, Kern AD. Supervised Machine Learning for Population Genetics: A New Paradigm. Trends Genet 2018; 34:301-312. [PMID: 29331490 PMCID: PMC5905713 DOI: 10.1016/j.tig.2017.12.005] [Citation(s) in RCA: 220] [Impact Index Per Article: 31.4] [Received: 10/20/2017] [Revised: 11/29/2017] [Accepted: 12/08/2017] [Indexed: 01/21/2023]
Abstract
As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.
Affiliation(s)
- Daniel R Schrider
  - Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Andrew D Kern
  - Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
57
Schrider DR, Ayroles J, Matute DR, Kern AD. Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia. PLoS Genet 2018; 14:e1007341. [PMID: 29684059 PMCID: PMC5933812 DOI: 10.1371/journal.pgen.1007341] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 09/25/2017] [Revised: 05/03/2018] [Accepted: 03/28/2018] [Indexed: 12/30/2022] Open
Abstract
Hybridization and gene flow between species appears to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees) capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.
Affiliation(s)
- Daniel R. Schrider
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
- Julien Ayroles
  - Ecology and Evolutionary Biology Department, Princeton University, Princeton, New Jersey, United States of America
  - Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Daniel R. Matute
  - Biology Department, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Andrew D. Kern
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
58
Akbari A, Vitti JJ, Iranmehr A, Bakhtiari M, Sabeti PC, Mirarab S, Bafna V. Identifying the favored mutation in a positive selective sweep. Nat Methods 2018; 15:279-282. [PMID: 29457793 PMCID: PMC6231406 DOI: 10.1038/nmeth.4606] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 05/17/2017] [Accepted: 01/08/2018] [Indexed: 01/23/2023]
Abstract
Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations.
Affiliation(s)
- Ali Akbari
  - Department of Electrical & Computer Engineering, University of California San Diego, La Jolla, California, USA
- Joseph J Vitti
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Arya Iranmehr
  - Department of Electrical & Computer Engineering, University of California San Diego, La Jolla, California, USA
- Mehrdad Bakhtiari
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Pardis C Sabeti
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Siavash Mirarab
  - Department of Electrical & Computer Engineering, University of California San Diego, La Jolla, California, USA
- Vineet Bafna
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
59
Pardiñas AF, Holmans P, Pocklington AJ, Escott-Price V, Ripke S, Carrera N, Legge SE, Bishop S, Cameron D, Hamshere ML, Han J, Hubbard L, Lynham A, Mantripragada K, Rees E, MacCabe JH, McCarroll SA, Baune BT, Breen G, Byrne EM, Dannlowski U, Eley TC, Hayward C, Martin NG, McIntosh AM, Plomin R, Porteous DJ, Wray NR, Caballero A, Geschwind DH, Huckins LM, Ruderfer DM, Santiago E, Sklar P, Stahl EA, Won H, Agerbo E, Als TD, Andreassen OA, Bækvad-Hansen M, Mortensen PB, Pedersen CB, Børglum AD, Bybjerg-Grauholm J, Djurovic S, Durmishi N, Pedersen MG, Golimbet V, Grove J, Hougaard DM, Mattheisen M, Molden E, Mors O, Nordentoft M, Pejovic-Milovancevic M, Sigurdsson E, Silagadze T, Hansen CS, Stefansson K, Stefansson H, Steinberg S, Tosato S, Werge T, Collier DA, Rujescu D, Kirov G, Owen MJ, O'Donovan MC, Walters JTR. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat Genet 2018; 50:381-389. [PMID: 29483656 PMCID: PMC5918692 DOI: 10.1038/s41588-018-0059-2] [Citation(s) in RCA: 1004] [Impact Index Per Article: 143.4] [Received: 09/15/2017] [Accepted: 01/07/2018] [Indexed: 12/13/2022]
Abstract
Schizophrenia is a debilitating psychiatric condition often associated with poor quality of life and decreased life expectancy. Lack of progress in improving treatment outcomes has been attributed to limited knowledge of the underlying biology, although large-scale genomic studies have begun to provide insights. We report a new genome-wide association study of schizophrenia (11,260 cases and 24,542 controls), and through meta-analysis with existing data we identify 50 novel associated loci and 145 loci in total. Through integrating genomic fine-mapping with brain expression and chromosome conformation data, we identify candidate causal genes within 33 loci. We also show for the first time that the common variant association signal is highly enriched among genes that are under strong selective pressures. These findings provide new insights into the biology and genetic architecture of schizophrenia, highlight the importance of mutation-intolerant genes and suggest a mechanism by which common risk variants persist in the population.
Affiliation(s)
- Antonio F Pardiñas
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Peter Holmans
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Andrew J Pocklington
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Valentina Escott-Price
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Stephan Ripke
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Department of Psychiatry and Psychotherapy, Charité, Campus Mitte, Berlin, Germany
- Noa Carrera
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Sophie E Legge
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Sophie Bishop
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Darren Cameron
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Marian L Hamshere
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Jun Han
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Leon Hubbard
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Amy Lynham
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Kiran Mantripragada
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Elliott Rees
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- James H MacCabe
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Steven A McCarroll
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bernhard T Baune
  - Discipline of Psychiatry, University of Adelaide, Adelaide, South Australia, Australia
- Gerome Breen
  - MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - NIHR Biomedical Research Centre for Mental Health, Maudsley Hospital and Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Enda M Byrne
  - Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Udo Dannlowski
  - Department of Psychiatry and Psychotherapy, University of Münster, Münster, Germany
- Thalia C Eley
  - MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Caroline Hayward
  - Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Nicholas G Martin
  - School of Psychology, University of Queensland, Brisbane, Queensland, Australia
  - QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Andrew M McIntosh
  - Division of Psychiatry, University of Edinburgh, Edinburgh, UK
  - Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
- Robert Plomin
  - MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- David J Porteous
  - Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Naomi R Wray
  - Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Armando Caballero
  - Departamento de Bioquímica, Genética e Inmunología. Facultad de Biología, Universidad de Vigo, Vigo, Spain
- Daniel H Geschwind
  - Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Laura M Huckins
  - Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Douglas M Ruderfer
  - Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Enrique Santiago
  - Departamento de Biología Funcional. Facultad de Biología, Universidad de Oviedo, Oviedo, Spain
- Pamela Sklar
  - Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eli A Stahl
  - Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hyejung Won
  - Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Esben Agerbo
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Thomas D Als
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
  - Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
- Ole A Andreassen
  - Institute of Clinical Medicine, University of Oslo, Oslo, Norway
  - NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Marie Bækvad-Hansen
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Preben Bo Mortensen
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
  - iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
- Carsten Bøcker Pedersen
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Anders D Børglum
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
  - Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
- Jonas Bybjerg-Grauholm
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Srdjan Djurovic
  - NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- Naser Durmishi
  - Department of Child and Adolescent Psychiatry, University Clinic of Psychiatry, Skopje, Macedonia
- Marianne Giørtz Pedersen
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Vera Golimbet
  - Department of Clinical Genetics, Mental Health Research Center, Moscow, Russia
- Jakob Grove
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
  - Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- David M Hougaard
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Manuel Mattheisen
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
  - Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
- Espen Molden
  - Center for Psychopharmacology, Diakonhjemmet Hospital, Oslo, Norway
- Ole Mors
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Psychosis Research Unit, Aarhus University Hospital, Risskov, Denmark
- Merete Nordentoft
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Mental Health Services in the Capital Region of Denmark, Mental Health Center Copenhagen, University of Copenhagen, Copenhagen, Denmark
- Teimuraz Silagadze
  - Department of Psychiatry and Drug Addiction, Tbilisi State Medical University (TSMU), Tbilisi, Georgia
- Christine Søholm Hansen
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Sarah Tosato
  - Section of Psychiatry, Department of Public Health and Community Medicine, University of Verona, Verona, Italy
- Thomas Werge
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Institute of Biological Psychiatry, MHC Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- David A Collier
  - MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - Discovery Neuroscience Research, Eli Lilly and Company, Lilly Research Laboratories, Windlesham, UK
- Dan Rujescu
  - Department of Psychiatry, University of Halle, Halle, Germany
  - Department of Psychiatry, University of Munich, Munich, Germany
- George Kirov
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Michael J Owen
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Michael C O'Donovan
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- James T R Walters
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
60
Sugden LA, Atkinson EG, Fischer AP, Rong S, Henn BM, Ramachandran S. Localization of adaptive variants in human genomes using averaged one-dependence estimation. Nat Commun 2018; 9:703. [PMID: 29459739 PMCID: PMC5818606 DOI: 10.1038/s41467-018-03100-7] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 06/19/2017] [Accepted: 01/19/2018] [Indexed: 12/19/2022] Open
Abstract
Statistical methods for identifying adaptive mutations from population genetic data face several obstacles: assessing the significance of genomic outliers, integrating correlated measures of selection into one analytic framework, and distinguishing adaptive variants from hitchhiking neutral variants. Here, we introduce SWIF(r), a probabilistic method that detects selective sweeps by learning the distributions of multiple selection statistics under different evolutionary scenarios and calculating the posterior probability of a sweep at each genomic site. SWIF(r) is trained using simulations from a user-specified demographic model and explicitly models the joint distributions of selection statistics, thereby increasing its power to both identify regions undergoing sweeps and localize adaptive mutations. Using array and exome data from 45 ǂKhomani San hunter-gatherers of southern Africa, we identify an enrichment of adaptive signals in genes associated with metabolism and obesity. SWIF(r) provides a transparent probabilistic framework for localizing beneficial mutations that is extensible to a variety of evolutionary scenarios.
Affiliation(s)
- Lauren Alpert Sugden
- Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA.
- Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, 02912, USA.
| | - Elizabeth G Atkinson
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, 11794, USA
| | - Annie P Fischer
- Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
| | - Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02912, USA
| | - Brenna M Henn
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, 11794, USA
| | - Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA.
- Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, 02912, USA.
|
61
|
Khromov P, Malliaris CD, Morozov AV. Generalization of the Ewens sampling formula to arbitrary fitness landscapes. PLoS One 2018; 13:e0190186. [PMID: 29324850 PMCID: PMC5764269 DOI: 10.1371/journal.pone.0190186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/13/2017] [Accepted: 12/08/2017] [Indexed: 11/30/2022] Open
Abstract
In considering evolution of transcribed regions, regulatory sequences, and other genomic loci, we are often faced with a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually adopts a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from the population, do not change with time. In the absence of selection, the probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary fitness distributions. Although our approach is general, we focus on the class of fitness landscapes, inspired by recent high-throughput genotype-phenotype maps, in which alleles can be in several distinct phenotypic states. This class of landscapes yields sampling probabilities that are computationally more tractable and can form a basis for inference of selection signatures from genomic data. Using an efficient numerical implementation of the sampling probabilities, we demonstrate that, for a sizable range of mutation rates and selection coefficients, the steady-state allelic diversity is not neutral. Therefore, it may be used to infer selection coefficients, as well as other evolutionary parameters from population data. We also carry out numerical simulations to challenge various approximations involved in deriving our sampling formulas, such as the infinite-allele limit and the “full connectivity” assumption inherent in the Ewens theory, in which each allele can mutate into any other allele. We find that, at least for the specific numerical examples studied, our theory remains sufficiently accurate even if these assumptions are relaxed. 
Thus our framework establishes both theoretical and practical foundations for inferring selection signatures from population-level genomic sequence samples.
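For orientation, the neutral baseline that this abstract generalizes can be written in a few lines. The sketch below computes the classical Ewens sampling formula only, not the authors' extension to fitness landscapes. The function name `ewens_probability` and the parameter name `theta` (the scaled mutation rate) are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import factorial, prod  # math.prod requires Python 3.8+


def ewens_probability(allele_counts, theta):
    """Neutral probability of an unordered allelic configuration.

    allele_counts: copy numbers of each distinct allele in the sample,
                   e.g. [3, 1, 1] for a sample of 5 with three alleles.
    theta:         scaled mutation rate (assumed name for this sketch).
    """
    n = sum(allele_counts)
    # a[j] = number of distinct alleles observed exactly j times
    a = Counter(allele_counts)
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    rising = prod(theta + i for i in range(n))
    weight = prod(theta ** a_j / (j ** a_j * factorial(a_j))
                  for j, a_j in a.items())
    return factorial(n) / rising * weight


# The three possible configurations of a sample of size 3 sum to one.
total = sum(ewens_probability(c, 2.0) for c in ([3], [2, 1], [1, 1, 1]))
```

Comparing an observed configuration against these neutral probabilities is the standard way the formula is used to flag departures from neutrality.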
Affiliation(s)
- Pavel Khromov
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
| | - Constantin D. Malliaris
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
| | - Alexandre V. Morozov
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
|
62
|
Baharian S, Gravel S. On the decidability of population size histories from finite allele frequency spectra. Theor Popul Biol 2018; 120:42-51. [PMID: 29305873 DOI: 10.1016/j.tpb.2017.12.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/14/2017] [Revised: 12/15/2017] [Accepted: 12/20/2017] [Indexed: 10/18/2022]
Abstract
Understanding the historical events that shaped current genomic diversity has applications in historical, biological, and medical research. However, the amount of historical information that can be inferred from genetic data is finite, which leads to an identifiability problem. For example, different historical processes can lead to identical distribution of allele frequencies. This identifiability issue casts a shadow of uncertainty over the results of any study which uses the frequency spectrum to infer past demography. It has been argued that imposing mild 'reasonableness' constraints on demographic histories can enable unique reconstruction, at least in an idealized setting where the length of the genome is nearly infinite. Here, we discuss this problem for finite sample size and genome length. Using the diffusion approximation, we obtain bounds on likelihood differences between similar demographic histories, and use them to construct pairs of very different reasonable histories that produce almost-identical frequency distributions. The finite-genome problem therefore remains poorly determined even among reasonable histories. Where fits to few-parameter models produce narrow parameter confidence intervals, large uncertainties lurk hidden by model assumption.
Affiliation(s)
- Soheil Baharian
- Department of Human Genetics, McGill University, Montreal, QC, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada
| | - Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada.
|
63
|
Detecting Recent Positive Selection with a Single Locus Test Bipartitioning the Coalescent Tree. Genetics 2017; 208:791-805. [PMID: 29217523 DOI: 10.1534/genetics.117.300401] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/14/2017] [Accepted: 12/01/2017] [Indexed: 01/09/2023] Open
Abstract
Many population genomic studies have been conducted in the past to search for traces of recent events of positive selection. These traces, however, can be obscured by temporal variation of population size or other demographic factors. To reduce the confounding impact of demography, the coalescent tree topology has been used as an additional source of information for detecting recent positive selection in a population or a species. Based on the branching pattern at the root, we partition the hypothetical coalescent tree, inferred from a sequence sample, into two subtrees. The reasoning is that positive selection could impose a strong impact on branch length in one of the two subtrees while demography has the same effect on average on both subtrees. Thus, positive selection should be detectable by comparing statistics calculated for the two subtrees. Simulations demonstrate that the proposed test based on these principles has high power to detect recent positive selection even when DNA polymorphism data from only one locus is available, and that it is robust to the confounding effect of demography. One feature is that all components in the summary statistics ([Formula: see text]) can be computed analytically. Moreover, misinference of derived and ancestral alleles is seen to have only a limited effect on the test, and it therefore avoids a notorious problem when searching for traces of recent positive selection.
|
64
|
Fuentes-Pardo AP, Ruzzante DE. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations. Mol Ecol 2017; 26:5369-5406. [PMID: 28746784 DOI: 10.1111/mec.14264] [Citation(s) in RCA: 160] [Impact Index Per Article: 20.0] [Received: 04/19/2017] [Revised: 06/23/2017] [Accepted: 06/28/2017] [Indexed: 12/14/2022]
Abstract
Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology.
|
65
|
Abstract
The degree to which adaptation in recent human evolution shapes genetic variation remains controversial. This is in part due to the limited evidence in humans for classic "hard selective sweeps", wherein a novel beneficial mutation rapidly sweeps through a population to fixation. However, positive selection may often proceed via "soft sweeps" acting on mutations already present within a population. Here, we examine recent positive selection across six human populations using a powerful machine learning approach that is sensitive to both hard and soft sweeps. We found evidence that soft sweeps are widespread and account for the vast majority of recent human adaptation. Surprisingly, our results also suggest that linked positive selection affects patterns of variation across much of the genome, and may increase the frequencies of deleterious mutations. Our results also reveal insights into the role of sexual selection, cancer risk, and central nervous system development in recent human evolution.
Affiliation(s)
- Daniel R. Schrider
- Department of Genetics, Rutgers University, Piscataway, NJ
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
| | - Andrew D. Kern
- Department of Genetics, Rutgers University, Piscataway, NJ
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
|
66
|
Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps. Genetics 2017; 203:1807-25. [PMID: 27516617 DOI: 10.1534/genetics.115.185900] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/10/2015] [Accepted: 04/05/2016] [Indexed: 12/12/2022] Open
Abstract
During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly's [Formula: see text] and [Formula: see text]) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly's [Formula: see text] offers high power, but is outperformed by a novel statistic that we test, which we call [Formula: see text] We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics-[Formula: see text] Kelly's [Formula: see text] and [Formula: see text]-are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While [Formula: see text] replicates most candidates when recombination map data are not available, the [Formula: see text] and [Formula: see text] statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available.
|
67
|
Abstract
Molecular population genetics aims to explain genetic variation and molecular evolution from population genetics principles. The field was born 50 years ago with the first measures of genetic variation in allozyme loci, continued with the nucleotide sequencing era, and is currently in the era of population genomics. During this period, molecular population genetics has been revolutionized by progress in data acquisition and theoretical developments. The conceptual elegance of the neutral theory of molecular evolution or the footprint carved by natural selection on the patterns of genetic variation are two examples of the vast number of inspiring findings of population genetics research. Since the inception of the field, Drosophila has been the prominent model species: molecular variation in populations was first described in Drosophila and most of the population genetics hypotheses were tested in Drosophila species. In this review, we describe the main concepts, methods, and landmarks of molecular population genetics, using the Drosophila model as a reference. We describe the different genetic data sets made available by advances in molecular technologies, and the theoretical developments fostered by these data. Finally, we review the results and new insights provided by the population genomics approach, and conclude by enumerating challenges and new lines of inquiry posed by increasingly large population scale sequence data.
|
68
|
He Q, Knowles LL. Identifying targets of selection in mosaic genomes with machine learning: applications in Anopheles gambiae for detecting sites within locally adapted chromosomal inversions. Mol Ecol 2016; 25:2226-43. [DOI: 10.1111/mec.13619] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/15/2015] [Revised: 03/01/2016] [Accepted: 03/08/2016] [Indexed: 01/25/2023]
Affiliation(s)
- Qixin He
- Department of Ecology & Evolutionary Biology, Museum of Zoology; University of Michigan; 1109 Geddes Ave. Ann Arbor MI 48109-1079 USA
| | - L. Lacey Knowles
- Department of Ecology & Evolutionary Biology, Museum of Zoology; University of Michigan; 1109 Geddes Ave. Ann Arbor MI 48109-1079 USA
|
69
|
Schrider DR, Kern AD. S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning. PLoS Genet 2016; 12:e1005928. [PMID: 26977894 PMCID: PMC4792382 DOI: 10.1371/journal.pgen.1005928] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Received: 09/03/2015] [Accepted: 02/21/2016] [Indexed: 12/17/2022] Open
Abstract
Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover, we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample, and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.
Affiliation(s)
- Daniel R. Schrider
- Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
| | - Andrew D. Kern
- Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
|
70
|
Sheehan S, Song YS. Deep Learning for Population Genetic Inference. PLoS Comput Biol 2016; 12:e1004845. [PMID: 27018908 PMCID: PMC4809617 DOI: 10.1371/journal.pcbi.1004845] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Received: 10/02/2015] [Accepted: 03/02/2016] [Indexed: 02/05/2023] Open
Abstract
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Affiliation(s)
- Sara Sheehan
- Department of Computer Science, Smith College, Northampton, Massachusetts, United States of America
- Computer Science Division, UC Berkeley, Berkeley, California, United States of America
| | - Yun S. Song
- Computer Science Division, UC Berkeley, Berkeley, California, United States of America
- Department of Statistics, UC Berkeley, Berkeley, California, United States of America
- Department of Integrative Biology, UC Berkeley, Berkeley, California, United States of America
- Department of Mathematics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
71
|
Schrider DR, Kern AD. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain. Genome Biol Evol 2015; 7:3511-28. [PMID: 26590212 PMCID: PMC4700959 DOI: 10.1093/gbe/evv228] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 12/23/2022] Open
Abstract
The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods.
Affiliation(s)
| | - Andrew D Kern
- Department of Genetics, Rutgers University, Piscataway
- Human Genetics Institute of New Jersey, Piscataway, New Jersey
|
72
|
Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V. Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele. PLoS Genet 2015; 11:e1005527. [PMID: 26402243 PMCID: PMC4581834 DOI: 10.1371/journal.pgen.1005527] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 04/06/2015] [Accepted: 08/24/2015] [Indexed: 11/19/2022] Open
Abstract
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory, for example in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
Author Summary
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying genomic regions under positive selection. However, methods that detect positive selective sweeps do not typically identify the favored allele, or even the haplotypes carrying the favored allele. The main contribution of this paper is the development and analysis of a new statistic (the HAF score), assigned to individual haplotypes. Using both theoretical analyses and simulations, we describe how the HAF scores differ for carriers and non-carriers of the favored allele, and how they change dynamically during a selective sweep. We also develop an algorithm, PreCIOSS, for separating carriers and non-carriers. Our tool has broad applicability as carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory, for example in contexts involving drug-resistant pathogen strains or cancer subclones.
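The abstract does not spell out how a HAF score is computed. As a rough illustration only, the sketch below assumes the simplest form of the statistic: for each haplotype, sum the sample-wide derived-allele counts over the sites where that haplotype carries the derived allele. The function name `haf_scores` and the 0/1 matrix encoding are assumptions made for this sketch. It does not implement PreCIOSS itself.

```python
def haf_scores(haplotypes):
    """Per-haplotype HAF scores under the assumed definition.

    haplotypes: list of equal-length lists of 0/1 values,
                where 1 marks the derived allele at a site.
    """
    n_sites = len(haplotypes[0])
    # Number of haplotypes in the sample carrying the derived allele at each site
    derived_counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    # Sum those counts over the derived sites carried by each haplotype
    return [sum(derived_counts[j] for j in range(n_sites) if h[j] == 1)
            for h in haplotypes]


# Toy sample: three haplotypes, three segregating sites.
sample = [[1, 1, 0],
          [1, 0, 0],
          [0, 0, 1]]
scores = haf_scores(sample)
```

Under this definition, haplotypes that share many common derived alleles score highly, which is the intuition behind using the score to separate carriers from non-carriers during an ongoing sweep.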
Affiliation(s)
- Roy Ronen
- Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America
| | - Glenn Tesler
- Department of Mathematics, University of California, San Diego, La Jolla, California, United States of America
| | - Ali Akbari
- Department of Electrical & Computer Engineering, University of California, San Diego, La Jolla, California, United States of America
| | - Shay Zakov
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, California, United States of America
| | - Noah A. Rosenberg
- Department of Biology, Stanford University, Stanford, California, United States of America
| | - Vineet Bafna
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, California, United States of America
|
73
|
Pybus M, Luisi P, Dall'Olio GM, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 2015; 31:3946-52. [PMID: 26315912 DOI: 10.1093/bioinformatics/btv493] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 05/11/2015] [Accepted: 08/17/2015] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION: Detecting positive selection in genomic regions is a recurrent topic in natural population genetic studies. However, there is little consistency among the regions detected in several genome-wide scans using different tests and/or populations. Furthermore, few methods address the challenge of classifying selective events according to specific features such as age, intensity or state (completeness).
RESULTS: We have developed a machine-learning classification framework that exploits the combined ability of some selection tests to uncover different polymorphism features expected under the hard sweep model, while controlling for population-specific demography. As a result, we achieve high sensitivity toward hard selective sweeps while adding insights about their completeness (whether a selected variant is fixed or not) and age of onset. Our method also determines the relevance of the individual methods implemented so far to detect positive selection under specific selective scenarios. We calibrated and applied the method to three reference human populations from The 1000 Genomes Project to generate a genome-wide classification map of hard selective sweeps. This study improves detection of selective sweeps by overcoming the classical selection versus no-selection classification strategy, and offers an explanation for the lack of consistency observed among selection tests when applied to real data. Very few signals were observed in the African population studied, while our method presents higher sensitivity in this population demography.
AVAILABILITY AND IMPLEMENTATION: The genome-wide results for three human populations from The 1000 Genomes Project and an R-package implementing the 'Hierarchical Boosting' framework are available at http://hsb.upf.edu/.
Affiliation(s)
- Marc Pybus
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain
| | - Pierre Luisi
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain, Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Giovanni Marco Dall'Olio
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain, Division of Cancer Studies, King's College of London, London SE1 1UL, UK and
| | - Manu Uzkudun
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain
| | - Hafid Laayouni
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain, Departament de Genètica i de Microbiologia, Universitat Autonòma de Barcelona, Bellaterra 8193, Spain
| | - Jaume Bertranpetit
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain
| | - Johannes Engelken
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain
|
74
|
Schlötterer C, Kofler R, Versace E, Tobler R, Franssen SU. Combining experimental evolution with next-generation sequencing: a powerful tool to study adaptation from standing genetic variation. Heredity (Edinb) 2015; 114:431-40. [PMID: 25269380 PMCID: PMC4815507 DOI: 10.1038/hdy.2014.86] [Citation(s) in RCA: 158] [Impact Index Per Article: 15.8] [Received: 04/08/2014] [Revised: 07/01/2014] [Accepted: 07/14/2014] [Indexed: 12/20/2022] Open
Abstract
Evolve and resequence (E&R) is a new approach to investigate the genomic responses to selection during experimental evolution. By using whole genome sequencing of pools of individuals (Pool-Seq), this method can identify selected variants in controlled and replicable experimental settings. Reviewing the current state of the field, we show that E&R can be powerful enough to identify causative genes and possibly even single-nucleotide polymorphisms. We also discuss how the experimental design and the complexity of the trait could result in a large number of false positive candidates. We suggest experimental and analytical strategies to maximize the power of E&R to uncover the genotype-phenotype link and serve as an important research tool for a broad range of evolutionary questions.
Affiliation(s)
- C Schlötterer
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
| | - R Kofler
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
| | - E Versace
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
| | - R Tobler
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
| | - S U Franssen
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
|
75
|
Beltrame MH, Boldt ABW, Catarino SJ, Mendes HC, Boschmann SE, Goeldner I, Messias-Reason I. MBL-associated serine proteases (MASPs) and infectious diseases. Mol Immunol 2015; 67:85-100. [PMID: 25862418 PMCID: PMC7112674 DOI: 10.1016/j.molimm.2015.03.245] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 01/19/2015] [Revised: 03/11/2015] [Accepted: 03/12/2015] [Indexed: 12/16/2022]
Abstract
MASP-1 and MASP-2 are central players of the lectin pathway of complement. MASP1 and MASP2 gene polymorphisms regulate protein serum levels and activity. MASP deficiencies are associated with increased infection susceptibility. MASP polymorphisms and serum levels are associated with disease progression.
The lectin pathway of the complement system has a pivotal role in the defense against infectious organisms. After binding of mannan-binding lectin (MBL), ficolins or collectin 11 to carbohydrates or acetylated residues on pathogen surfaces, dimers of MBL-associated serine proteases 1 and 2 (MASP-1 and MASP-2) activate a proteolytic cascade, which culminates in the formation of the membrane attack complex and pathogen lysis. Alternative splicing of the pre-mRNA encoding MASP-1 results in two other products, MASP-3 and MAp44, which regulate activation of the cascade. A similar mechanism allows the gene encoding MASP-2 to produce the truncated MAp19 protein. Polymorphisms in MASP1 and MASP2 genes are associated with protein serum levels and functional activity. Since the first report of a MASP deficiency in 2003, deficiencies in lectin pathway proteins have been associated with recurrent infections and several polymorphisms were associated with the susceptibility or protection to infectious diseases. In this review, we summarize the findings on the role of MASP polymorphisms and serum levels in bacterial, viral and protozoan infectious diseases.
Affiliation(s)
- Marcia H Beltrame
- Department of Clinical Pathology, Hospital de Clínicas, Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil
| | - Angelica B W Boldt
- Department of Genetics, Universidade Federal do Paraná, Curitiba, PR, Brazil
| | - Sandra J Catarino
- Department of Clinical Pathology, Hospital de Clínicas, Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil
| | - Hellen C Mendes
- Department of Clinical Pathology, Hospital de Clínicas, Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil
| | - Stefanie E Boschmann
- Department of Clinical Pathology, Hospital de Clínicas, Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil
| | - Isabela Goeldner
- Department of Clinical Pathology, Hospital de Clínicas, Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil
| | - Iara Messias-Reason
- Department of Clinical Pathology, Hospital de Clínicas, Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil.
|
76
|
Soft shoulders ahead: spurious signatures of soft and partial selective sweeps result from linked hard sweeps. Genetics 2015; 200:267-84. [PMID: 25716978 DOI: 10.1534/genetics.115.174912] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 01/26/2015] [Accepted: 02/20/2015] [Indexed: 11/18/2022] Open
Abstract
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard- and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. 
This general problem of "soft shoulders" underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans.
|
77
|
Cadzow M, Boocock J, Nguyen HT, Wilcox P, Merriman TR, Black MA. A bioinformatics workflow for detecting signatures of selection in genomic data. Front Genet 2014; 5:293. [PMID: 25206364 PMCID: PMC4144660 DOI: 10.3389/fgene.2014.00293] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 05/25/2014] [Accepted: 08/06/2014] [Indexed: 11/13/2022] Open
Abstract
The detection of "signatures of selection" is now possible on a genome-wide scale in many plant and animal species, and can be performed in a population-specific manner due to the wealth of per-population genome-wide genotype data that is available. With genomic regions that exhibit evidence of having been under selection shown to also be enriched for genes associated with biologically important traits, detection of evidence of selective pressure is emerging as an additional approach for identifying novel gene-trait associations. While high-density genotype data is now relatively easy to obtain, for many researchers it is not immediately obvious how to go about identifying signatures of selection in these data sets. Here we describe a basic workflow, constructed from open source tools, for detecting and examining evidence of selection in genomic data. Code to install and implement the pipeline components, and instructions to run a basic analysis using the workflow described here, can be downloaded from our public GitHub repository: http://www.github.com/smilefreak/selectionTools/
Affiliation(s)
- Murray Cadzow
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Rotorua, New Zealand
| | - James Boocock
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Rotorua, New Zealand
| | - Hoang T Nguyen
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Rotorua, New Zealand; Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
| | - Phillip Wilcox
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Rotorua, New Zealand; New Zealand Forest Research Institute Ltd, Rotorua, New Zealand
| | - Tony R Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Rotorua, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Rotorua, New Zealand
|
78
|
Rafajlović M, Klassmann A, Eriksson A, Wiehe T, Mehlig B. Demography-adjusted tests of neutrality based on genome-wide SNP data. Theor Popul Biol 2014; 95:1-12. [PMID: 24911258 DOI: 10.1016/j.tpb.2014.05.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/05/2013] [Revised: 05/28/2014] [Accepted: 05/29/2014] [Indexed: 12/15/2022]
Abstract
Tests of the neutral evolution hypothesis are usually built on the standard null model which assumes that mutations are neutral and the population size remains constant over time. However, it is unclear how such tests are affected if the last assumption is dropped. Here, we extend the unifying framework for tests based on the site frequency spectrum, introduced by Achaz and Ferretti, to populations of varying size. Key ingredients are the first two moments of the site frequency spectrum. We show how these moments can be computed analytically if a population has experienced two instantaneous size changes in the past. We apply our method to data from ten human populations gathered in the 1000 genomes project, estimate their demographies and define demography-adjusted versions of Tajima's D, Fay & Wu's H, and Zeng's E. Our results show that demography-adjusted test statistics facilitate the direct comparison between populations and that most of the differences among populations seen in the original unadjusted tests can be explained by their underlying demographies. Upon carrying out whole-genome screens for deviations from neutrality, we identify candidate regions of recent positive selection. We provide track files with values of the adjusted and unadjusted tests for upload to the UCSC genome browser.
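As a rough illustration of the site-frequency-spectrum tests named in this abstract, the sketch below computes the standard, unadjusted Tajima's D and the unnormalized Fay & Wu's H from an unfolded spectrum. It assumes the usual constant-size null model and does not implement the demography-adjusted versions the paper describes; the function name `neutrality_stats` and the input layout are hypothetical choices made for this example.

```python
from math import sqrt


def neutrality_stats(sfs):
    """Return (tajima_d, fay_wu_h) for an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites whose derived allele is
    seen in i of the n sampled chromosomes, for i = 1 .. n - 1.
    Assumption: standard constant-size null model (no demography adjustment).
    """
    n = len(sfs) + 1          # sample size implied by the spectrum length
    s = sum(sfs)              # number of segregating sites
    if n < 4 or s == 0:
        # The variance term of D vanishes for n < 4 and when there are no sites.
        raise ValueError("need at least 4 chromosomes and 1 segregating site")

    pairs = n * (n - 1) / 2
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))

    # Three estimators of theta, each weighting the spectrum differently.
    theta_w = s / a1
    theta_pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / pairs
    theta_h = sum(x * i * i for i, x in enumerate(sfs, start=1)) / pairs

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    tajima_d = (theta_pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    fay_wu_h = theta_pi - theta_h  # unnormalized H
    return tajima_d, fay_wu_h
```

A spectrum proportional to 1/i, the neutral expectation under constant size, gives values near zero for both statistics, while an excess of singletons pushes D below zero.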
Affiliation(s)
- M Rafajlović
- Department of Physics, University of Gothenburg, SE-412 96 Gothenburg, Sweden; The Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, SE-405 30 Gothenburg, Sweden
| | - A Klassmann
- Institut für Genetik, Universität zu Köln, 50674 Köln, Germany
| | - A Eriksson
- Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, UK; Integrative Systems Biology Lab, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
| | - T Wiehe
- Institut für Genetik, Universität zu Köln, 50674 Köln, Germany
| | - B Mehlig
- Department of Physics, University of Gothenburg, SE-412 96 Gothenburg, Sweden; The Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, SE-405 30 Gothenburg, Sweden.
|
79
|
Udpa N, Ronen R, Zhou D, Liang J, Stobdan T, Appenzeller O, Yin Y, Du Y, Guo L, Cao R, Wang Y, Jin X, Huang C, Jia W, Cao D, Guo G, Claydon VE, Hainsworth R, Gamboa JL, Zibenigus M, Zenebe G, Xue J, Liu S, Frazer KA, Li Y, Bafna V, Haddad GG. Whole genome sequencing of Ethiopian highlanders reveals conserved hypoxia tolerance genes. Genome Biol 2014; 15:R36. [PMID: 24555826 PMCID: PMC4054780 DOI: 10.1186/gb-2014-15-2-r36] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 11/09/2013] [Accepted: 02/20/2014] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Although it has long been proposed that genetic factors contribute to adaptation to high altitude, such factors remain largely unverified. Recent advances in high-throughput sequencing have made it feasible to analyze genome-wide patterns of genetic variation in human populations. Since traditionally such studies surveyed only a small fraction of the genome, interpretation of the results was limited. RESULTS: We report here the results of the first whole genome resequencing-based analysis identifying genes that likely modulate high altitude adaptation in native Ethiopians residing at 3,500 m above sea level on Bale Plateau or Chennek field in Ethiopia. Using cross-population tests of selection, we identify regions with a significant loss of diversity, indicative of a selective sweep. We focus on a 208 kbp gene-rich region on chromosome 19, which is significant in both of the Ethiopian subpopulations sampled. This region contains eight protein-coding genes and spans 135 SNPs. To elucidate its potential role in hypoxia tolerance, we experimentally tested whether individual genes from the region affect hypoxia tolerance in Drosophila. Three genes significantly impact survival rates in low oxygen: cic, an ortholog of human CIC, Hsl, an ortholog of human LIPE, and Paf-AHα, an ortholog of human PAFAH1B3. CONCLUSIONS: Our study reveals evolutionarily conserved genes that modulate hypoxia tolerance. In addition, we show that many of our results would likely be unattainable using data from exome sequencing or microarray studies. This highlights the importance of whole genome sequencing for investigating adaptation by natural selection.
|