1
Velayudhan SM, Yin T, Alam S, Brügemann K, Sejian V, Bhatta R, Schlecht E, König S. Unraveling the Genomic Association for Milk Production Traits and Signatures of Selection of Cattle in a Harsh Tropical Environment. Biology 2023; 12:1483. [PMID: 38132309 PMCID: PMC10740459 DOI: 10.3390/biology12121483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 11/15/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023]
Abstract
A study was designed to identify the genomic regions associated with milk production traits in a dairy cattle population reared by smallholder farmers in the harsh and challenging tropical savanna climate of Bengaluru, India. This study is a first-of-its-kind attempt to identify the selection sweeps for the dairy cattle breeds reared in such an environment. Two hundred forty lactating dairy cows reared by 68 farmers across the rural-urban transiting regions of Bengaluru were selected for this study. A genome-wide association study (GWAS) was performed to identify candidate genes for test-day milk yield, solids-not-fat (SNF), milk lactose, milk density and clinical mastitis. Furthermore, the cross-population extended haplotype homozygosity (XP-EHH) methodology was adopted to scan the dairy cattle breeds (Holstein Friesian, Jersey and Crossbred) in Bengaluru. Two SNPs, rs109340659 and rs41571523, were observed to be significantly associated with test-day milk yield. No significant SNPs were observed for the remaining production traits. The GWAS for milk lactose revealed one SNP (rs41634101) that was very close to the threshold limit, though not significant. The potential candidate genes fibrosin-like 1 (FBRSL1) and calcium voltage-gated channel auxiliary subunit gamma 3 (CACNG3) were identified to be in close proximity to the SNPs identified for test-day milk yield. These genes were observed to be associated with milk production traits based on previous reports. Furthermore, the selection signature analysis revealed a number of regions under selection for the breed-group comparisons (Crossbred-HF, Crossbred-J and HF-J). Functional analysis of these annotated genes under selection indicated pathways and mechanisms involving ubiquitination, cell signaling and immune response. These findings point towards the probable selection of dairy cows in Bengaluru for thermotolerance.
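The XP-EHH scan mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes phased haplotypes given as equal-length 0/1 strings, integrates EHH with the trapezoid rule over physical positions in both directions from a core SNP, and returns the unstandardized log ratio. Production tools also stop integrating once EHH falls below a cutoff and standardize scores genome-wide; both steps are omitted here. The function names `ehh`, `integrated_ehh` and `xp_ehh_unstandardized` are hypothetical.

```python
from collections import Counter
from math import comb, log


def ehh(haplotypes, core, end):
    """Probability that two randomly drawn haplotypes are identical
    over every site between index `core` and index `end` (inclusive)."""
    lo, hi = min(core, end), max(core, end)
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(len(haplotypes), 2)


def integrated_ehh(haplotypes, positions, core):
    """Trapezoidal integral of EHH over physical position,
    walking left and then right from the core SNP."""
    total = 0.0
    for step in (-1, 1):
        previous = ehh(haplotypes, core, core)
        i = core + step
        while 0 <= i < len(positions):
            current = ehh(haplotypes, core, i)
            width = abs(positions[i] - positions[i - step])
            total += 0.5 * (previous + current) * width
            previous = current
            i += step
    return total


def xp_ehh_unstandardized(pop_a, pop_b, positions, core):
    """Log ratio of integrated EHH; positive values mean longer shared
    haplotypes in pop_a than in pop_b around the core SNP."""
    return log(integrated_ehh(pop_a, positions, core)
               / integrated_ehh(pop_b, positions, core))


# Toy data (invented for illustration): pop_a carries one long shared haplotype.
positions = [0, 100, 200, 300]
pop_a = ["0101", "0101", "0101", "0101"]
pop_b = ["0101", "1101", "0110", "0100"]
score = xp_ehh_unstandardized(pop_a, pop_b, positions, core=1)
```

A positive `score` for the toy data reflects the longer run of haplotype identity in `pop_a`; with real data the raw scores would be standardized before any threshold is applied.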
Affiliation(s)
- Tong Yin
- Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, Ludwigstraße 21 b, 35390 Gießen, Germany; (S.M.V.); (T.Y.)
- Shahin Alam
- Animal Husbandry in the Tropics and Subtropics, University of Kassel and Georg-August-Universität Göttingen, Steinstr. 19, 37213 Witzenhausen, Germany; (S.A.)
- Kerstin Brügemann
- Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, Ludwigstraße 21 b, 35390 Gießen, Germany; (S.M.V.); (T.Y.)
- Veerasamy Sejian
- National Institute of Animal Nutrition and Physiology (NIANP), Hosur Rd, Chennakeshava Nagar, Adugodi, Bengaluru 560030, India
- Raghavendra Bhatta
- National Institute of Animal Nutrition and Physiology (NIANP), Hosur Rd, Chennakeshava Nagar, Adugodi, Bengaluru 560030, India
- Eva Schlecht
- Animal Husbandry in the Tropics and Subtropics, University of Kassel and Georg-August-Universität Göttingen, Steinstr. 19, 37213 Witzenhausen, Germany; (S.A.)
- Sven König
- Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, Ludwigstraße 21 b, 35390 Gießen, Germany; (S.M.V.); (T.Y.)
2
Panigrahi M, Rajawat D, Nayak SS, Ghildiyal K, Sharma A, Jain K, Lei C, Bhushan B, Mishra BP, Dutt T. Landmarks in the history of selective sweeps. Anim Genet 2023; 54:667-688. [PMID: 37710403 DOI: 10.1111/age.13355] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Half a century ago, a seminal article on the hitchhiking effect by Smith and Haigh inaugurated the concept of the selection signature. Selective sweeps are characterised by the rapid spread of an advantageous genetic variant through a population and hence play an important role in shaping evolution and research on genetic diversity. The process by which a beneficial allele arises and becomes fixed in a population, leading to an increase in the frequency of other linked alleles, is known as genetic hitchhiking or genetic draft. Kimura's neutral theory and hitchhiking theory are complementary, with Kimura's neutral evolution as the 'null model' and positive selection as the 'signal'. Both are widely accepted in evolution, especially with genomics enabling precise measurements. Significant advances in genomic technologies, such as next-generation sequencing, high-density SNP arrays and powerful bioinformatics tools, have made it possible to systematically investigate selection signatures in a variety of species. Although the history of selection signatures is relatively recent, progress has been made in the last two decades, owing to the increasing availability of large-scale genomic data and the development of computational methods. In this review, we embark on a journey through the history of research on selective sweeps, ranging from early theoretical work to recent empirical studies that utilise genomic data.
Affiliation(s)
- Manjit Panigrahi
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Divya Rajawat
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Kanika Ghildiyal
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Anurodh Sharma
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Karan Jain
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Chuzhao Lei
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Bharat Bhushan
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Bishnu Prasad Mishra
- Division of Animal Biotechnology, ICAR-National Bureau of Animal Genetic Resources, Karnal, India
- Triveni Dutt
- Livestock Production and Management Section, Indian Veterinary Research Institute, Bareilly, India
3
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol 2023; 40:msad216. [PMID: 37772983 PMCID: PMC10581699 DOI: 10.1093/molbev/msad216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/10/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023] Open
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
Affiliation(s)
- Md Ruhul Amin
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mahmudul Hasan
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
4
Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023; 224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023] Open
Abstract
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. 
In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Affiliation(s)
- Logan S Whitehouse
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
5
Chen Z, Reynolds RH, Pardiñas AF, Gagliano Taliun SA, van Rheenen W, Lin K, Shatunov A, Gustavsson EK, Fogh I, Jones AR, Robberecht W, Corcia P, Chiò A, Shaw PJ, Morrison KE, Veldink JH, van den Berg LH, Shaw CE, Powell JF, Silani V, Hardy JA, Houlden H, Owen MJ, Turner MR, Ryten M, Al-Chalabi A. The contribution of Neanderthal introgression and natural selection to neurodegenerative diseases. Neurobiol Dis 2023; 180:106082. [PMID: 36925053 DOI: 10.1016/j.nbd.2023.106082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 03/10/2023] [Accepted: 03/13/2023] [Indexed: 03/18/2023] Open
Abstract
Humans are thought to be more susceptible to neurodegeneration than equivalently-aged primates. It is not known whether this vulnerability is specific to anatomically-modern humans or shared with other hominids. The contribution of introgressed Neanderthal DNA to neurodegenerative disorders remains uncertain. It is also unclear how common variants associated with neurodegenerative disease risk are maintained by natural selection in the population despite their deleterious effects. In this study, we aimed to quantify the genome-wide contribution of Neanderthal introgression and positive selection to the heritability of complex neurodegenerative disorders to address these questions. We used stratified-linkage disequilibrium score regression to investigate the relationship between five SNP-based signatures of natural selection, reflecting different timepoints of evolution, and genome-wide associated variants of the three most prevalent neurodegenerative disorders: Alzheimer's disease, amyotrophic lateral sclerosis and Parkinson's disease. We found no evidence for enrichment of positively-selected SNPs in the heritability of Alzheimer's disease, amyotrophic lateral sclerosis and Parkinson's disease, suggesting that common deleterious disease variants are unlikely to be maintained by positive selection. There was no enrichment of Neanderthal introgression in the SNP-heritability of these disorders, suggesting that Neanderthal admixture is unlikely to have contributed to disease risk. These findings provide insight into the origins of neurodegenerative disorders within the evolution of Homo sapiens and address a long-standing debate, showing that Neanderthal admixture is unlikely to have contributed to common genetic risk of neurodegeneration in anatomically-modern humans.
Affiliation(s)
- Zhongbo Chen
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK; Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, UCL, London, UK; NIHR Great Ormond Street Hospital Biomedical Research Centre, UCL, London, UK.
- Regina H Reynolds
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, UCL, London, UK; NIHR Great Ormond Street Hospital Biomedical Research Centre, UCL, London, UK
- Antonio F Pardiñas
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- Sarah A Gagliano Taliun
- Department of Medicine & Department of Neurosciences, Université de Montréal, Montréal, Québec, Canada; Montréal Heart Institute, Montréal, Québec, Canada
- Wouter van Rheenen
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, the Netherlands
- Kuang Lin
- Nuffield Department of Population Health, Oxford University, Oxford, UK
- Aleksey Shatunov
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Emil K Gustavsson
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, UCL, London, UK; NIHR Great Ormond Street Hospital Biomedical Research Centre, UCL, London, UK
- Isabella Fogh
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Ashley R Jones
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Wim Robberecht
- Department of Neurology, University Hospital Leuven, Leuven, Belgium; Department of Neurosciences, Experimental Neurology and Leuven Research Institute for Neuroscience and Disease, Leuven, Belgium; Vesalius Research Center, Laboratory of Neurobiology, Leuven, Belgium
- Philippe Corcia
- ALS Center, Department of Neurology, CHRU Bretonneau, Tours, France
- Adriano Chiò
- Rita Levi Montalcini Department of Neuroscience, ALS Centre, University of Torino, Turin, Italy; Azienda Ospedaliera Universitaria Città della Salute e della Scienza, Torino, Italy
- Pamela J Shaw
- Academic Neurology Unit, Department of Neuroscience, Faculty of Medicine, Dentistry and Health, University of Sheffield, Sheffield, UK
- Karen E Morrison
- School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Jan H Veldink
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, the Netherlands
- Leonard H van den Berg
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, the Netherlands
- Christopher E Shaw
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- John F Powell
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Vincenzo Silani
- Department of Neurology and Laboratory of Neuroscience, IRCCS Istituto Auxologico Italiano, Milano, Italy; Department of Pathophysiology and Transplantation, Dino Ferrari Center, Università degli Studi di Milano, 20122 Milano, Italy
- John A Hardy
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK; Reta Lila Weston Institute, Queen Square Institute of Neurology, UCL, London, UK; UK Dementia Research Institute, Queen Square Institute of Neurology, UCL, London, UK; NIHR University College London Hospitals Biomedical Research Centre, London, UK; Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong, SAR, China
- Henry Houlden
- Department of Neuromuscular Disease, Queen Square Institute of Neurology, UCL, London, UK
- Michael J Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- Martin R Turner
- Nuffield Department of Clinical Neurosciences, Oxford University, Oxford, UK
- Mina Ryten
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, UCL, London, UK; NIHR Great Ormond Street Hospital Biomedical Research Centre, UCL, London, UK
- Ammar Al-Chalabi
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
6
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor decomposition based feature extraction and classification to detect natural selection from genomic data. bioRxiv 2023:2023.03.27.527731. [PMID: 37034767 PMCID: PMC10081272 DOI: 10.1101/2023.03.27.527731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under non-convex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx , which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
7
Ghildiyal K, Panigrahi M, Kumar H, Rajawat D, Nayak SS, Lei C, Bhushan B, Dutt T. Selection signatures for fiber production in commercial species: A review. Anim Genet 2023; 54:3-23. [PMID: 36352515 DOI: 10.1111/age.13272] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/19/2022] [Revised: 10/11/2022] [Accepted: 10/11/2022] [Indexed: 11/11/2022]
Abstract
Natural fibers derived from diverse animal species have gained increased attention in recent years due to their favorable environmental effects, long-term sustainability benefits, and remarkable physical and mechanical properties that make them valuable raw materials used for textile and non-textile production. Domestication and selective breeding for the economically significant fiber traits play an imperative role in shaping the genomes and, thus, positively impact the overall productivity of the various fiber-producing species. These selection pressures leave unique footprints on the genome due to alteration in the allelic frequencies at specific loci, characterizing selective sweeps. Recent advances in genomics have enabled the discovery of selection signatures across the genome using a variety of methods. The increased demand for 'green products' manufactured from natural fibers necessitates a detailed investigation of the genomes of the various fiber-producing plant and animal species to identify the candidate genes associated with important fiber attributes such as fiber diameter/fineness, color, length, and strength, among others. The objective of this review is to present a comprehensive overview of the concept of selection signature and selective sweeps, discuss the main methods used for its detection, and address the selection signature studies conducted so far in the diverse fiber-producing animal species.
Affiliation(s)
- Kanika Ghildiyal
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Manjit Panigrahi
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Harshit Kumar
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Divya Rajawat
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Chuzhao Lei
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Bharat Bhushan
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Triveni Dutt
- Livestock Production and Management Section, Indian Veterinary Research Institute, Bareilly, India
8
Zhang J, Nie C, Zhang X, Zhao X, Jia Y, Han J, Chen Y, Wang L, Lv X, Yang W, Li K, Zhang J, Ning Z, Bao H, Li J, Zhao C, Qu L. A ∼ 4.1 kb deletion in IRX1 gene upstream is completely associated with rumplessness in Piao chicken. Genomics 2022; 114:110515. [PMID: 36306957 DOI: 10.1016/j.ygeno.2022.110515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2022] [Revised: 09/02/2022] [Accepted: 10/23/2022] [Indexed: 11/04/2022]
Abstract
Piao chicken, a Chinese indigenous rumpless chicken breed, lacks pygostyle, caudal vertebra, uropygial gland and tail feathers. The rumplessness in Piao chicken presents an autosomal dominant inheritance pattern. However, the molecular genetic mechanisms underlying the rumplessness in Piao chicken remain unclear. In this study, whole-genome resequencing was performed for 146 individuals from 10 chicken breeds, including 9 tailed chicken breeds and the rumpless Piao breed. Tailbone CT scans of Piao chickens and WL chickens revealed that some Piao chicken tails were normal in number, and for a few Piao chickens tail length and tail bone numbers were between those of the rumpless and the normal tailed chickens. The results showed that the rumpless phenotype has not been completely fixed in the Piao chicken breed. Using selection signature analysis and structural variation detection, we found a 4174 bp deletion located in the upstream region of the IRX1 gene on chromosome 2 related to the rumpless phenotype. Structural variation genotyping showed that the deletion was present in all 32 rumpless Piao chickens (del/del, wild/del) and absent from all 112 tailed chickens included in the dataset for the other 9 breeds and 2 tailed Piao chickens (wild/wild). In summary, all rumpless Piao chickens tested here carry this deletion mutation, which shows complete association with the rumplessness trait. We suggest that the 4174 bp deletion could be causative for the rumpless phenotype in Piao chicken, since it is the only mutation to show complete linkage disequilibrium with rumplessness at the whole-genome level across all 146 chickens from the 10 breeds. This study could facilitate a better understanding of the genetic characteristics of Piao chicken.
Affiliation(s)
- Jinxin Zhang
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Changsheng Nie
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xinye Zhang
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xiurong Zhao
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Yaxiong Jia
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100097, China
- Jianlin Han
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100097, China
- Yu Chen
- Beijing Municipal General Station of Animal Science, Beijing 100101, China
- Liang Wang
- Beijing Municipal General Station of Animal Science, Beijing 100101, China
- Xueze Lv
- Beijing Municipal General Station of Animal Science, Beijing 100101, China
- Weifang Yang
- Beijing Municipal General Station of Animal Science, Beijing 100101, China
- Kaiyang Li
- Beijing Municipal General Station of Animal Science, Beijing 100101, China
- Jianwei Zhang
- Beijing Municipal General Station of Animal Science, Beijing 100101, China
- Zhonghua Ning
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Haigang Bao
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Junying Li
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Chunjiang Zhao
- National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Lujiang Qu
- Xinjiang Production & construction corps key laboratory of protection and utilization of biological resources in Tarim Basin, Tarim University, Alar, 843300, China; National Engineering Laboratory for Animal Breeding, Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
9
Dong H, Dong Z, Wang F, Wang G, Luo X, Lei C, Chen J. Whole Genome Sequencing Provides New Insights Into the Genetic Diversity and Coat Color of Asiatic Wild Ass and Its Hybrids. Front Genet 2022; 13:818420. [PMID: 35646088 PMCID: PMC9135160 DOI: 10.3389/fgene.2022.818420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 04/25/2022] [Indexed: 11/17/2022] Open
Abstract
The diversity of livestock coat color results from human positive selection and is an indispensable part of breed registration. As an important biodiversity resource, Asiatic wild ass has many special characteristics, including its most visible feature, a yellowish-brown coat color, and excellent adaptation. To explore the genetic mechanisms of phenotypic characteristics in Asiatic wild ass and its hybrids, we resequenced the whole genome of one Mongolian Kulan (a subspecies of Asiatic wild ass) and 29 Kulan hybrids (Mongolian Kulan ♂×Xinjiang♀), and the ancestry composition indicated the true lineage of the hybrids. XP-EHH (Cross Population Extended Haplotype Homozygosity), θπ-ratio (Nucleotide Diversity Ratio), CLR (Composite Likelihood Ratio) and θπ (Nucleotide Diversity) methods were used to detect the candidate regions of positive selection in Asiatic wild ass and its hybrids. Several immune genes (DEFA1, DEFA5, DEFA7, GIMAP4, GIMAP1, IGLC1, IGLL5, GZMB and HLA) were observed by the CLR and θπ methods. XP-EHH and θπ-ratio revealed genes that are potentially responsible for coat color (KITLG) and meat quality traits (PDE1B and MYLK2). Furthermore, the heatmap showed a clear difference in the haplotype of the KITLG gene between the Kulan hybrids and Asiatic wild ass group and the Guanzhong black donkey group, which is a powerful demonstration of the key role of KITLG in donkey color. Therefore, our study may provide new insights into the genetic basis of coat color, meat quality traits and immunity of Asiatic wild ass and its hybrids.
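The θπ and θπ-ratio statistics named in this abstract can be illustrated with a small sketch. It assumes haplotypes in one genomic window are given as equal-length strings, computes nucleotide diversity as the mean number of pairwise differences per site, and takes the ratio between two groups. The function names and the toy haplotypes are hypothetical and are not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences per site (theta-pi)
    for a list of equal-length haplotype strings."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    n_sites = len(haplotypes[0])
    pair_count = 0
    differences = 0
    for a, b in combinations(haplotypes, 2):
        differences += sum(x != y for x, y in zip(a, b))
        pair_count += 1
    return differences / pair_count / n_sites


def diversity_ratio(reference, target):
    """theta-pi of the reference group divided by theta-pi of the target
    group for one window; large values flag reduced diversity in the target."""
    return nucleotide_diversity(reference) / nucleotide_diversity(target)


# Toy window (invented for illustration): the target group is nearly monomorphic.
target = ["0000", "0000", "0001", "0000"]
reference = ["0101", "1010", "0011", "1100"]
ratio = diversity_ratio(reference, target)
```

In a genome scan the ratio would be computed per sliding window and the extreme windows inspected; a window where the target has zero diversity would need special handling, which this sketch leaves out.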
Affiliation(s)
- Hong Dong
- College of Animal Science and Technology, SHIHEZI University, Shihezi, China
| | - Zheng Dong
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Fuwen Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Gang Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Xiaoyu Luo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Chuzhao Lei
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Jingbo Chen
- College of Animal Science and Technology, SHIHEZI University, Shihezi, China
- *Correspondence: Jingbo Chen,
| |
|
10
|
DeGiorgio M, Szpiech ZA. A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet 2022; 18:e1010134. [PMID: 35404934 PMCID: PMC9022890 DOI: 10.1371/journal.pgen.1010134] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/18/2021] [Revised: 04/21/2022] [Accepted: 03/04/2022] [Indexed: 01/13/2023]
Abstract
The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the "width" of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software.
Affiliation(s)
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, United States of America
| | - Zachary A. Szpiech
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
| |
|
11
|
Niu Q, Zhang T, Xu L, Wang T, Wang Z, Zhu B, Zhang L, Gao H, Song J, Li J, Xu L. Integration of selection signatures and multi-trait GWAS reveals polygenic genetic architecture of carcass traits in beef cattle. Genomics 2021; 113:3325-3336. [PMID: 34314829 DOI: 10.1016/j.ygeno.2021.07.025] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/24/2021] [Revised: 05/05/2021] [Accepted: 07/22/2021] [Indexed: 11/18/2022]
Abstract
Carcass merits are widely considered as economically important traits affecting beef production in the beef cattle industry. However, the genetic basis of carcass traits remains to be well understood. Here, we applied multiple methods, including the Composite of Likelihood Ratio (CLR) and Genome-wide Association Study (GWAS), to explore the selection signatures and candidate variants affecting carcass traits. We identified 11,600 selected regions overlapping with 2214 candidate genes, and most of those were enriched in binding and gene regulation. Notably, we identified 66 and 110 potential variants significantly associated with carcass traits using single-trait and multi-traits analyses, respectively. By integrating selection signatures with single and multi-traits associations, we identified 12 and 27 putative genes, respectively. Several highly conserved missense variants were identified in OR5M13D, NCAPG, and TEX2. Our study supported polygenic genetic architecture of carcass traits and provided novel insights into the genetic basis of complex traits in beef cattle.
Affiliation(s)
- Qunhao Niu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Tianliu Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Ling Xu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Tianzhen Wang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Zezhao Wang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Bo Zhu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Lupei Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Huijiang Gao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Jiuzhou Song
- Department of Animal and Avian Science, University of Maryland, College Park, USA
| | - Junya Li
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
| | - Lingyang Xu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
| |
|
12
|
Guirao‐Rico S, González J. Benchmarking the performance of Pool-seq SNP callers using simulated and real sequencing data. Mol Ecol Resour 2021; 21:1216-1229. [PMID: 33534960 PMCID: PMC8251607 DOI: 10.1111/1755-0998.13343] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/24/2020] [Revised: 12/21/2020] [Accepted: 01/27/2021] [Indexed: 12/13/2022]
Abstract
Population genomics is a fast-developing discipline with promising applications in a growing number of life sciences fields. Advances in sequencing technologies and bioinformatics tools allow population genomics to exploit genome-wide information to identify the molecular variants underlying traits of interest and the evolutionary forces that modulate these variants through space and time. However, the cost of genomic analyses of multiple populations is still too high to address them through individual genome sequencing. Pooling individuals for sequencing can be a more effective strategy in Single Nucleotide Polymorphism (SNP) detection and allele frequency estimation because of a higher total coverage. However, compared to individual sequencing, SNP calling from pools has the additional difficulty of distinguishing rare variants from sequencing errors, which is often avoided by establishing a minimum threshold allele frequency for the analysis. Finding an optimal balance between minimizing information loss and reducing sequencing costs is essential to ensure the success of population genomics studies. Here, we have benchmarked the performance of SNP callers for Pool-seq data, based on different approaches, under different conditions, and using computer simulations and real data. We found that SNP caller performance varied for allele frequencies up to 0.35. We also found that SNP callers based on Bayesian (SNAPE-pooled) or maximum likelihood (MAPGD) approaches outperform the two heuristic callers tested (VarScan and PoolSNP), in terms of the balance between sensitivity and FDR both in simulated and sequencing data. Our results will help inform the selection of the most appropriate SNP caller not only for large-scale population studies but also in cases where the Pool-seq strategy is the only option, such as in metagenomic or polyploid studies.
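The minimum allele frequency threshold mentioned in this abstract can be sketched as a simple read-count filter. This is only an illustration of the thresholding idea, not the logic of any of the benchmarked callers; the function name and the default values for `min_freq` and `min_depth` are hypothetical.

```python
def filter_pool_snps(sites, min_freq=0.01, min_depth=10):
    """Keep pooled sites whose alternate-allele frequency passes a threshold.

    sites: iterable of (position, alt_reads, total_reads) tuples
    Returns a list of (position, estimated_frequency) for retained sites.
    """
    kept = []
    for position, alt_reads, total_reads in sites:
        if total_reads < min_depth:
            continue  # too few reads for a usable frequency estimate
        freq = alt_reads / total_reads
        # Very rare alleles are hard to tell apart from sequencing errors,
        # so both tails of the frequency range are dropped.
        if min_freq <= freq <= 1 - min_freq:
            kept.append((position, freq))
    return kept


# Hypothetical pooled read counts: (position, alt_reads, total_reads).
example_sites = [(100, 1, 200), (250, 30, 120), (400, 2, 5)]
retained = filter_pool_snps(example_sites)
```

Raising `min_freq` lowers the false discovery rate at the cost of losing genuinely rare variants, which is the balance between sensitivity and FDR that the benchmark evaluates.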
Affiliation(s)
- Sara Guirao‐Rico
- Institute of Evolutionary Biology, CSIC‐Universitat Pompeu Fabra, Barcelona, Spain
| | - Josefa González
- Institute of Evolutionary Biology, CSIC‐Universitat Pompeu Fabra, Barcelona, Spain
| |
|
13
|
Zeng K, Charlesworth B, Hobolth A. Studying models of balancing selection using phase-type theory. Genetics 2021; 218:6237896. [PMID: 33871627 DOI: 10.1093/genetics/iyab055] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/01/2021] [Accepted: 03/25/2021] [Indexed: 11/15/2022]
Abstract
Balancing selection (BLS) is the evolutionary force that maintains high levels of genetic variability in many important genes. To further our understanding of its evolutionary significance, we analyze models with BLS acting on a biallelic locus: an equilibrium model with long-term BLS, a model with long-term BLS and recent changes in population size, and a model of recent BLS. Using phase-type theory, a mathematical tool for analyzing continuous time Markov chains with an absorbing state, we examine how BLS affects polymorphism patterns in linked neutral regions, as summarized by nucleotide diversity, the expected number of segregating sites, the site frequency spectrum, and the level of linkage disequilibrium (LD). Long-term BLS affects polymorphism patterns in a relatively small genomic neighborhood, and such selection targets are easier to detect when the equilibrium frequencies of the selected variants are close to 50%, or when there has been a population size reduction. For a new mutation subject to BLS, its initial increase in frequency in the population causes linked neutral regions to have reduced diversity, an excess of both high and low frequency derived variants, and elevated LD with the selected locus. These patterns are similar to those produced by selective sweeps, but the effects of recent BLS are weaker. Nonetheless, compared to selective sweeps, nonequilibrium polymorphism and LD patterns persist for a much longer period under recent BLS, which may increase the chance of detecting such selection targets. An R package for analyzing these models, among others (e.g., isolation with migration), is available.
Affiliation(s)
- Kai Zeng
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
| | - Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
| | - Asger Hobolth
- Department of Mathematics, Aarhus University, Aarhus DK-8000, Denmark
| |
|
14
|
Harris AM, DeGiorgio M. A Likelihood Approach for Uncovering Selective Sweep Signatures from Haplotype Data. Mol Biol Evol 2021; 37:3023-3046. [PMID: 32392293 PMCID: PMC7530616 DOI: 10.1093/molbev/msaa115] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.
Affiliation(s)
- Alexandre M Harris
- Department of Biology, Pennsylvania State University, University Park, PA; Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
| |
|
15
|
Abstract
Plant pathogens can adapt to quantitative resistance, eroding its effectiveness. The aim of this work was to reveal the genomic basis of adaptation to such a resistance in populations of the fungus Pseudocercospora fijiensis, a major devastating pathogen of banana, by studying convergent adaptation on different cultivars. Samples from P. fijiensis populations showing a local adaptation pattern on new banana hybrids with quantitative resistance were compared, based on a genome scan approach, with samples from traditional and more susceptible cultivars in Cuba and the Dominican Republic. Whole-genome sequencing of pools of P. fijiensis isolates (pool-seq) sampled from three locations per country was conducted according to a paired population design. The findings of different combined analyses highly supported the existence of convergent adaptation on the study cultivars between locations within but not between countries. Five to six genomic regions involved in this adaptation were detected in each country. An annotation analysis and available biological data supported the hypothesis that some genes within the detected genomic regions may play a role in quantitative pathogenicity, including gene regulation. The results suggested that the genetic basis of fungal adaptation to quantitative plant resistance is at least oligogenic, while highlighting the existence of specific host-pathogen interactions for this kind of resistance.
IMPORTANCE Understanding the genetic basis of pathogen adaptation to quantitative resistance in plants has a key role to play in establishing durable strategies for resistance deployment. In this context, a population genomic approach was developed for a major plant pathogen (the fungus Pseudocercospora fijiensis causing black leaf streak disease of banana) whereby samples from new resistant banana hybrids were compared with samples from more susceptible conventional cultivars in two countries. A total of 11 genomic regions for which there was strong evidence of selection by quantitative resistance were detected. An annotation analysis and available biological data supported the hypothesis that some of the genes within these regions may play a role in quantitative pathogenicity. These results suggested a polygenic basis of quantitative pathogenicity in this fungal pathogen and complex molecular plant-pathogen interactions in quantitative disease development involving several genes on both sides.
|
16
|
|
17
|
Horscroft C, Ennis S, Pengelly RJ, Sluckin TJ, Collins A. Sequencing era methods for identifying signatures of selection in the genome. Brief Bioinform 2020; 20:1997-2008. [PMID: 30053138 DOI: 10.1093/bib/bby064] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/26/2018] [Revised: 05/16/2018] [Indexed: 12/12/2022]
Abstract
Insights into genetic loci which are under selection and their functional roles contribute to increased understanding of the patterns of phenotypic variation we observe today. The availability of whole-genome sequence data, for humans and other species, provides opportunities to investigate adaptation and evolution at unprecedented resolution. Many analytical methods have been developed to interrogate these large data sets and characterize signatures of selection in the genome. We review here recently developed methods and consider the impact of increased computing power and data availability on the detection of selection signatures. Consideration of demography, recombination and other confounding factors is important, and use of a range of methods in combination is a powerful route to resolving different forms of selection in genome sequence data. Overall, a substantial improvement in methods for application to whole-genome sequencing is evident, although further work is required to develop robust and computationally efficient approaches which may increase reproducibility across studies.
Affiliation(s)
- Clare Horscroft
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| | - Sarah Ennis
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| | - Reuben J Pengelly
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| | - Timothy J Sluckin
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK; Mathematical Sciences, University of Southampton, Highfield, Southampton, UK
| | - Andrew Collins
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| |
|
18
|
VolcanoFinder: Genomic scans for adaptive introgression. PLoS Genet 2020; 16:e1008867. [PMID: 32555579 PMCID: PMC7326285 DOI: 10.1371/journal.pgen.1008867] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 07/19/2019] [Revised: 06/30/2020] [Accepted: 05/18/2020] [Indexed: 12/16/2022]
Abstract
Recent research shows that introgression between closely-related species is an important source of adaptive alleles for a wide range of taxa. Typically, detection of adaptive introgression from genomic data relies on comparative analyses that require sequence data from both the recipient and the donor species. However, in many cases, the donor is unknown or the data is not currently available. Here, we introduce a genome-scan method, VolcanoFinder, to detect recent events of adaptive introgression using polymorphism data from the recipient species only. VolcanoFinder detects adaptive introgression sweeps from the pattern of excess intermediate-frequency polymorphism they produce in the flanking region of the genome, a pattern which appears as a volcano-shape in pairwise genetic diversity. Using coalescent theory, we derive analytical predictions for these patterns. Based on these results, we develop a composite-likelihood test to detect signatures of adaptive introgression relative to the genomic background. Simulation results show that VolcanoFinder has high statistical power to detect these signatures, even for older sweeps and for soft sweeps initiated by multiple migrant haplotypes. Finally, we implement VolcanoFinder to detect archaic introgression in European and sub-Saharan African human populations, and uncover interesting candidates in both populations, such as TSHR in Europeans and TCHH-RPTN in Africans. We discuss their biological implications and provide guidelines for identifying and circumventing artifactual signals during empirical applications of VolcanoFinder.
The process by which beneficial alleles are introduced into a species from a closely-related species is termed adaptive introgression. We present an analytically-tractable model for the effects of adaptive introgression on non-adaptive genetic variation in the genomic region surrounding the beneficial allele. The result we describe is a characteristic volcano-shaped pattern of increased variability that arises around the positively-selected site, and we introduce an open-source method VolcanoFinder to detect this signal in genomic data. Importantly, VolcanoFinder is a population-genetic likelihood-based approach, rather than a comparative-genomic approach, and can therefore probe genomic variation data from a single population for footprints of adaptive introgression, even from a priori unknown and possibly extinct donor species.
|
19
|
Adaptation in structured populations and fuzzy boundaries between hard and soft sweeps. PLoS Comput Biol 2019; 15:e1007426. [PMID: 31710623 PMCID: PMC6872172 DOI: 10.1371/journal.pcbi.1007426] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/23/2019] [Revised: 11/21/2019] [Accepted: 09/20/2019] [Indexed: 11/19/2022]
Abstract
Selective sweeps, the genetic footprint of positive selection, have been extensively studied in the past decades, with dozens of methods developed to identify swept regions. However, these methods suffer from both false positive and false negative reports, and the candidates identified with different methods are often inconsistent with each other. We propose that a biological cause of this problem can be population subdivision, and a technical cause can be incomplete, or inaccurate, modeling of the dynamic process associated with sweeps. Here we used simulations to show how these effects interact and potentially cause bias. In particular, we show that sweeps may be misclassified as either hard or soft, when the true time stage of a sweep and that implied, or pre-supposed, by the model do not match. We call this "temporal misclassification". Similarly, "spatial misclassification (softening)" can occur when hard sweeps, which are imported by migration into a new subpopulation, are falsely identified as soft. This can easily happen in case of local adaptation, i.e. when the sweeping allele is not under positive selection in the new subpopulation, and the underlying model assumes panmixis instead of substructure. The claim that most sweeps in the evolutionary history of humans were soft may have to be reconsidered in the light of these findings.
|
20
|
Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet 2019; 15:e1008384. [PMID: 31518343 PMCID: PMC6760815 DOI: 10.1371/journal.pgen.1008384] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/29/2019] [Revised: 09/25/2019] [Accepted: 08/26/2019] [Indexed: 12/24/2022]
Abstract
Most current methods for detecting natural selection from DNA sequence data are limited in that they are either based on summary statistics or a composite likelihood, and as a consequence, do not make full use of the information available in DNA sequence data. We here present a new importance sampling approach for approximating the full likelihood function for the selection coefficient. Our method CLUES treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov Chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and for inferring the allele frequency trajectory of a selected or neutral allele. We perform extensive simulations to evaluate the method and show that it uniformly improves power to detect selection compared to current popular methods such as nSL and SDS, and can provide reliable inferences of allele frequency trajectories under many conditions. We also explore the potential of our method to detect extremely recent changes in the strength of selection. We use the method to infer the past allele frequency trajectory for a lactase persistence SNP (MCM6) in Europeans. We also infer the trajectory of a SNP (EDAR) in Han Chinese, finding evidence that this allele's age is much older than previously claimed. We also study a set of 11 pigmentation-associated variants. Several genes show evidence of strong selection particularly within the last 5,000 years, including ASIP, KITLG, and TYR. However, selection on OCA2/HERC2 seems to be much older and, in contrast to previous claims, we find no evidence of selection on TYRP1.
Affiliation(s)
- Aaron J. Stern
- Graduate Group in Computational Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Peter R. Wilton
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
| |
|
21
|
Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference. Mol Biol Evol 2019; 36:220-238. [PMID: 30517664 PMCID: PMC6367976 DOI: 10.1093/molbev/msy224] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Indexed: 12/28/2022]
Abstract
Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.
Affiliation(s)
- Lex Flagel
- Monsanto Company, Chesterfield, MO
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
| | - Yaniv Brandvain
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
| |
|
22
|
Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity. Genetics 2018; 210:1429-1452. [PMID: 30315068 DOI: 10.1534/genetics.118.301502] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 03/03/2018] [Accepted: 10/08/2018] [Indexed: 11/18/2022]
Abstract
Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
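The expected-homozygosity statistics named in this abstract share one arithmetic form, which a short sketch can make concrete. It assumes the standard definitions from Garud et al. (H1 as the sum of squared frequencies, H12 pooling the two most frequent classes, H2 as H1 minus the squared top frequency); passing phased haplotype strings gives the H statistics, while passing unphased multilocus genotype strings gives the G analogs. The function names and toy data are hypothetical, and windowing and significance testing are left out.

```python
from collections import Counter


def sorted_frequencies(classes):
    """Frequencies of the distinct classes in a window, most frequent first."""
    counts = Counter(classes)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def pooled_homozygosity(freqs, k):
    """Expected homozygosity after pooling the k most frequent classes."""
    pooled = sum(freqs[:k])
    return pooled ** 2 + sum(p * p for p in freqs[k:])


def sweep_statistics(classes):
    """Return X1, X12, X123 and X2/X1, where X is H (haplotypes) or G (genotypes)."""
    freqs = sorted_frequencies(classes)
    x1 = pooled_homozygosity(freqs, 1)
    x2 = x1 - freqs[0] ** 2  # homozygosity excluding the most frequent class
    return {
        "X1": x1,
        "X12": pooled_homozygosity(freqs, 2),
        "X123": pooled_homozygosity(freqs, 3),
        "X2/X1": x2 / x1,
    }


# Toy window: three distinct multilocus genotypes at frequencies 0.5, 0.3, 0.2.
stats = sweep_statistics(["0120"] * 5 + ["0020"] * 3 + ["2210"] * 2)
```

A high X12 flags a candidate sweep, and within such windows a low X2/X1 points toward a hard sweep and a higher value toward a soft sweep, which is the classification logic the abstract describes.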
|
23
|
Vy HMT, Won YJ, Kim Y. Multiple Modes of Positive Selection Shaping the Patterns of Incomplete Selective Sweeps over African Populations of Drosophila melanogaster. Mol Biol Evol 2017; 34:2792-2807. [PMID: 28981697 DOI: 10.1093/molbev/msx207] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022]
Abstract
It remains a challenge in evolutionary genetics to elucidate how beneficial mutations arise and propagate in a population and how selective pressures on mutant alleles are structured over space and time. By identifying "sweeping haplotypes (SHs)" that putatively carry beneficial alleles and are increasing (or have increased) rapidly in frequency, and surveying the geographic distribution of SH frequencies, we can indirectly infer how selective sweeps unfold in time and thus which modes of positive selection underlie those sweeps. Using population genomic data from African Drosophila melanogaster, we identified SHs from 37 candidate loci under selection. At more than half of loci, we identify single SHs. However, many other loci harbor multiple independent SHs, namely soft selective sweeps, either due to parallel evolution across space or a high beneficial mutation rate. At about a quarter of the loci, intermediate SH frequencies are found across multiple populations, which cannot be explained unless a certain form of frequency-dependent positive selection, such as heterozygote advantage, is invoked given the reasonable range of migration rates between African populations. At one locus, many independent SHs are observed over multiple populations but always together with ancestral haplotypes. This complex pattern is compatible with a large number of mutational targets in a gene and frequency-dependent selection on new variants. We conclude that very diverse modes of positive selection are operating at different sets of loci in D. melanogaster populations.
Affiliation(s)
- Ha My T Vy
- Division of EcoScience, Ewha Womans University, Seoul, Korea
| | - Yong-Jin Won
- Division of EcoScience, Ewha Womans University, Seoul, Korea.,Department of Life Science, Ewha Womans University, Seoul, Korea
| | - Yuseob Kim
- Division of EcoScience, Ewha Womans University, Seoul, Korea.,Department of Life Science, Ewha Womans University, Seoul, Korea
| |
|
24
|
Cheng X, Xu C, DeGiorgio M. Fast and robust detection of ancestral selective sweeps. Mol Ecol 2017; 26:6871-6891. [DOI: 10.1111/mec.14416] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/15/2017] [Revised: 10/16/2017] [Accepted: 10/23/2017] [Indexed: 01/01/2023]
Affiliation(s)
- Xiaoheng Cheng
- Huck Institutes of Life Sciences; Pennsylvania State University; University Park PA USA
- Department of Biology; Pennsylvania State University; University Park PA USA
| | - Cheng Xu
- Huck Institutes of Life Sciences; Pennsylvania State University; University Park PA USA
| | - Michael DeGiorgio
- Department of Biology; Pennsylvania State University; University Park PA USA
- Department of Statistics; Pennsylvania State University; University Park PA USA
- Institute for CyberScience; Pennsylvania State University; University Park PA USA
| |
|
25
|
Villanueva‐Cañas JL, Rech GE, Cara MAR, González J. Beyond SNPs: how to detect selection on transposable element insertions. Methods Ecol Evol 2017. [DOI: 10.1111/2041-210x.12781] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/26/2022]
Affiliation(s)
| | - Gabriel E. Rech
- Institute of Evolutionary Biology (CSIC‐Universitat Pompeu Fabra) Barcelona Spain
| | - Maria Angeles Rodriguez Cara
- Ecoanthropology and Ethnobiology Laboratory, UMR 7206, CNRS/MNHN/Universite Paris 7 Museum National d'Histoire Naturelle F‐75116 Paris France
| | - Josefa González
- Institute of Evolutionary Biology (CSIC‐Universitat Pompeu Fabra) Barcelona Spain
| |
|
26
|
Schrider DR, Shanku AG, Kern AD. Effects of Linked Selective Sweeps on Demographic Inference and Model Selection. Genetics 2016; 204:1207-1223. [PMID: 27605051 PMCID: PMC5105852 DOI: 10.1534/genetics.116.190223] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 04/07/2016] [Accepted: 09/02/2016] [Indexed: 01/06/2023]
Abstract
The availability of large-scale population genomic sequence data has resulted in an explosion in efforts to infer the demographic histories of natural populations across a broad range of organisms. As demographic events alter coalescent genealogies, they leave detectable signatures in patterns of genetic variation within and between populations. Accordingly, a variety of approaches have been designed to leverage population genetic data to uncover the footprints of demographic change in the genome. The vast majority of these methods make the simplifying assumption that the measures of genetic variation used as their input are unaffected by natural selection. However, natural selection can dramatically skew patterns of variation not only at selected sites, but at linked, neutral loci as well. Here we assess the impact of recent positive selection on demographic inference by characterizing the performance of three popular methods through extensive simulation of data sets with varying numbers of linked selective sweeps. In particular, we examined three different demographic models relevant to a number of species, finding that positive selection can bias parameter estimates of each of these models, often severely. We find that selection can lead to incorrect inferences of population size changes when none have occurred. Moreover, we show that linked selection can lead to incorrect demographic model selection when multiple demographic scenarios are compared. We argue that natural populations may experience the amount of recent positive selection required to skew inferences. These results suggest that demographic studies conducted in many species to date may have exaggerated the extent and frequency of population size changes.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
| | - Alexander G Shanku
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Institute for Quantitative Biomedicine, Rutgers University, Piscataway, New Jersey 08554
| | - Andrew D Kern
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
| |
|