1. Lukaszewicz M, Salia OI, Hohenlohe PA, Buzbas EO. Approximate Bayesian computational methods to estimate the strength of divergent selection in population genomics models. Journal of Computational Mathematics and Data Science 2024; 10:100091. [PMID: 38616846 PMCID: PMC11014422 DOI: 10.1016/j.jcmds.2024.100091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Statistical estimation of parameters in large models of evolutionary processes is often too computationally inefficient to pursue using exact model likelihoods, even with single-nucleotide polymorphism (SNP) data, which offer a way to reduce the size of genetic data while retaining relevant information. Approximate Bayesian Computation (ABC) performs statistical inference about parameters of large models by taking advantage of simulations to bypass direct evaluation of model likelihoods. We develop a mechanistic model to simulate forward-in-time divergent selection with variable migration rates, modes of reproduction (sexual, asexual), and length and number of migration-selection cycles. We investigate the computational feasibility of ABC to perform statistical inference and study the quality of estimates of the position of loci under selection and the strength of selection. To expand the parameter space of positions under selection, we enhance the model by implementing an outlier scan on summarized observed data. We evaluate the usefulness of summary statistics well known to capture the strength of selection, and assess their informativeness under divergent selection. We also evaluate the effect of genetic drift with respect to an idealized deterministic model with single-locus selection. We discuss the role of the recombination rate as a confounding factor in estimating the strength of divergent selection, and emphasize its importance in the breakdown of linkage disequilibrium (LD). We identify the part of the model's parameter space in which we recover a strong signal for estimating selection, and determine whether population differentiation-based or LD-based summary statistics perform well in estimating selection.
Affiliation(s)
- Martyna Lukaszewicz
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America
- Department of Biological Sciences, University of Idaho, Moscow, ID, United States of America
- Ousseini Issaka Salia
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Institute for Modeling Collaboration and Innovation (IMCI), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America
- Department of Biological Sciences, University of Idaho, Moscow, ID, United States of America
- Department of Horticulture, Washington State University, Pullman, WA, United States of America
- Paul A. Hohenlohe
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Institute for Modeling Collaboration and Innovation (IMCI), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America
- Department of Biological Sciences, University of Idaho, Moscow, ID, United States of America
- Erkan O. Buzbas
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Institute for Modeling Collaboration and Innovation (IMCI), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America

2. Korfmann K, Abu Awad D, Tellier A. Weak seed banks influence the signature and detectability of selective sweeps. J Evol Biol 2023; 36:1282-1294. [PMID: 37551039 DOI: 10.1111/jeb.14204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 06/20/2023] [Accepted: 06/27/2023] [Indexed: 08/09/2023]
Abstract
Seed banking (or dormancy) is a widespread bet-hedging strategy, generating a form of population overlap, which decreases the magnitude of genetic drift. The methodological complexity of integrating this trait implies it is ignored when developing tools to detect selective sweeps. But, as dormancy lengthens the ancestral recombination graph (ARG), increasing times to fixation, it can change the genomic signatures of selection. To detect genes under positive selection in seed banking species it is important to (1) determine whether the efficacy of selection is affected, and (2) predict the patterns of nucleotide diversity at and around positively selected alleles. We present the first tree sequence-based simulation program integrating a weak seed bank to examine the dynamics and genomic footprints of beneficial alleles in a finite population. We find that seed banking does not affect the probability of fixation and confirm expectations of increased times to fixation. We also confirm earlier findings that, for strong selection, the times to fixation are not scaled by the inbreeding effective population size in the presence of seed banks, but are shorter than would be expected. As seed banking increases the effective recombination rate, footprints of sweeps appear narrower around the selected sites and due to the scaling of the ARG are detectable for longer periods of time. The developed simulation tool can be used to predict the footprints of selection and draw statistical inference of past evolutionary events in plants, invertebrates, or fungi with seed banks.
Affiliation(s)
- Kevin Korfmann
- Department of Life Science Systems, School of Life Sciences, Technical University of Munich, München, Germany
- Diala Abu Awad
- Department of Life Science Systems, School of Life Sciences, Technical University of Munich, München, Germany
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France
- Aurélien Tellier
- Department of Life Science Systems, School of Life Sciences, Technical University of Munich, München, Germany

3. Kumar H, Panigrahi M, Panwar A, Rajawat D, Nayak SS, Saravanan KA, Kaisa K, Parida S, Bhushan B, Dutt T. Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol 2022; 29:943-960. [PMID: 35639362 DOI: 10.1089/cmb.2021.0447] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/29/2022]
Abstract
Natural selection has received considerable attention because it relates to the adaptation of populations to their environments, both biotic and abiotic. An allele is selected when it is favored by natural selection. Consequently, the favored allele increases in frequency in the population and neighboring linked variation diminishes, causing so-called selective sweeps. High-throughput genomic sequencing allows one to disentangle the evolutionary forces at play in populations, and with the development of these sequencing technologies it has become easier to detect selective sweeps (selection signatures). Various methods can be used to detect selective sweeps, from simple implementations using summary statistics to complex statistical approaches. An important problem with these statistical models is that they can give inaccurate results when their assumptions are violated. Machine learning (ML) has been introduced in population genetics as an alternative that treats the detection of selection signatures as a classification problem. Since the availability of population genomics data is increasing, researchers may incorporate ML into these statistical models to infer signatures of selection with higher predictive accuracy and better resolution. This article describes how ML can be used to aid in detecting and studying natural selection patterns using population genomic data.
Affiliation(s)
- Harshit Kumar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Manjit Panigrahi
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Anuradha Panwar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Divya Rajawat
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Sonali Sonejita Nayak
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- K A Saravanan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Kaiho Kaisa
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Subhashree Parida
- Divisions of Pharmacology and Toxicology, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Bharat Bhushan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Triveni Dutt
- Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, India

4. Semagn K, Iqbal M, Alachiotis N, N'Diaye A, Pozniak C, Spaner D. Genetic diversity and selective sweeps in historical and modern Canadian spring wheat cultivars using the 90K SNP array. Sci Rep 2021; 11:23773. [PMID: 34893626 PMCID: PMC8664822 DOI: 10.1038/s41598-021-02666-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/22/2021] [Accepted: 11/22/2021] [Indexed: 12/14/2022]
Abstract
Previous molecular characterization studies conducted in Canadian wheat cultivars shed some light on the impact of plant breeding on genetic diversity, but the number of varieties and markers used was small. Here, we used 28,798 markers of the wheat 90K single nucleotide polymorphisms to (a) assess the extent of genetic diversity, relationship, population structure, and divergence among 174 historical and modern Canadian spring wheat varieties registered from 1905 to 2018 and 22 unregistered lines (hereinafter referred to as cultivars), and (b) identify genomic regions that had undergone selection. About 91% of the pairs of cultivars differed by 20-40% of the scored alleles, but only 7% of the pairs had kinship coefficients of < 0.250, suggesting the presence of a high proportion of redundancy in allelic composition. Although the 196 cultivars represented eight wheat classes, our results from phylogenetic, principal component, and the model-based population structure analyses revealed three groups, with no clear structure among most wheat classes, breeding programs, and breeding periods. FST statistics computed among different categorical variables showed little genetic differentiation (< 0.05) among breeding periods and breeding programs, but a diverse level of genetic differentiation among wheat classes and predicted groups. Diversity indices were the highest and lowest among cultivars registered from 1970 to 1980 and from 2011 to 2018, respectively. Using two outlier detection methods, we identified from 524 to 2314 SNPs and 41 selective sweeps of which some are close to genes with known phenotype, including plant height, photoperiodism, vernalization, gluten strength, and disease resistance.
Affiliation(s)
- Kassa Semagn
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada.
- Muhammad Iqbal
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
- Nikolaos Alachiotis
- Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 3230, Enschede, OV, The Netherlands
- Amidou N'Diaye
- Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Curtis Pozniak
- Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Dean Spaner
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada.

5. Horscroft C, Ennis S, Pengelly RJ, Sluckin TJ, Collins A. Sequencing era methods for identifying signatures of selection in the genome. Brief Bioinform 2020; 20:1997-2008. [PMID: 30053138 DOI: 10.1093/bib/bby064] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/26/2018] [Revised: 05/16/2018] [Indexed: 12/12/2022]
Abstract
Insights into genetic loci which are under selection and their functional roles contribute to increased understanding of the patterns of phenotypic variation we observe today. The availability of whole-genome sequence data, for humans and other species, provides opportunities to investigate adaptation and evolution at unprecedented resolution. Many analytical methods have been developed to interrogate these large data sets and characterize signatures of selection in the genome. We review here recently developed methods and consider the impact of increased computing power and data availability on the detection of selection signatures. Consideration of demography, recombination and other confounding factors is important, and use of a range of methods in combination is a powerful route to resolving different forms of selection in genome sequence data. Overall, a substantial improvement in methods for application to whole-genome sequencing is evident, although further work is required to develop robust and computationally efficient approaches which may increase reproducibility across studies.
Affiliation(s)
- Clare Horscroft
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Sarah Ennis
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Reuben J Pengelly
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Timothy J Sluckin
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Mathematical Sciences, University of Southampton, Highfield, Southampton, UK
- Andrew Collins
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK

6. Koropoulis A, Alachiotis N, Pavlidis P. Detecting Positive Selection in Populations Using Genetic Data. Methods Mol Biol 2020; 2090:87-123. [PMID: 31975165 DOI: 10.1007/978-1-0716-0199-0_5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
High-throughput genomic sequencing makes it possible to disentangle the evolutionary forces acting in populations. Among evolutionary forces, positive selection has received a lot of attention because it is related to the adaptation of populations to their environments, both biotic and abiotic. Positive selection, also known as Darwinian selection, occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and, due to genetic hitchhiking, neighboring linked variation diminishes, creating so-called selective sweeps. Such a process leaves traces in genomes that can be detected at a future time point. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum, and particular linkage disequilibrium (LD) patterns in the region. A variety of approaches can be used for detecting selective sweeps, ranging from simple implementations that compute summary statistics to more advanced statistical approaches, e.g., Bayesian approaches, maximum-likelihood-based methods, and machine learning methods. In this chapter, we discuss selective sweep detection methodologies on the basis of their capacity to analyze whole genomes or just subgenomic regions, and on the specific polymorphism patterns they exploit as selective sweep signatures. We also summarize the results of comparisons among five open-source software releases (SweeD, SweepFinder, SweepFinder2, OmegaPlus, and RAiSD) regarding sensitivity, specificity, and execution times. Furthermore, we test and discuss machine learning methods and present a thorough performance analysis. In equilibrium neutral models or mild bottlenecks, most methods are able to detect selective sweeps accurately. Methods and tools that rely on linkage disequilibrium (LD) rather than single SNPs exhibit higher true positive rates than the site frequency spectrum (SFS)-based methods under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to build the distribution of the statistic under the null hypothesis. Both LD- and SFS-based approaches suffer from decreased accuracy in localizing the true target of selection in bottleneck scenarios. Furthermore, we present an extensive analysis of the effects of gene flow on selective sweep detection, a problem that has been understudied in the selective sweep literature.
Affiliation(s)
- Angelos Koropoulis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
- Computer Science Department, University of Crete, Crete, Heraklion, Greece
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece.

7. Wegary D, Teklewold A, Prasanna BM, Ertiro BT, Alachiotis N, Negera D, Awas G, Abakemal D, Ogugo V, Gowda M, Semagn K. Molecular diversity and selective sweeps in maize inbred lines adapted to African highlands. Sci Rep 2019; 9:13490. [PMID: 31530852 PMCID: PMC6748982 DOI: 10.1038/s41598-019-49861-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/01/2019] [Accepted: 08/28/2019] [Indexed: 11/08/2022]
Abstract
Little is known about maize germplasm adapted to the African highland agro-ecologies. In this study, we analyzed high-density genotyping by sequencing (GBS) data of 298 African highland adapted maize inbred lines to (i) assess the extent of genetic purity, genetic relatedness, and population structure, and (ii) identify genomic regions that have undergone selection (selective sweeps) in response to adaptation to highland environments. Nearly 91% of the pairs of inbred lines differed by 30-36% of the scored alleles, but only 32% of the pairs of inbred lines had a relative kinship coefficient <0.050, which suggests the presence of substantial redundancy in allelic composition that may be due to repeated use of fewer genetic backgrounds (source germplasm) during line development. Results from different genetic relatedness and population structure analyses revealed three different groups, which generally agree with pedigree information and breeding history, but less so with heterotic groups and endosperm modification. We identified 944 single nucleotide polymorphic (SNP) markers that fell within 22 selective sweeps harboring 265 protein-coding candidate genes, some of which had known functions. Details of the candidate genes with known functions and differences in nucleotide diversity among groups predicted based on multivariate methods are discussed.
Affiliation(s)
- Dagne Wegary
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
- Adefris Teklewold
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia.
- Boddupalli M Prasanna
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
- Berhanu T Ertiro
- Bako National Maize Research Center, Ethiopian Institute of Agricultural Research (EIAR), Addis Ababa, Ethiopia
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
- Demewez Negera
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
- Geremew Awas
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
- Demissew Abakemal
- Ambo Agricultural Research Center, P.O. Box 37, West Shoa, Ambo, Ethiopia
- Veronica Ogugo
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
- Manje Gowda
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
- Kassa Semagn
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya.
- Africa Rice Center (AfricaRice), M'bé Research Station, 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.

8. Ndjiondjop MN, Alachiotis N, Pavlidis P, Goungoulou A, Kpeki SB, Zhao D, Semagn K. Comparisons of molecular diversity indices, selective sweeps and population structure of African rice with its wild progenitor and Asian rice. Theor Appl Genet 2019; 132:1145-1158. [PMID: 30578434 PMCID: PMC6449321 DOI: 10.1007/s00122-018-3268-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/11/2018] [Accepted: 12/11/2018] [Indexed: 05/20/2023]
Abstract
The extent of molecular diversity parameters across three rice species was compared using a large germplasm collection genotyped with genomewide SNPs and SNPs that fell within selective sweep regions. Previous studies conducted on a limited number of accessions have reported very low genetic variation in African rice (Oryza glaberrima Steud.) as compared to its wild progenitor (O. barthii A. Chev.) and to Asian rice (O. sativa L.). Here, we characterized a large collection of African rice and compared its molecular diversity indices and population structure with the two other species using genomewide single nucleotide polymorphisms (SNPs) and SNPs that mapped within selective sweeps. A total of 3245 samples representing African rice (2358), Asian rice (772) and O. barthii (115) were genotyped with 26,073 physically mapped SNPs. Using all SNPs, the level of marker polymorphism, average genetic distance and nucleotide diversity in African rice accounted for 59.1%, 63.2% and 37.1% of that of O. barthii, respectively. SNP polymorphism and overall nucleotide diversity of the African rice accounted for 20.1-32.1% and 16.3-37.3% of that of the Asian rice, respectively. We identified 780 SNPs that fell within 37 candidate selective sweeps in African rice, which were distributed across all 12 rice chromosomes. Nucleotide diversity of the African rice estimated from the 780 SNPs was 8.3 × 10⁻⁴, which is not only 20-fold smaller than the value estimated from all genomewide SNPs (π = 1.6 × 10⁻²), but also accounted for just 4.1%, 0.9% and 2.1% of that of O. barthii, lowland Asian rice and upland Asian rice, respectively. The genotype data generated for a large collection of rice accessions conserved at the AfricaRice genebank will be highly useful for the global rice community and promote germplasm use.
Affiliation(s)
- Marie Noelle Ndjiondjop
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
- Alphonse Goungoulou
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
- Sèdjro Bienvenu Kpeki
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
- Dule Zhao
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
- Kassa Semagn
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.

9. Zhou M, Pan Z, Cao X, Guo X, He X, Sun Q, Di R, Hu W, Wang X, Zhang X, Zhang J, Zhang C, Liu Q, Chu M. Single Nucleotide Polymorphisms in the HIRA Gene Affect Litter Size in Small Tail Han Sheep. Animals (Basel) 2018; 8:ani8050071. [PMID: 29734691 PMCID: PMC5981282 DOI: 10.3390/ani8050071] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/19/2018] [Revised: 04/28/2018] [Accepted: 04/28/2018] [Indexed: 12/19/2022]
Abstract
Simple Summary: Litter size is one of the most important reproductive traits in sheep. Two single nucleotide polymorphisms (SNPs), g.71874104G>A and g.71833755T>C, in the Histone Cell Cycle Regulator (HIRA) gene, were identified by whole-genome sequencing (WGS) and may be correlated with litter size in sheep. The two SNPs were genotyped and expression patterns of HIRA were determined in sheep breeds with different fecundity and in groups of Small Tail Han sheep producing large or small litters. Association analysis indicated that both SNPs were significantly correlated with litter size in Small Tail Han sheep. Furthermore, high levels of HIRA expression may have a negative effect on litter size in Small Tail Han sheep.
Abstract: Maintenance of appropriate levels of fecundity is critical for efficient sheep production. Opportunities to increase sheep litter size include identifying single gene mutations with major effects on ovulation rate and litter size. Whole-genome sequencing (WGS) data of 89 Chinese domestic sheep from nine different geographical locations and ten Australian sheep were analyzed to detect new polymorphisms affecting litter size. Comparative genomic analysis of sheep with contrasting litter size detected a novel set of candidate genes. Two SNPs, g.71874104G>A and g.71833755T>C, were genotyped in 760 Small Tail Han sheep and analyzed for association with litter size. The two SNPs were significantly associated with litter size, being in strong linkage disequilibrium in the region 71.80–71.87 Mb. This haplotype block contains one gene that may affect litter size, Histone Cell Cycle Regulator (HIRA). HIRA mRNA levels in sheep with different lambing ability were significantly higher in ovaries of Small Tail Han sheep (high fecundity) than in Sunite sheep (low fecundity). Moreover, the expression levels of HIRA in eight tissues of uniparous Small Tail Han sheep were significantly higher than in multiparous Small Tail Han sheep (p < 0.05). HIRA SNPs significantly affect litter size in sheep and are useful as genetic markers for litter size.
Affiliation(s)
- Mei Zhou
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- Zhangyuan Pan
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- College of Agriculture and Forestry Science, Linyi University, Linyi 276000, China.
- Xiaohan Cao
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- College of Life Science, Sichuan Agricultural University, Ya'an 625014, China.
- Xiaofei Guo
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- Xiaoyun He
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- Qing Sun
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- Ran Di
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- Wenping Hu
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- Xiangyu Wang
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
- Xiaosheng Zhang
- Tianjin Institute of Animal Sciences, Tianjin 300381, China.
- Jinlong Zhang
- Tianjin Institute of Animal Sciences, Tianjin 300381, China.
- Chunyuan Zhang
- State Key Laboratory for Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China.
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing 100193, China.
| | - Qiuyue Liu
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
| | - Mingxing Chu
- Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
| |
Collapse
|
10
|
Weigand H, Leese F. Detecting signatures of positive selection in non-model species using genomic data. Zool J Linn Soc 2018. [DOI: 10.1093/zoolinnean/zly007] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/14/2022]
Affiliation(s)
- Hannah Weigand
- Aquatic Ecosystem Research, University of Duisburg-Essen, Universitätsstraße, Essen, Germany
- Florian Leese
- Aquatic Ecosystem Research, University of Duisburg-Essen, Universitätsstraße, Essen, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstraße, Essen, Germany
|
11
|
Li X, Yang S, Dong K, Tang Z, Li K, Fan B, Wang Z, Liu B. Identification of positive selection signatures in pigs by comparing linkage disequilibrium variances. Anim Genet 2017; 48:600-605. [DOI: 10.1111/age.12574] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Accepted: 05/11/2017] [Indexed: 11/26/2022]
Affiliation(s)
- X. Li
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education; Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture; Huazhong Agricultural University; Wuhan Hubei 430070 China
- The Cooperative Innovation Center for Sustainable Pig Production; Wuhan Hubei 430070 China
- Department of Agricultural, Food and Nutritional Science; University of Alberta; Edmonton AB T6G 2P5 Canada
- S. Yang
- College of Animal Science and Technology; Zhejiang A&F University; Lin'an Zhejiang 311300 China
- K. Dong
- The Key Laboratory for Domestic Animal Genetic Resources and Breeding of Ministry of Agriculture of China; Institute of Animal Science; Chinese Academy of Agricultural Sciences; Beijing 100193 China
- Z. Tang
- The Key Laboratory for Domestic Animal Genetic Resources and Breeding of Ministry of Agriculture of China; Institute of Animal Science; Chinese Academy of Agricultural Sciences; Beijing 100193 China
- K. Li
- The Key Laboratory for Domestic Animal Genetic Resources and Breeding of Ministry of Agriculture of China; Institute of Animal Science; Chinese Academy of Agricultural Sciences; Beijing 100193 China
- B. Fan
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education; Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture; Huazhong Agricultural University; Wuhan Hubei 430070 China
- Z. Wang
- Department of Agricultural, Food and Nutritional Science; University of Alberta; Edmonton AB T6G 2P5 Canada
- B. Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education; Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture; Huazhong Agricultural University; Wuhan Hubei 430070 China
- The Cooperative Innovation Center for Sustainable Pig Production; Wuhan Hubei 430070 China
|
12
|
Pavlidis P, Alachiotis N. A survey of methods and tools to detect recent and strong positive selection. J Biol Res (Thessalon) 2017; 24:7. [PMID: 28405579 PMCID: PMC5385031 DOI: 10.1186/s40709-017-0064-0] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 08/11/2016] [Accepted: 03/29/2017] [Indexed: 01/25/2023]
Abstract
Positive selection occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and due to genetic hitchhiking the neighboring linked variation diminishes, creating so-called selective sweeps. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum, and particular LD patterns in the region. A variety of methods and tools can be used for detecting sweeps, ranging from simple implementations that compute summary statistics such as Tajima's D, to more advanced statistical approaches that use combinations of statistics, maximum likelihood, machine learning etc. In this survey, we present and discuss summary statistics and software tools, and classify them based on the selective sweep signature they detect, i.e., SFS-based vs. LD-based, as well as their capacity to analyze whole genomes or just subgenomic regions. Additionally, we summarize the results of comparisons among four open-source software releases (SweeD, SweepFinder, SweepFinder2, and OmegaPlus) regarding sensitivity, specificity, and execution times. In equilibrium neutral models or mild bottlenecks, both SFS- and LD-based methods are able to detect selective sweeps accurately. Methods and tools that rely on LD exhibit higher true positive rates than SFS-based ones under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to represent the null hypothesis. When the correct (or similar to the correct) demographic model is used instead, the false positive rates are considerably reduced. The accuracy of detecting the true target of selection is decreased in bottleneck scenarios. In terms of execution time, LD-based methods are typically faster than SFS-based methods, due to the nature of required arithmetic.
Affiliation(s)
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, 70013 Crete, Greece
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, 70013 Crete, Greece
|
13
|
Alachiotis N, Pavlidis P. Scalable linkage-disequilibrium-based selective sweep detection: a performance guide. GigaScience 2016; 5:7. [PMID: 26862394 PMCID: PMC4746822 DOI: 10.1186/s13742-016-0114-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/30/2015] [Accepted: 01/20/2016] [Indexed: 01/09/2023]
Abstract
Background: Linkage disequilibrium is defined as the non-random associations of alleles at different loci, and it occurs when genotypes at the two loci depend on each other. The model of genetic hitchhiking predicts that strong positive selection affects the patterns of linkage disequilibrium around the site of a beneficial allele, resulting in specific motifs of correlation between neutral polymorphisms that surround the fixed beneficial allele. Increased levels of linkage disequilibrium are observed on the same side of a beneficial allele, and diminish between sites on different sides of a beneficial mutation. This specific pattern of linkage disequilibrium occurs more frequently when positive selection has acted on the population rather than under various neutral models. Thus, detecting such patterns could accurately reveal targets of positive selection along a recombining chromosome or a genome. Calculating linkage disequilibria in whole genomes is computationally expensive because allele correlations need to be evaluated for millions of pairs of sites. To analyze large datasets efficiently, algorithmic implementations used in modern population genetics need to exploit multiple cores of current workstations in a scalable way. However, population genomic datasets come in various types and shapes while typically showing SNP density heterogeneity, which makes the implementation of generally scalable parallel algorithms a challenging task.
Findings: Here we present a series of four parallelization strategies targeting shared-memory systems for the computationally intensive problem of detecting genomic regions that have contributed to the past adaptation of the species, also referred to as regions that have undergone a selective sweep, based on linkage disequilibrium patterns. We provide a thorough performance evaluation of the proposed parallel algorithms for computing linkage disequilibrium, and outline the benefits of each approach. Furthermore, we compare the accuracy of our open-source sweep-detection software OmegaPlus, which implements all four parallelization strategies presented here, with a variety of neutrality tests.
Conclusions: The computational demands of selective sweep detection algorithms depend greatly on the SNP density heterogeneity and the data representation. Choosing the right parallel algorithm for the analysis can lead to significant processing time reduction and major energy savings. However, determining which parallel algorithm will execute more efficiently on a specific processor architecture and number of available cores for a particular dataset is not straightforward.
Electronic supplementary material: The online version of this article (doi:10.1186/s13742-016-0114-9) contains supplementary material, which is available to authorized users.
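The pairwise allele correlations that the abstract describes as the expensive step can be illustrated with a minimal Python sketch. It computes the standard r² statistic between biallelic sites coded 0/1 per haplotype and evaluates it for every pair of sites. The function names `r_squared` and `pairwise_ld` and the toy data are illustrative assumptions; this is not the OmegaPlus implementation, nor its sweep statistic or parallelization strategies.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared allele correlation (r^2) between two biallelic sites.

    Each site is a sequence of 0/1 alleles, one entry per haplotype.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n  # haplotype 1-1 frequency
    d = p_ab - p_a * p_b                       # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined for monomorphic sites; report 0.0 by convention here.
    return 0.0 if denom == 0 else d * d / denom


def pairwise_ld(sites):
    """r^2 for every unordered pair of sites: n*(n-1)/2 evaluations."""
    return {
        (i, j): r_squared(sites[i], sites[j])
        for i, j in combinations(range(len(sites)), 2)
    }


# Toy data (hypothetical): 4 sites observed in 4 haplotypes.
sites = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],  # identical to site 0: complete LD
    [0, 1, 0, 1],  # independent of site 0: no LD
    [1, 1, 1, 1],  # monomorphic
]
ld = pairwise_ld(sites)
```

The number of pairs grows quadratically with the number of sites, which is why whole-genome LD scans benefit from the kind of parallelization the abstract discusses.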
Affiliation(s)
- Nikolaos Alachiotis
- Department of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213 PA USA
- Pavlos Pavlidis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Crete, 70013 Greece
|